Patent attributes
Text clustering includes: identifying, for a set of non-stop words in a text, a corresponding set of related topic clusters relating to the set of non-stop words, the identification being based at least in part on a plurality of topic clusters each comprising a corresponding plurality of topically related words and a corresponding cluster identifier; for non-stop words in the set of non-stop words that are identified to have corresponding related topic clusters, replacing the non-stop words with corresponding cluster identifiers of the corresponding related topic clusters to generate a clustered version of the text; and providing the clustered version of the text to be further analyzed.
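The steps recited above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the stop-word list, the cluster contents, the cluster identifiers (`C001`, `C002`), and the function names `build_word_index` and `cluster_text` are all hypothetical. The sketch also assumes that stop words and non-stop words without a related cluster are passed through unchanged, and that a word belonging to several clusters maps to the first cluster listed; the abstract does not specify either point.

```python
import re

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}

# Hypothetical topic clusters: cluster identifier -> topically related words.
TOPIC_CLUSTERS = {
    "C001": {"phone", "handset", "smartphone"},
    "C002": {"cheap", "inexpensive", "affordable"},
}


def build_word_index(clusters):
    """Invert the clusters into a word -> cluster identifier lookup."""
    index = {}
    for cluster_id, words in clusters.items():
        for word in words:
            # Assumption: the first cluster listed wins if a word appears in several.
            index.setdefault(word, cluster_id)
    return index


def cluster_text(text, clusters=TOPIC_CLUSTERS, stop_words=STOP_WORDS):
    """Return a clustered version of `text`.

    Non-stop words that have a related topic cluster are replaced with that
    cluster's identifier. Stop words and unmatched words are kept as-is
    (an assumption of this sketch).
    """
    index = build_word_index(clusters)
    # Simple lowercase tokenization; a real system would likely use a proper tokenizer.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    clustered = []
    for token in tokens:
        if token in stop_words:
            clustered.append(token)
        else:
            clustered.append(index.get(token, token))
    return " ".join(clustered)


if __name__ == "__main__":
    # The clustered text would then be provided for further analysis.
    print(cluster_text("The cheap smartphone is great"))
```

Because "cheap" and "inexpensive" both map to the same identifier in this sketch, texts that use different but topically related words yield the same clustered tokens, which is what makes the clustered version convenient for downstream analysis.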