A system for analyzing similarities between a first corpus and a second corpus, or between a set of concepts and a corpus, uses natural language processing and machine intelligence methods to replace terms or phrases in the corpus with concepts, determine the frequency of each concept in the corpus, and convert the corpus into a concept frequency file. The concept frequency file enables easy comparison of the two corpora, or easy retrieval of items from the corpus that contain a given concept. Difference analysis and a combination of content and spectral analysis may be employed.
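The concept-frequency pipeline summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method. The lexicon `CONCEPT_LEXICON`, the greedy longest-match replacement of phrases, the tab-separated layout used for the "concept frequency file", and the use of cosine similarity to compare two corpora are all hypothetical choices made for the example.

```python
from collections import Counter
import math
import re

# Hypothetical term/phrase -> concept lexicon (assumption for illustration only).
CONCEPT_LEXICON = {
    "heart attack": "MYOCARDIAL_INFARCTION",
    "myocardial infarction": "MYOCARDIAL_INFARCTION",
    "high blood pressure": "HYPERTENSION",
    "hypertension": "HYPERTENSION",
    "aspirin": "ASPIRIN",
}
MAX_PHRASE_LEN = max(len(p.split()) for p in CONCEPT_LEXICON)


def to_concepts(text):
    """Replace terms/phrases with concepts using greedy longest match."""
    tokens = re.findall(r"[a-z]+", text.lower())
    concepts, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in CONCEPT_LEXICON:
                concepts.append(CONCEPT_LEXICON[phrase])
                i += n
                break
        else:
            i += 1  # token maps to no concept; skip it
    return concepts


def concept_frequencies(corpus):
    """Convert a corpus (a list of documents) into concept -> count."""
    counts = Counter()
    for doc in corpus:
        counts.update(to_concepts(doc))
    return counts


def to_frequency_lines(freq):
    """Hypothetical 'concept frequency file' layout: CONCEPT<TAB>count, sorted."""
    return [f"{concept}\t{count}" for concept, count in sorted(freq.items())]


def cosine_similarity(freq_a, freq_b):
    """Compare two concept-frequency vectors (assumed similarity measure)."""
    dot = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def documents_with_concept(corpus, concept):
    """Retrieve the documents whose text maps to the given concept."""
    return [doc for doc in corpus if concept in to_concepts(doc)]


if __name__ == "__main__":
    corpus_a = ["Patient had a heart attack and takes aspirin.",
                "History of high blood pressure."]
    corpus_b = ["Myocardial infarction treated with aspirin.",
                "Hypertension noted."]
    fa, fb = concept_frequencies(corpus_a), concept_frequencies(corpus_b)
    print("\n".join(to_frequency_lines(fa)))
    print(cosine_similarity(fa, fb))
    print(documents_with_concept(corpus_b, "ASPIRIN"))
```

Because both corpora are reduced to the same concept vocabulary, "heart attack" and "myocardial infarction" count as the same concept, so the two example corpora compare as near-identical even though they share few surface words.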