Patent 8140337 was granted and assigned to NEC Corporation on March, 2012 by the United States Patent and Trademark Office.
Disclosed is an apparatus includes a text input device that inputs text data provided with confidence measure, as subject for mining, a language processing unit that performs language analysis of the input text data provided with the confidence measures, a confidence measure exploiting characteristic word count unit that counts the characteristic words in the input text to provide a count result and that exploits the statistical information and the confidence measures provided in the input text to correct the count result obtained, a characteristic measure calculation unit that calculates the characteristic measure of each characteristic word from the corrected count result, a mining result output device that outputs the characteristic measure of each characteristic word obtained, a user operation input device for a user to input setting for language processing of the input text and setting for a technique for calculating the characteristic measure being found, a mining process management unit that transmits a user's command delivered from the user operation input device to respective components, and a statistical information database that records and holds the statistical information representing the property of the input text that may be presupposed.