Patent attributes
Disclosed herein are systems and methods for classifying unstructured datasets according to a classification system and generating an enhanced, classified, and structured dataset that enables efficient supplemental computer-based processing. For each entry in the input dataset, the exemplary computer-implemented classification algorithms semantically interpret a text-based occupation description, analyze the description according to an ontology of interrelated “concepts,” and identify semantically relevant concept(s) and any associated descriptors specific to the classification system. The system is also configured to expand the list of relevant concepts to include concepts that bear a relationship thereto, to score the various concepts and associated descriptors, and to identify the concept(s) and descriptors that most accurately correspond to the input data. Further, the system is configured to generate the new structured and classified occupation dataset by selectively combining certain input data and augmenting each entry with supplemental information inferred through the classification process.
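By way of non-limiting illustration, the match, expand, score, and augment steps described above may be sketched as follows. Everything in this sketch is an assumption made for illustration and is not taken from the disclosure: the toy `ONTOLOGY`, the `OCC-…` codes, the keyword-overlap matching that stands in for semantic interpretation, the weights `DIRECT_WEIGHT` and `RELATED_WEIGHT`, and the function names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """One node of a hypothetical occupation ontology."""
    name: str
    keywords: set[str]                                 # terms that signal this concept
    related: set[str] = field(default_factory=set)     # names of related concepts
    code: str = ""                                     # hypothetical classification code


# Toy ontology; concept names, keywords, and codes are invented for this sketch.
ONTOLOGY = {
    "nurse": Concept("nurse", {"nurse", "nursing", "rn"}, {"healthcare"}, "OCC-100"),
    "healthcare": Concept("healthcare", {"hospital", "clinic", "patient"}, {"nurse"}, "OCC-900"),
    "teacher": Concept("teacher", {"teacher", "teaching", "school"}, set(), "OCC-200"),
}

DIRECT_WEIGHT = 1.0    # assumed credit per keyword found in the description
RELATED_WEIGHT = 0.25  # assumed credit passed to each related concept


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into punctuation-free tokens."""
    return {tok.strip(".,;:()").lower() for tok in text.split()} - {""}


def score_concepts(description: str) -> dict[str, float]:
    """Match concepts by keyword overlap, then expand to related concepts."""
    tokens = tokenize(description)
    scores: dict[str, float] = {}
    # Step 1: identify directly relevant concepts.
    for concept in ONTOLOGY.values():
        hits = len(tokens & concept.keywords)
        if hits:
            scores[concept.name] = DIRECT_WEIGHT * hits
    # Step 2: expand the list with concepts related to the direct matches.
    for name in list(scores):
        for rel in ONTOLOGY[name].related:
            scores[rel] = scores.get(rel, 0.0) + RELATED_WEIGHT
    return scores


def classify_entry(entry: dict) -> dict:
    """Return the input entry augmented with the best-scoring concept."""
    scores = score_concepts(entry["description"])
    if not scores:
        return {**entry, "concept": None, "code": None, "score": 0.0}
    best = max(scores, key=scores.get)
    return {**entry, "concept": best, "code": ONTOLOGY[best].code, "score": scores[best]}


if __name__ == "__main__":
    raw = [
        {"id": 1, "description": "Registered nurse, RN, at a city hospital"},
        {"id": 2, "description": "High school teacher"},
    ]
    for row in map(classify_entry, raw):
        print(row)
```

In this sketch, each output entry retains the original input fields and adds the inferred `concept`, `code`, and `score` fields; an entry that matches no concept is passed through with `concept` set to `None` rather than being dropped.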