Patent attributes
In some examples, a document can be received and parsed to identify sentences of the document. A plurality of domain natural language classifiers (NLCs), each trained based on domain training data associated with a respective domain of a plurality of domains, can be programmed to classify each identified sentence to determine a sentence confidence score for each identified sentence. A plurality of document confidence scores for the document can be determined based on the sentence confidence scores determined by the plurality of domain NLCs. Each document confidence score can characterize a relevance of the document to a respective domain of the plurality of domains. Document similarity data identifying at least one document among documents associated with a corresponding domain of the plurality of domains can be generated based on an evaluation of the document confidence scores for the document and the document confidence scores for the documents associated with the corresponding domain.
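
The flow described above can be illustrated with a minimal Python sketch. It is not the claimed implementation, and several details are assumptions that the abstract does not specify: the sentence splitter is a naive regular expression, `keyword_classifier` is a hypothetical stand-in for a trained domain NLC, each document confidence score is taken as the mean of that domain's sentence confidence scores, and similarity is evaluated as Euclidean distance between score vectors. All function names are illustrative.

```python
import math
import re
from typing import Callable, Dict, List

# A "classifier" here is any callable mapping a sentence to a confidence in [0, 1].
Classifier = Callable[[str], float]


def split_sentences(text: str) -> List[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def keyword_classifier(keywords: List[str]) -> Classifier:
    """Hypothetical stand-in for a trained domain NLC.

    The confidence is the fraction of the domain keywords found in the sentence.
    """
    def classify(sentence: str) -> float:
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        return sum(k in words for k in keywords) / len(keywords)
    return classify


def document_scores(text: str, nlcs: Dict[str, Classifier]) -> Dict[str, float]:
    """One document confidence score per domain.

    Aggregation by mean is an assumption; the abstract only says the document
    scores are "based on" the sentence confidence scores.
    """
    sentences = split_sentences(text)
    if not sentences:
        return {domain: 0.0 for domain in nlcs}
    return {
        domain: sum(clf(s) for s in sentences) / len(sentences)
        for domain, clf in nlcs.items()
    }


def most_similar(query: Dict[str, float],
                 corpus: Dict[str, Dict[str, float]]) -> str:
    """Return the ID of the corpus document whose score vector is closest.

    Euclidean distance is an assumed similarity measure, not one named in the source.
    """
    domains = sorted(query)

    def distance(other: Dict[str, float]) -> float:
        return math.sqrt(sum((query[d] - other[d]) ** 2 for d in domains))

    return min(corpus, key=lambda doc_id: distance(corpus[doc_id]))


# Usage with two made-up domains and made-up corpus scores.
nlcs = {
    "medical": keyword_classifier(["patient", "dose"]),
    "software": keyword_classifier(["server", "cache"]),
}
scores = document_scores("The patient received a dose. The server was idle.", nlcs)
corpus = {
    "doc_a": {"medical": 0.6, "software": 0.1},
    "doc_b": {"medical": 0.0, "software": 0.9},
}
closest = most_similar(scores, corpus)
```

In this sketch the corpus is assumed to already hold the documents associated with the domain of interest, so no domain filtering step is shown; a fuller version would select those documents before comparing score vectors.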