Patent attributes
Aspects of the present disclosure relate to identifying a product in a document. A server accesses a document including scientific or research-related text. The server divides the document into a plurality of tokens, each token being a part of the text that logically forms a unit of information. The server computes, for each token in the plurality of tokens, a score indicating whether the token corresponds to a commercial product, the score being computed based on a list of features of commercial products and weights assigned to the features in the list. The server determines that the score for a given token exceeds a threshold score. The server provides, in response to determining that the score exceeds the threshold score, an output indicating that the token corresponds to a commercial product.
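The tokenize, score, and threshold steps described above can be illustrated with a minimal sketch. It is not the disclosed implementation: the abstract does not specify the tokenizer, the feature list, the weights, or the threshold, so every name and value below (`FEATURES`, `THRESHOLD`, `tokenize`, `score_token`, `find_products`, and the three example features) is a hypothetical stand-in chosen only to show the shape of the computation.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical feature list: each entry maps a feature name to a
# (predicate, weight) pair. The features and weights are assumptions
# made for illustration; the abstract does not enumerate them.
FEATURES: Dict[str, Tuple[Callable[[str], bool], float]] = {
    "has_trademark_symbol": (lambda t: bool(re.search(r"[®™]", t)), 2.0),
    "mixes_letters_and_digits": (
        lambda t: bool(re.search(r"[A-Za-z]", t)) and bool(re.search(r"\d", t)),
        1.5,
    ),
    "starts_uppercase": (lambda t: t[:1].isupper(), 0.5),
}

# Assumed threshold; the abstract only states that one exists.
THRESHOLD = 1.75


def tokenize(text: str) -> List[str]:
    """Split text into tokens.

    Simplifying assumption: a token is a whitespace-delimited chunk with
    surrounding punctuation stripped. The disclosure's notion of a
    "unit of information" may be broader (for example, multi-word phrases).
    """
    tokens = (chunk.strip(".,;:()[]\"'") for chunk in text.split())
    return [t for t in tokens if t]


def score_token(token: str) -> float:
    """Sum the weights of every feature whose predicate matches the token."""
    return sum(weight for predicate, weight in FEATURES.values() if predicate(token))


def find_products(text: str, threshold: float = THRESHOLD) -> List[Tuple[str, float]]:
    """Return (token, score) pairs whose score exceeds the threshold."""
    hits = []
    for token in tokenize(text):
        token_score = score_token(token)
        if token_score > threshold:
            hits.append((token, token_score))
    return hits


if __name__ == "__main__":
    sample = "Cells were imaged on an Axio200 microscope using Triton® buffer."
    for token, token_score in find_products(sample):
        print(f"{token}: {token_score}")
```

A weighted sum of binary features is used here because it is the simplest scoring form consistent with "a list of features ... and weights assigned to features"; a learned model could fill the same role if the weights were fitted rather than hand-set.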