Patent attributes
In one aspect, the present invention provides a method for estimating the similarity between at least two portions of text, including the steps of: forming a set of syntactic tuples, each tuple including at least two terms and a relation between the two terms; classifying the relation between the terms in the tuples according to a predefined set of relations; establishing the relative agreement between syntactic tuples from the portions of text under comparison according to predefined classes of agreement; calculating a value representative of the similarity between the portions of text for each of the classes of agreement; and establishing a value for the similarity between the portions of text by calculating a weighted sum of the values representative of the similarity between the portions of text for each of the classes of agreement.

Preferably, the step of calculating a value representative of the similarity between the portions of text for each of the classes of agreement includes a weighting based upon the number of matched terms occurring in particular parts of speech. It is also preferred that this step include applying, to the estimate of similarity for each class of agreement, a weighting factor that depends on that class of agreement and on the parts of speech in which the matched terms occur.
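The steps above can be illustrated with a minimal sketch. The text does not specify the predefined set of relations, the classes of agreement, or any weight values, so everything concrete here is a hypothetical assumption: tuples are taken to be (head, relation, dependent) triples such as a dependency parser might produce; `RELATION_CLASSES`, the three agreement classes `exact`, `terms_only` and `one_term`, and the values in `CLASS_WEIGHTS` and `POS_WEIGHTS` are invented for illustration. The weighting factor for each combination of agreement class and part of speech is assumed to factor as a class weight multiplied by a part-of-speech weight, and the final score is made symmetric by averaging the two comparison directions.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class Term(NamedTuple):
    word: str
    pos: str  # part-of-speech tag, e.g. "NOUN", "VERB", "ADJ"


class SynTuple(NamedTuple):
    head: Term
    relation: str  # raw syntactic relation label, e.g. "nsubj"
    dependent: Term


# Hypothetical predefined set of relations: raw labels are mapped to coarse
# classes; any unlisted label is classified as "other".
RELATION_CLASSES = {
    "nsubj": "subject",
    "dobj": "object",
    "amod": "modifier",
    "advmod": "modifier",
}

# Hypothetical classes of agreement and their weighting factors.
CLASS_WEIGHTS = {"exact": 1.0, "terms_only": 0.6, "one_term": 0.3}

# Hypothetical part-of-speech weights applied to matched terms.
POS_WEIGHTS = {"NOUN": 1.0, "VERB": 0.8, "ADJ": 0.5}
DEFAULT_POS_WEIGHT = 0.2


def classify_relation(label: str) -> str:
    return RELATION_CLASSES.get(label, "other")


def pos_weight(term: Term) -> float:
    return POS_WEIGHTS.get(term.pos, DEFAULT_POS_WEIGHT)


def agreement(a: SynTuple, b: SynTuple) -> Optional[tuple[str, list[Term]]]:
    """Return (agreement class, matched terms) for two tuples, or None."""
    same_rel = classify_relation(a.relation) == classify_relation(b.relation)
    head_match = a.head.word == b.head.word
    dep_match = a.dependent.word == b.dependent.word
    if head_match and dep_match:
        # Both terms agree; the relation class decides between the two classes.
        return ("exact" if same_rel else "terms_only", [a.head, a.dependent])
    if same_rel and (head_match or dep_match):
        return ("one_term", [a.head] if head_match else [a.dependent])
    return None


def directional_similarity(src: list[SynTuple], tgt: list[SynTuple]) -> float:
    """Score how well the tuples of `src` are matched by the tuples of `tgt`."""
    total = sum(pos_weight(t.head) + pos_weight(t.dependent) for t in src)
    if total == 0:
        return 0.0
    # POS-weighted count of matched terms for each class of agreement.
    per_class = dict.fromkeys(CLASS_WEIGHTS, 0.0)
    for s in src:
        matches = [m for t in tgt if (m := agreement(s, t)) is not None]
        if matches:
            # Keep only the most strongly weighted agreement for this tuple.
            cls, terms = max(matches, key=lambda m: CLASS_WEIGHTS[m[0]])
            per_class[cls] += sum(pos_weight(term) for term in terms)
    # Normalise each class value, then combine them as a weighted sum.
    return sum(CLASS_WEIGHTS[c] * per_class[c] / total for c in per_class)


def similarity(a: list[SynTuple], b: list[SynTuple]) -> float:
    """Symmetric similarity: the mean of the two directional scores."""
    return (directional_similarity(a, b) + directional_similarity(b, a)) / 2


# Usage: two short texts that share one tuple exactly and one modifier term.
a = [SynTuple(Term("sat", "VERB"), "nsubj", Term("cat", "NOUN")),
     SynTuple(Term("mat", "NOUN"), "amod", Term("red", "ADJ"))]
b = [SynTuple(Term("sat", "VERB"), "nsubj", Term("cat", "NOUN")),
     SynTuple(Term("rug", "NOUN"), "amod", Term("red", "ADJ"))]
print(round(similarity(a, b), 3))  # 0.591
```

Only the best-weighted agreement is counted for each tuple, so a single tuple cannot contribute to several classes at once. Because each per-class value is normalised by the total part-of-speech weight of the source text and every class weight is at most 1, the combined score stays within [0, 1].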