Patent attributes
A computer-based method for character string matching of a candidate character string with a plurality of character string records stored in a database is provided. The method includes identifying a set of reference character strings in the database, wherein the reference character strings are identified utilizing an optimization search for a set of dissimilar character strings, and generating an n-gram representation for one of the reference character strings in the set of reference character strings. The method also includes generating an n-gram representation for the candidate character string, determining a similarity between the n-gram representations, and indexing the candidate character string within the database based on the determined similarities between the n-gram representation of the candidate character string and the reference character strings in the identified set.
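The steps recited above may be illustrated by the following minimal sketch. It is not the claimed implementation: the abstract does not name a similarity measure or a particular optimization search, so cosine similarity over character bigram counts and a greedy farthest-point heuristic are assumed here as stand-ins. All function names (`ngrams`, `cosine`, `pick_references`, `index_key`) and the sample records are hypothetical.

```python
from collections import Counter
from math import sqrt


def ngrams(s, n=2):
    """Return character n-gram counts for s, padded with spaces at both ends."""
    padded = " " * (n - 1) + s.lower() + " " * (n - 1)
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors (assumed measure)."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def pick_references(records, k, n=2):
    """Greedily pick up to k mutually dissimilar records.

    Assumption: this farthest-point heuristic stands in for the
    "optimization search" named in the abstract.
    """
    records = list(dict.fromkeys(records))  # drop duplicates, keep order
    grams = {r: ngrams(r, n) for r in records}
    refs = [records[0]]
    while len(refs) < min(k, len(records)):
        # Choose the record whose closest chosen reference is least similar.
        best = min(
            (r for r in records if r not in refs),
            key=lambda r: max(cosine(grams[r], grams[x]) for x in refs),
        )
        refs.append(best)
    return refs


def index_key(candidate, refs, n=2):
    """Index a candidate by its similarity to each reference string."""
    c = ngrams(candidate, n)
    return tuple(round(cosine(c, ngrams(r, n)), 3) for r in refs)


records = ["john smith", "jon smith", "acme corporation", "zebra crossing"]
refs = pick_references(records, k=2)
key = index_key("jhon smith", refs)
```

In this sketch the tuple returned by `index_key` serves as the index entry: records whose tuples lie close together are likely near matches, so a lookup would only need to compare the candidate against records with similar tuples rather than against every record in the database.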