Patent attributes
The disclosure provides dataset record search and/or deduplication techniques that improve speed and efficiency over traditional methods. Certain implementations apply field-level deletion neighborhood processing to ordered field permutations of dataset records encoded with hash values. A method includes determining a field-level deletion neighborhood for two or more field combinations of a record by determining field hash values, creating field permutations, and determining a combined record hash value for each permutation, and then associating each combined record hash value with a unique entity identifier. The method includes searching other entity representation records for matching combined record hash values, and assigning one or more of a unique entity identifier and a duplicate entity identifier to the other entity representation records having the matching combined record hash values. Certain implementations can include removing, from the database, at least one of the other entity representation records having a duplicate record identifier.
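The steps described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the function names (`field_hash`, `deletion_neighborhood`, `deduplicate`), the choice of SHA-256, the case-insensitive normalization, and the `max_deletions` parameter are all assumptions made for illustration. The sketch also reads "ordered field permutations" as order-preserving subsets of the field hashes with up to `max_deletions` fields removed, which is one plausible interpretation.

```python
import hashlib
from itertools import combinations


def field_hash(value: str) -> str:
    """Hash one field value (assumed normalization: trim and lowercase)."""
    normalized = value.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


def deletion_neighborhood(record, max_deletions=1):
    """Return combined record hashes for every ordered field subset
    obtained by deleting up to `max_deletions` whole fields."""
    # Prefix each field hash with its position so field order is preserved.
    tagged = [f"{i}:{field_hash(v)}" for i, v in enumerate(record)]
    n = len(tagged)
    keys = set()
    for keep in range(max(1, n - max_deletions), n + 1):
        # combinations() yields subsets in the original field order.
        for combo in combinations(tagged, keep):
            joined = "|".join(combo).encode("utf-8")
            keys.add(hashlib.sha256(joined).hexdigest())
    return keys


def deduplicate(records, max_deletions=1):
    """Assign an entity identifier and a duplicate flag to each record."""
    index = {}        # combined record hash -> entity identifier
    assignments = []  # (entity_id, is_duplicate) per input record
    next_id = 0
    for record in records:
        keys = deletion_neighborhood(record, max_deletions)
        # Look for an earlier record that shares a combined hash.
        match = next((index[k] for k in sorted(keys) if k in index), None)
        if match is None:
            entity_id, is_duplicate = next_id, False
            next_id += 1
        else:
            entity_id, is_duplicate = match, True
        for k in keys:
            index.setdefault(k, entity_id)
        assignments.append((entity_id, is_duplicate))
    return assignments


records = [
    ("Ann", "Lee", "1990-01-01"),
    ("ann", "lee", "1990-01-02"),  # differs from the first in one field
    ("Bob", "Ray", "1985-05-05"),
]
result = deduplicate(records)
```

In this sketch the second record shares the hash of the (first name, last name) subset with the first record, so it receives the same entity identifier and is flagged as a duplicate. Records flagged this way are the candidates for removal from the database. Matching is done by exact lookup of precomputed hashes rather than by pairwise comparison of records.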