Patent attributes
Some embodiments of the present invention include a method for determining a dense subset from a group of records using a graphical representation of the group of records, the graphical representation having nodes and edges, a node associated with a record from the group of records, an edge connecting two nodes associated with two related records, wherein a node is associated with a weight corresponding to a number of edges connected to the node, wherein a record is added to the dense subset based on its associated node having a highest weight and a density that satisfies a density threshold, the density being based on the content of the dense subset, and wherein the content of the dense subset is to be processed as including duplicate records.
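By way of illustration only, the selection described above can be sketched as a greedy procedure in Python. This is not the claimed method. The function name `dense_subset`, the edge-list input, the tie-breaking rule, and in particular the density formula (edges inside the subset divided by the number of possible node pairs) are assumptions made for this sketch, since the text above states only that the density is based on the content of the dense subset.

```python
from itertools import combinations


def dense_subset(edges, density_threshold):
    """Greedy sketch: try nodes in descending weight order, keep those
    whose addition leaves the subset density at or above the threshold."""
    # Build an adjacency map. A node's weight is the number of edges
    # connected to it (its degree), as in the description above.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    def density(nodes):
        # ASSUMPTION: density = edges inside the subset / possible pairs.
        # The source does not define the density formula.
        if len(nodes) < 2:
            return 1.0
        inside = sum(1 for u, v in combinations(nodes, 2) if v in neighbors[u])
        return inside / (len(nodes) * (len(nodes) - 1) / 2)

    # Highest weight first. ASSUMPTION: ties are broken by node name,
    # so node names must be mutually comparable (e.g. all strings).
    order = sorted(neighbors, key=lambda n: (-len(neighbors[n]), n))

    subset = []
    for node in order:
        # Add the record only if the resulting subset stays dense enough.
        if density(subset + [node]) >= density_threshold:
            subset.append(node)
    return subset


# Hypothetical records r1..r4; an edge means two records are related.
example_edges = [("r1", "r2"), ("r1", "r3"), ("r2", "r3"), ("r3", "r4")]
candidates = dense_subset(example_edges, density_threshold=1.0)
```

In this sketch the returned subset is the set of records that would then be processed as including duplicate records. With the hypothetical edges shown, `r4` is left out at a threshold of 1.0 because it is related only to `r3`; a lower threshold would admit it.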