Disclosed are techniques for trimming large clusters of related records. In one embodiment, a method is disclosed comprising receiving a set of clusters, each cluster in the set including a plurality of records. The method extracts an oversized cluster from the set of clusters and performs a breadth-first search (BFS) on the oversized cluster to generate a list of visited records. The method terminates the BFS upon determining that the size of the list of visited records exceeds a maximum size, generates a new cluster from the list of visited records, and adds the new cluster to the set of clusters. By recursively performing BFS traversals over the oversized cluster and extracting smaller new clusters from it, the method eventually partitions the oversized cluster into a set of sub-clusters, each smaller than the predefined threshold.
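The abstract leaves the data model and the exact stopping boundary open. The following is a minimal sketch in Python. It assumes that records are integer IDs linked by an undirected adjacency list. The names `Graph`, `bfs_until`, `trim_cluster`, and `trim_clusters` are hypothetical. Each BFS stops once it has visited `max_size` records, so every extracted sub-cluster stays within the limit. That boundary, and seeding each BFS from the smallest remaining ID, are illustrative assumptions rather than details taken from the disclosure. The repeated extraction is written as a loop rather than literal recursion.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

# Hypothetical data model: records are integer IDs and `graph` maps each
# record to the records it is related to (undirected adjacency list).
Graph = Dict[int, List[int]]


def bfs_until(graph: Graph, start: int, remaining: Set[int], max_size: int) -> List[int]:
    """BFS from `start`, restricted to `remaining`, that stops as soon as
    `max_size` records have been visited."""
    visited = [start]
    seen = {start}
    queue = deque([start])
    while queue and len(visited) < max_size:
        node = queue.popleft()
        for nbr in graph.get(node, []):
            if nbr in remaining and nbr not in seen:
                seen.add(nbr)
                visited.append(nbr)
                queue.append(nbr)
                if len(visited) >= max_size:
                    return visited  # terminate the BFS early
    return visited


def trim_cluster(graph: Graph, cluster: Iterable[int], max_size: int) -> List[List[int]]:
    """Repeatedly carve BFS-bounded sub-clusters out of one oversized cluster."""
    remaining = set(cluster)
    pieces: List[List[int]] = []
    while remaining:
        start = min(remaining)  # assumption: deterministic seed = smallest ID
        visited = bfs_until(graph, start, remaining, max_size)
        pieces.append(visited)  # the visited list becomes a new cluster
        remaining.difference_update(visited)
    return pieces


def trim_clusters(graph: Graph, clusters: List[List[int]], max_size: int) -> List[List[int]]:
    """Replace every cluster larger than `max_size` with its sub-clusters."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    result: List[List[int]] = []
    for cluster in clusters:
        if len(cluster) > max_size:
            result.extend(trim_cluster(graph, cluster, max_size))
        else:
            result.append(list(cluster))
    return result


if __name__ == "__main__":
    # A 7-record chain plus a separate 2-record cluster, trimmed to size 3.
    path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 6],
            6: [5, 7], 7: [6], 8: [9], 9: [8]}
    print(trim_clusters(path, [[1, 2, 3, 4, 5, 6, 7], [8, 9]], 3))
    # [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
```

Each BFS is restricted to records that have not yet been extracted. As a result, a leftover fragment that has become disconnected is split into its connected pieces rather than kept as one cluster. This is one possible design choice among several.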