Patent 10482079 was granted by the United States Patent and Trademark Office and assigned to CoreLogic in November 2019.
A system, method, and computer program include a communications interface configured to receive a set of industry reports from multiple industry sources, and circuitry to compare one or more attributes of at least two trade lines to identify whether the at least two trade lines are duplicates. The circuitry characterizes, as a binary indication, whether the comparison indicates that the one or more attributes match, displays a representation of the binary indication, and receives a user-identified indication of whether the at least two trade lines are duplicates. The circuitry trains a classifier, records the indication of whether the at least two trade lines are duplicates, removes at least one of the at least two trade lines from the set of industry reports, and runs the classifier. Subsequently, a supervised machine learning classifier is trained (fit) on the training data and evaluated for accuracy on the testing data.
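The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The attribute names, the sample trade lines, and the labelled pairs are all hypothetical, and a simple conjunction learner (it keeps the attributes that matched in every pair labelled as duplicate) stands in for the supervised classifier, which the abstract does not specify.

```python
# Hypothetical attribute names; the abstract does not list specific attributes.
ATTRIBUTES = ("creditor", "account_number", "balance", "date_opened")


def match_vector(a, b):
    """Return one binary indication per attribute: 1 if the values match, else 0."""
    return [int(a[name] == b[name]) for name in ATTRIBUTES]


def fit(vectors, labels):
    """Stand-in learner: keep the attributes that matched in every duplicate pair."""
    positives = [v for v, y in zip(vectors, labels) if y == 1]
    if not positives:
        raise ValueError("need at least one pair labelled as duplicate")
    return [i for i in range(len(ATTRIBUTES)) if all(v[i] for v in positives)]


def predict(required, vector):
    """Predict duplicate (1) only when every required attribute matches."""
    return int(all(vector[i] for i in required))


def accuracy(required, vectors, labels):
    """Fraction of labelled pairs that the model classifies correctly."""
    hits = sum(predict(required, v) == y for v, y in zip(vectors, labels))
    return hits / len(labels)


def deduplicate(trade_lines, required):
    """Keep a trade line only if it is not predicted to duplicate one already kept."""
    kept = []
    for line in trade_lines:
        if not any(predict(required, match_vector(line, k)) for k in kept):
            kept.append(line)
    return kept


# Made-up user-identified labels for attribute match vectors:
# 1 = the pair was marked as duplicates, 0 = marked as distinct.
train_x = [[1, 1, 1, 1], [1, 0, 1, 1], [0, 1, 1, 1], [1, 1, 0, 0],
           [0, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 1, 0]]
train_y = [1, 0, 0, 1, 0, 0, 0, 1]
test_x = [[1, 1, 0, 1], [0, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
test_y = [1, 0, 0, 0]

model = fit(train_x, train_y)                    # train on the training data
test_accuracy = accuracy(model, test_x, test_y)  # evaluate on the testing data

# Made-up trade lines, as if merged from several industry reports.
reports = [
    {"creditor": "Acme Bank", "account_number": "1234", "balance": 500, "date_opened": "2015-03"},
    {"creditor": "Acme Bank", "account_number": "1234", "balance": 480, "date_opened": "2015-03"},
    {"creditor": "Acme Bank", "account_number": "9876", "balance": 500, "date_opened": "2015-03"},
]
unique_lines = deduplicate(reports, model)  # the second entry is removed as a duplicate
```

On this toy data the learner keeps only the creditor and account number as required matches, so the second trade line is removed even though its balance differs. A real system would more likely use a statistical classifier that tolerates noisy labels. The conjunction learner is used here only because it is short and easy to trace by hand.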