US Patent 10558627 Method and system for cleansing and de-duplicating data

Patent 10558627 was granted and assigned to Leantaas on February 11, 2020 by the United States Patent and Trademark Office.

Is a
Patent

Patent attributes

Patent Applicant
Leantaas
Current Assignee
Leantaas
Patent Jurisdiction
United States Patent and Trademark Office
Patent Number
10558627
Patent Inventor Names
Hugh Cassidy
Jayant Lakshmikanthan
Sofia DeMarco
Date of Patent
February 11, 2020
Patent Application Number
15488388
Date Filed
April 14, 2017
Patent Citations Received
US Patent 11409772 Active learning for data matching
US Patent 12013840 Dynamic discovery and correction of data quality issues
US Patent 11972228 Merging database tables by classifying comparison signatures
US Patent 11663275 Method for dynamic data blocking in a database system
Patent Primary Examiner
James E Richardson
Patent abstract

A method and system for cleansing and de-duplicating data in a database are provided. The method includes filtering garbage records from a plurality of records based on data fields, and applying cleansing rules to create a cleansed database. Similarity vectors are generated, where each vector corresponds to a pairwise comparison of distinct data entries in the cleansed database. Matching rules are applied to label each vector as matched, unmatched, or unclassified. The method analyzes the vectors labeled as matched and unmatched to train a machine learning model to identify duplicates in the cleansed database. Unclassified vectors are then labeled as matched or unmatched by applying the machine learning model to them. Thereafter, the method processes all the vectors labeled as matched to create clusters of records that are duplicates of each other. Finally, the records in each cluster are merged using predefined consolidated rules to obtain a de-duplicated cleansed database.
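
The steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not specify the fields, rules, thresholds, or model, so everything concrete here is an assumption made for illustration. The record schema (`name`, `dob`), the garbage filter (a missing name), the thresholds in `rule_label`, the nearest-centroid classifier standing in for the machine learning model, and the "keep the longest value" consolidation rule are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations

FIELDS = ("name", "dob")  # hypothetical schema, not taken from the patent

def cleanse(records):
    """Drop garbage records (assumed: no name) and normalise case/whitespace."""
    cleaned = []
    for rec in records:
        if not (rec.get("name") or "").strip():
            continue  # garbage filter
        cleaned.append({f: " ".join((rec.get(f) or "").lower().split()) for f in FIELDS})
    return cleaned

def similarity_vector(a, b):
    """Pairwise comparison: fuzzy score for name, exact match for dob."""
    name_sim = SequenceMatcher(None, a["name"], b["name"]).ratio()
    dob_sim = 1.0 if a["dob"] == b["dob"] else 0.0
    return (name_sim, dob_sim)

def rule_label(vec):
    """Matching rules with illustrative thresholds (assumptions)."""
    if min(vec) >= 0.95:
        return "matched"
    if max(vec) <= 0.5:
        return "unmatched"
    return "unclassified"

def centroid(vectors):
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))

def sq_dist(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))

def deduplicate(records):
    cleaned = cleanse(records)
    pairs = list(combinations(range(len(cleaned)), 2))
    vectors = {p: similarity_vector(cleaned[p[0]], cleaned[p[1]]) for p in pairs}
    labels = {p: rule_label(v) for p, v in vectors.items()}

    # "Train" on rule-labelled vectors: a nearest-centroid classifier is a
    # simple stand-in for whatever model the patent actually uses.
    matched = [vectors[p] for p in pairs if labels[p] == "matched"]
    unmatched = [vectors[p] for p in pairs if labels[p] == "unmatched"]
    if matched and unmatched:
        c_m, c_u = centroid(matched), centroid(unmatched)
        for p in pairs:
            if labels[p] == "unclassified":
                closer_to_match = sq_dist(vectors[p], c_m) < sq_dist(vectors[p], c_u)
                labels[p] = "matched" if closer_to_match else "unmatched"

    # Cluster records linked by matched pairs using union-find.
    parent = list(range(len(cleaned)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for (i, j), label in labels.items():
        if label == "matched":
            parent[find(i)] = find(j)

    clusters = {}
    for i, rec in enumerate(cleaned):
        clusters.setdefault(find(i), []).append(rec)

    # Consolidation rule (assumed): keep the longest value seen for each field.
    return [{f: max((r[f] for r in group), key=len) for f in FIELDS}
            for group in clusters.values()]
```

Calling `deduplicate` on a list of dictionaries returns one merged record per cluster. Pairs that stay unclassified because no training examples exist for one of the classes are simply not merged in this sketch.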
