US Patent 11450126 Systems and methods for automatically extracting canonical data from electronic documents

Described herein is a computer-implemented method for automatic extraction of canonical data from an electronic document. The method comprises classifying a first text rectangle in an electronic document as a label and a second text rectangle as a value using a first machine learning algorithm. A first probability score of a likelihood of the first text rectangle corresponding to a first canonical category is determined using a second machine learning algorithm. A second probability score of a likelihood of the second text rectangle corresponding to the first canonical category is determined using a third machine learning algorithm. A spatial position of the second text rectangle relative to the first text rectangle is calculated. Based on the relative spatial position, the first probability score, and the second probability score, the first text rectangle and the second text rectangle are classified into the first canonical category.
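
The final step of the abstract, combining the two probability scores with the relative spatial position, can be illustrated with a minimal sketch. The abstract does not say how these three signals are combined, so everything below is an assumption made for illustration: the `TextRect` type, the distance-decay heuristic in `spatial_score`, the product rule in `classify_pair`, and the `scale` and `threshold` parameters are all hypothetical. The probability scores are passed in as plain numbers, standing in for the outputs of the second and third machine learning algorithms.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TextRect:
    """A text rectangle: its text plus a bounding box (hypothetical type)."""
    text: str
    x: float  # left edge
    y: float  # top edge (y grows downward, as in page coordinates)
    w: float  # width
    h: float  # height

    @property
    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)


def relative_position(label, value):
    """Offset (dx, dy) of the value's center from the label's center."""
    lx, ly = label.center
    vx, vy = value.center
    return (vx - lx, vy - ly)


def spatial_score(label, value, scale=200.0):
    """Assumed heuristic, not taken from the patent: score decays with
    distance, and a value lying both left of and above its label scores 0."""
    dx, dy = relative_position(label, value)
    if dx < 0 and dy < 0:
        return 0.0
    return math.exp(-math.hypot(dx, dy) / scale)


def classify_pair(label, value, p_label, p_value, threshold=0.4):
    """Combine both probability scores with the spatial score (assumed
    product rule) and decide whether the pair shares the category."""
    combined = p_label * p_value * spatial_score(label, value)
    return combined >= threshold, combined


# Example: a label with its value directly to the right on the same line.
label = TextRect("Invoice No.", x=100, y=100, w=80, h=20)
value = TextRect("INV-0042", x=200, y=100, w=80, h=20)
matched, score = classify_pair(label, value, p_label=0.9, p_value=0.95)
```

In this sketch a pair is assigned to the category only when both rectangles score well for it and the value sits plausibly close to its label. A learned model could replace the hand-written spatial heuristic without changing the overall shape of the step.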
