Is a
Patent attributes
Patent Jurisdiction
Patent Number
Date of Patent
October 25, 2016
Patent Application Number
13350951
Date Filed
January 16, 2012
Patent Citations Received
Patent Primary Examiner
Patent abstract
Technologies are described herein for classifying structured documents based on the structure of the document. A structured document is received, and the structural elements are parsed from the document to generate a text string representing the structure of the document instead of the semantic textual content of the document. The text string may be broken into N-grams utilizing a sliding window, and a classifier trained from similar structured documents labeled as belonging to one of a number of document classes is utilized to determine a probability that the document belongs to each of the document classes based on the N-grams.
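The pipeline in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes HTML input, assumes the structure string is a sequence of tag names, assumes N-grams are taken over tag tokens rather than characters, and swaps in a naive Bayes classifier with add-one smoothing because the abstract does not name a classifier. All function names (`structure_string`, `ngrams`, `train`, `classify`) are hypothetical.

```python
import math
from collections import Counter
from html.parser import HTMLParser


class _TagCollector(HTMLParser):
    """Records opening and closing tag names and ignores text content."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(tag)

    def handle_endtag(self, tag):
        self.tokens.append("/" + tag)


def structure_string(markup):
    """Return a space-separated string of tag tokens (assumed representation)."""
    collector = _TagCollector()
    collector.feed(markup)
    collector.close()
    return " ".join(collector.tokens)


def ngrams(text, n=3):
    """Slide a window of n tokens across the structure string."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def train(labeled_docs, n=3):
    """Count N-grams per class from (markup, label) pairs."""
    counts = {}
    doc_totals = Counter()
    for markup, label in labeled_docs:
        grams = ngrams(structure_string(markup), n)
        counts.setdefault(label, Counter()).update(grams)
        doc_totals[label] += 1
    return counts, doc_totals


def classify(markup, model, n=3):
    """Return a probability for each class (naive Bayes, an assumed choice)."""
    counts, doc_totals = model
    vocab = set().union(*counts.values())
    grams = ngrams(structure_string(markup), n)
    n_docs = sum(doc_totals.values())
    log_scores = {}
    for label, gram_counts in counts.items():
        total = sum(gram_counts.values())
        score = math.log(doc_totals[label] / n_docs)  # class prior
        for gram in grams:
            # Add-one smoothing, with one extra slot for unseen N-grams.
            score += math.log((gram_counts[gram] + 1) / (total + len(vocab) + 1))
        log_scores[label] = score
    # Normalize the log scores into probabilities that sum to 1.
    peak = max(log_scores.values())
    weights = {label: math.exp(s - peak) for label, s in log_scores.items()}
    norm = sum(weights.values())
    return {label: w / norm for label, w in weights.items()}


if __name__ == "__main__":
    training = [
        ("<html><body><table><tr><td>a</td><td>b</td></tr></table></body></html>", "table"),
        ("<html><body><ul><li>x</li><li>y</li></ul></body></html>", "list"),
    ]
    model = train(training)
    unseen = "<html><body><table><tr><td>zzz</td></tr></table></body></html>"
    print(classify(unseen, model))
```

Because only tag tokens reach the classifier, two documents with entirely different text but the same layout produce identical features, which is the property the abstract emphasizes.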