Patent attributes
An embodiment for extracting information from semi-structured text is provided. The embodiment may include identifying one or more high confidence alignments of one or more entities and identifiers in a set of documents. The embodiment may also include analyzing one or more blocks of semi-structured text containing the one or more entities and identifiers. The embodiment may further include identifying one or more known alignments in each of the one or more blocks of semi-structured text. The embodiment may also include generating a structure template. The embodiment may further include applying the structure template to each of the one or more blocks of semi-structured text. The embodiment may also include annotating the set of documents with metadata reflecting the structure template and a location of each of the one or more blocks of semi-structured text.
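As an illustration only, the steps above can be sketched in Python. This is not the claimed implementation. It assumes the high confidence alignments are already available as a dictionary mapping entity to identifier, that a block of semi-structured text is a contiguous range of lines, and that a structure template is simply the order of entity and identifier plus the separator found between them. The function names (`infer_template`, `apply_template`, `annotate_block`) and the metadata layout are hypothetical.

```python
from collections import Counter


def infer_template(block_lines, known):
    """Generate a structure template from lines containing a known alignment.

    Assumption: a template is the pair (order, separator), chosen by
    majority vote over every line that contains a known entity/identifier pair.
    """
    votes = Counter()
    for line in block_lines:
        for entity, ident in known.items():
            e, i = line.find(entity), line.find(ident)
            if e < 0 or i < 0:
                continue  # this known alignment does not occur on the line
            if e < i:
                order, sep = "entity_first", line[e + len(entity):i]
            else:
                order, sep = "id_first", line[i + len(ident):e]
            if sep:  # an empty separator cannot be used to split lines
                votes[(order, sep)] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def apply_template(block_lines, template):
    """Apply the template to every line of the block to extract pairs."""
    order, sep = template
    pairs = []
    for line in block_lines:
        left, found, right = line.partition(sep)
        left, right = left.strip(), right.strip()
        if not found or not left or not right:
            continue  # line does not match the template
        pairs.append((left, right) if order == "entity_first" else (right, left))
    return pairs


def annotate_block(doc_lines, start, end, known):
    """Return metadata recording the template and the block location.

    The block is the half-open line range [start, end) of the document.
    """
    block = doc_lines[start:end]
    template = infer_template(block, known)
    if template is None:
        return None
    return {
        "template": {"order": template[0], "separator": template[1]},
        "location": {"start_line": start, "end_line": end},
        "pairs": apply_template(block, template),
    }
```

With a hypothetical document whose lines 1 to 3 read `Widget: W-100`, `Gadget: G-200`, and `Sprocket: S-300`, and a single known alignment of `Widget` to `W-100`, the sketch infers an entity-first template with the separator `": "` and uses it to extract the two alignments that were not previously known.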