Patent attributes
Automatic locale determination for documents is described. In an embodiment, a computer system receives an electronic document comprising a plurality of unknown-language data elements, each associated with one or more types. Based on a document schema of the document, the computer system selects one or more unknown-language data elements from the plurality of unknown-language data elements and assigns to each selected element a corresponding weight value based on the respective type of that element. The computer system compares the selected unknown-language data elements with a plurality of known-language data elements that are associated with the document schema and, based on the comparison, determines the number of selected unknown-language data elements that match any element in a subset of the known-language data elements, wherein the subset corresponds to a particular language. Based on the number of matched data elements and on the weight assigned to each matched unknown-language data element, the computer system determines a language confidence value specifying a level of machine confidence that the document is expressed in the particular language. When the language confidence value for the particular language exceeds a language threshold value, the computer system automatically processes the document using the particular language.
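
The flow described above (schema-based selection, type-based weighting, matching against known-language elements, and a threshold test) can be illustrated with a minimal sketch. It is not the claimed implementation. The abstract does not state how the confidence value is computed, so the sketch assumes one possible formula: the summed weight of matched elements divided by the summed weight of all selected elements. The names `DataElement`, `TYPE_WEIGHTS`, `KNOWN_ELEMENTS`, `select_elements`, `language_confidence`, and `detect_language`, along with the weight values, sample vocabulary, and default threshold, are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    text: str
    kind: str  # the element's type, e.g. "header", "label", "body"


# Hypothetical weight per element type (values chosen for illustration only).
TYPE_WEIGHTS = {"header": 3.0, "label": 2.0, "body": 1.0}

# Hypothetical known-language elements for one document schema, keyed by language.
KNOWN_ELEMENTS = {
    "en": {"invoice", "total", "date"},
    "de": {"rechnung", "gesamt", "datum"},
}


def select_elements(elements, schema_kinds):
    """Keep only the elements whose type is named by the document schema."""
    return [e for e in elements if e.kind in schema_kinds]


def language_confidence(elements, known):
    """Assumed formula: matched weight divided by total selected weight."""
    total = sum(TYPE_WEIGHTS.get(e.kind, 1.0) for e in elements)
    if total == 0:
        return 0.0
    matched = sum(
        TYPE_WEIGHTS.get(e.kind, 1.0)
        for e in elements
        if e.text.strip().lower() in known
    )
    return matched / total


def detect_language(elements, schema_kinds, threshold=0.6):
    """Return (language, confidence); language is None if the threshold is not exceeded."""
    selected = select_elements(elements, schema_kinds)
    best_lang, best_conf = None, 0.0
    for lang, known in KNOWN_ELEMENTS.items():
        conf = language_confidence(selected, known)
        if conf > best_conf:
            best_lang, best_conf = lang, conf
    if best_conf > threshold:
        return best_lang, best_conf
    return None, best_conf


if __name__ == "__main__":
    doc = [
        DataElement("Rechnung", "header"),
        DataElement("Datum", "label"),
        DataElement("Lorem ipsum", "body"),
        DataElement("Gesamt", "label"),
        DataElement("ignored", "footer"),  # not in the schema, so never selected
    ]
    print(detect_language(doc, {"header", "label", "body"}))
```

In this sketch a matched header contributes more to the confidence value than a matched body element, which reflects the idea that some element types are stronger indicators of language than others. A document whose best confidence does not exceed the threshold yields no language, leaving it for some other handling.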