Patent attributes
The present disclosure relates to systems and methods for generating synthetic documents. In one implementation, a system for generating synthetic data from a plurality of documents may include at least one processor and at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the system to: receive a plurality of documents, individual documents of the plurality of documents having a same document type; generate a distribution of values for each corresponding pixel in the individual documents of the plurality of documents; determine, based on the distributions, one or more common features of the plurality of documents; determine, based on a comparison of the distributions, one or more input fields; generate a template including the one or more common features and the one or more input fields; and input synthetic data into the one or more input fields of the template, thereby generating a plurality of synthetic documents.
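The steps recited above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: it assumes each document is a small grayscale grid of equal size, treats a pixel as a common feature when a single value accounts for at least a chosen fraction of the documents, and treats every other pixel as part of an input field. The function names (`pixel_distributions`, `build_template`, `fill_template`), the `agreement` threshold, and the random black-or-white fill are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def pixel_distributions(docs):
    """For each pixel position, count the values seen across all documents."""
    rows, cols = len(docs[0]), len(docs[0][0])
    return [
        [Counter(doc[r][c] for doc in docs) for c in range(cols)]
        for r in range(rows)
    ]


def build_template(docs, agreement=0.9):
    """Keep the modal value where documents agree; mark the rest as input fields.

    `agreement` is an assumed threshold: the fraction of documents that must
    share a pixel value for that pixel to count as a common feature.
    Input-field pixels are represented by None.
    """
    n = len(docs)
    template = []
    for row in pixel_distributions(docs):
        out = []
        for dist in row:
            value, count = dist.most_common(1)[0]
            out.append(value if count / n >= agreement else None)
        template.append(out)
    return template


def fill_template(template, rng):
    """Fill input-field pixels with placeholder synthetic values (0 or 255)."""
    return [
        [rng.choice((0, 255)) if px is None else px for px in row]
        for row in template
    ]


# Three toy "documents" of the same type: the top row and the left and right
# borders are identical, while the two middle pixels of the second row vary.
docs = [
    [[0, 0, 0, 0], [255, 10, 200, 255]],
    [[0, 0, 0, 0], [255, 90, 30, 255]],
    [[0, 0, 0, 0], [255, 170, 120, 255]],
]

template = build_template(docs)
synthetic = [fill_template(template, random.Random(seed)) for seed in range(5)]
```

In this toy case the template preserves the shared pixels and leaves the two varying pixels open, so each synthetic document differs only inside the inferred input field. A realistic system would presumably group input-field pixels into regions and render synthetic text into them rather than filling pixels independently.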