Patent attributes
A system and method for statistical fingerprinting of structured datasets begins by dividing the structured dataset into data subsets. These subsets are created based on the structure of the data; for example, data delineated by columns and rows may be broken into subsets by designating each column as a subset. A fingerprint is derived from each subset, and the fingerprints of the subsets are then combined to create an overall fingerprint for the dataset. By applying this process to a “wild file” of unknown provenance and comparing the result to the fingerprints of a data owner's files, it may be determined whether data in the wild file was wrongfully acquired from the data owner.
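The process described above can be illustrated with a minimal sketch. The abstract does not specify which statistic forms a subset's fingerprint or how two fingerprints are compared, so the choices here are assumptions made only for illustration: each column's fingerprint is taken to be the relative frequency of its distinct values, and two fingerprints are compared with one minus the total variation distance, averaged over shared columns. All function names and sample records are hypothetical.

```python
from collections import Counter


def column_fingerprint(values):
    """Assumed statistic: relative frequency of each distinct value in a column."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def dataset_fingerprint(rows, columns):
    """Combine the per-column fingerprints into one mapping keyed by column name."""
    return {
        name: column_fingerprint([row[i] for row in rows])
        for i, name in enumerate(columns)
    }


def column_similarity(fp_a, fp_b):
    """Assumed comparison: 1 minus total variation distance (1.0 = identical)."""
    keys = set(fp_a) | set(fp_b)
    return 1.0 - 0.5 * sum(abs(fp_a.get(k, 0.0) - fp_b.get(k, 0.0)) for k in keys)


def dataset_similarity(fp_a, fp_b):
    """Average the column similarities over columns present in both fingerprints."""
    shared = set(fp_a) & set(fp_b)
    if not shared:
        return 0.0
    return sum(column_similarity(fp_a[c], fp_b[c]) for c in shared) / len(shared)


# Hypothetical sample data: the wild file holds the owner's rows in a new order.
columns = ["state", "gender"]
owner_rows = [("AR", "F"), ("AR", "M"), ("TX", "F"), ("CA", "M")]
wild_rows = [("AR", "M"), ("TX", "F"), ("AR", "F"), ("CA", "M")]
unrelated_rows = [("NY", "F"), ("NY", "F")]

owner_fp = dataset_fingerprint(owner_rows, columns)
wild_score = dataset_similarity(owner_fp, dataset_fingerprint(wild_rows, columns))
unrelated_score = dataset_similarity(owner_fp, dataset_fingerprint(unrelated_rows, columns))
```

In this sketch the reordered wild file scores higher against the owner's fingerprint than the unrelated file does, because the frequency-based fingerprint ignores row order. A practical system would need a statistic that tolerates added, removed, or altered records, which this example does not attempt.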