A mechanism is provided in a first client for approximate linkage of datasets over quasi-identifiers. The mechanism receives a generalization logic data structure representing sets of values for each quasi-identifier in a first dataset of the first client. For each record in the first dataset, the mechanism generates at least one generalization of the value of a given quasi-identifier, based on a selected generalization logic data structure corresponding to that quasi-identifier, and generates one generalized record per generalization; together, these generalized records form a first generalized dataset. The mechanism sends the first generalized dataset to a semi-trusted third party for approximate linkage of the first dataset with a second dataset of a second client. It then receives an approximate join result from the semi-trusted third party, performs post-processing on the approximate join result, and determines a final linkage result based on the post-processing.
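The flow described above (client-side generalization, a join by the third party on generalized values only, then client-side post-processing) can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation. The abstract does not specify the form of the generalization logic data structure or the post-processing, so the following are hypothetical: the logic is modeled as a list of named, possibly overlapping value sets per quasi-identifier; one generalized record is emitted per combination of matching sets; post-processing collapses duplicate candidate pairs into a support count; and the final linkage keeps pairs whose support meets an assumed `min_support` threshold. All function names are illustrative.

```python
from collections import Counter, defaultdict
from itertools import product


def generalize_dataset(records, logic):
    """Client side (hypothetical): emit one generalized record per generalization.

    `logic` maps each quasi-identifier to a list of (label, value_set) pairs.
    Value sets may overlap, so a single value can have several generalizations.
    """
    generalized = []
    for rec_id, values in records.items():
        per_qi = []
        for qi, groups in logic.items():
            labels = [label for label, members in groups if values[qi] in members]
            per_qi.append(labels or ["*"])  # assumed fallback: full suppression
        # One generalized record for each combination of per-QI generalizations.
        for combo in product(*per_qi):
            generalized.append((rec_id, combo))
    return generalized


def approximate_join(gen_a, gen_b):
    """Semi-trusted third party: equi-join on generalized values only."""
    index = defaultdict(list)
    for rec_id, key in gen_b:
        index[key].append(rec_id)
    return [(a_id, b_id) for a_id, key in gen_a for b_id in index.get(key, [])]


def post_process(join_result):
    """Client side (assumed): count how many generalized keys each pair shared."""
    return Counter(join_result)


def final_linkage(support, min_support=1):
    """Client side (assumed): keep candidate pairs with enough support."""
    return {pair for pair, n in support.items() if n >= min_support}


# Illustrative data: overlapping age bands and ZIP prefixes (hypothetical).
logic = {
    "age": [("20-29", set(range(20, 30))),
            ("25-34", set(range(25, 35))),
            ("30-39", set(range(30, 40)))],
    "zip": [("021**", {"02138", "02139"}),
            ("100**", {"10001", "10002"})],
}
dataset_a = {"a1": {"age": 27, "zip": "02138"}, "a2": {"age": 36, "zip": "10001"}}
dataset_b = {"b1": {"age": 26, "zip": "02139"}, "b2": {"age": 31, "zip": "10002"}}

joined = approximate_join(generalize_dataset(dataset_a, logic),
                          generalize_dataset(dataset_b, logic))
support = post_process(joined)
links = final_linkage(support)
strict_links = final_linkage(support, min_support=2)
print(sorted(links), sorted(strict_links))
```

Overlapping value sets are what make the join approximate: records `a1` (age 27) and `b1` (age 26) share two generalized keys, while `a2` and `b2` share only one because age 31 and age 36 meet only in the "30-39" band. The third party sees only the set labels, never the raw values; raising `min_support` trades recall for precision during post-processing.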