Patent attributes
A system and method for an entropy-based near-match analysis identifies target files that are almost, but not identical, to a reference file. A computing processor computes entropies of the reference and target files, and determines the likeness of the target files to the references file based on the computed entropies. The computing processor determines a near match between the target file and the reference file if the likeness of the two files is within a user-defined tolerance level. According to one embodiment of the invention, the information entropy is a weighted value that takes into account the size of the file.