Systems and methods include obtaining a file to be checked for Data Loss Prevention (DLP); determining a cryptographic hash of the file and comparing the cryptographic hash to corresponding cryptographic hashes of indexed files; responsive to a match between the cryptographic hash and one of the corresponding cryptographic hashes, determining a DLP match and performing an action based thereon; responsive to no match, extracting text from the file and creating an ordered sequence of hashes of variable length chunks of the extracted text; and determining the DLP match with one of the indexed files based on comparing the ordered sequence of hashes with corresponding ordered sequence of hashes of the indexed files.