A system and method for training a human perception predictor to determine a level of perceived similarity between data samples, the method including: receiving at least one media file; determining at least one identification region for each media file; applying at least one transformation to each identification region of each media file until at least one modified media file is created; receiving input regarding similarity between each modified media file and the corresponding received media file; and training a machine learning model with an objective function configured to predict the similarity between media files as perceived by a human observer, in accordance with the received input.
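The steps recited above can be illustrated with a minimal sketch, under several assumptions that are not taken from the source. A media file is modeled as a flat list of intensities in [0, 1], the identification region is a fixed index range chosen by a hypothetical `pick_region` helper, the transformation is a uniform intensity shift, the human input is simulated by a hard-coded threshold, and the model is a one-feature logistic regression trained with a log-loss objective. All function names and numeric settings are illustrative only.

```python
import math
import random

def pick_region(media):
    """Hypothetical placeholder: use the middle half as the identification region."""
    n = len(media)
    return n // 4, 3 * n // 4

def apply_transformation(media, region, strength):
    """Return a copy of `media` with the region shifted by `strength`, clamped to [0, 1]."""
    start, end = region
    modified = list(media)
    for i in range(start, end):
        modified[i] = min(1.0, max(0.0, modified[i] + strength))
    return modified

def distortion(original, modified):
    """Mean absolute difference; an assumed stand-in feature."""
    return sum(abs(a - b) for a, b in zip(original, modified)) / len(original)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_predictor(pairs, ratings, epochs=3000, lr=2.0):
    """Fit p(similar) = sigmoid(w * distortion + b) by gradient descent on log loss."""
    feats = [distortion(o, m) for o, m in pairs]
    n = len(feats)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(feats, ratings):
            err = sigmoid(w * x + b) - y
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

def predict_similarity(model, original, modified):
    """Predicted probability that an observer rates the pair as similar."""
    w, b = model
    return sigmoid(w * distortion(original, modified) + b)

# Step 1: receive media files (synthetic here; values kept low to avoid clamping).
rng = random.Random(0)
media_files = [[rng.uniform(0.2, 0.5) for _ in range(16)] for _ in range(3)]

# Steps 2-4: pick a region, create modified files, and collect ratings.
# The ratings are SIMULATED: 1 ("similar") for shifts below 0.19, else 0.
strengths = [k / 50 for k in range(20)]
pairs, ratings = [], []
for media in media_files:
    region = pick_region(media)
    for s in strengths:
        pairs.append((media, apply_transformation(media, region, s)))
        ratings.append(1 if s < 0.19 else 0)

# Step 5: train the model on the collected ratings.
model = train_predictor(pairs, ratings)

probe = media_files[0]
probe_region = pick_region(probe)
p_subtle = predict_similarity(model, probe, apply_transformation(probe, probe_region, 0.02))
p_strong = predict_similarity(model, probe, apply_transformation(probe, probe_region, 0.38))
print(f"subtle change: {p_subtle:.2f}, strong change: {p_strong:.2f}")
```

In this sketch a subtle shift should score above 0.5 and a strong shift below 0.5, because the simulated ratings place the perceptual boundary between them. A practical system would presumably replace the simulated ratings with responses gathered from human observers and the single hand-picked feature with a learned representation.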