Patent attributes
A method, system, and computer program product for detecting malicious code insertion in data are provided in the illustrative embodiments. At an application executing using a processor and a memory in a data processing system, a script that has been inserted in a mix of code and content is detected. A content-related portion is removed from the script to form a remaining script structure, the content-related portion referring to the content in the mix. From the remaining script structure, a code construct is selected and replaced with an alphanumeric string to form a normalized construct. Whether the normalized construct matches, within a tolerance, a second normalized construct in a corpus of normalized scripts is determined. Responsive to the normalized construct matching the second normalized construct within the tolerance, a conclusion is drawn that the script is malicious.