Adversarial machine learning

Adversarial machine learning is a branch of machine learning research that aims to develop secure and robust models by deliberately attempting to deceive them with malicious or false inputs.

Neural networks execute tasks such as clustering, classification, association and prediction. An artificial neural network is a computational model developed through iterative exposure to large sets of training data, which adjusts the weights and biases of the model.

Adversarial training entails intentionally incorporating perturbed inputs (statistical noise crafted to deceive the model) into the training data, in order to identify vulnerabilities and improve model robustness and resilience. In the context of machine learning, robustness refers to reliable operation of a system across a range of conditions (including attacks), and resilience refers to adaptable operation and recovery from disruptions (including attacks).
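The idea can be illustrated with a minimal sketch. It assumes a two-feature logistic regression model and uses the fast gradient sign method (FGSM) to craft the perturbed inputs; both are illustrative choices, not something the text above prescribes. The function names, the toy data and the values of epsilon, epochs and lr are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Probability of class 1 under a linear logistic model."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, epsilon):
    """Shift each feature of x by epsilon in the loss-increasing direction."""
    p = predict_proba(w, b, x)
    # For cross-entropy loss, d(loss)/d(x_j) = (p - y) * w_j.
    return [xj + epsilon * sign((p - y) * wj) for xj, wj in zip(x, w)]

def train_adversarially(data, epsilon, epochs=200, lr=0.5):
    """Gradient descent on clean samples plus their FGSM counterparts."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        # Regenerate adversarial samples against the current weights.
        batch = data + [(fgsm(w, b, x, y, epsilon), y) for x, y in data]
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, y in batch:
            err = predict_proba(w, b, x) - y
            grad_w = [g + err * xj for g, xj in zip(grad_w, x)]
            grad_b += err
        w = [wj - lr * g / len(batch) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(batch)
    return w, b

def accuracy(w, b, data):
    hits = sum((predict_proba(w, b, x) >= 0.5) == bool(y) for x, y in data)
    return hits / len(data)

# Hypothetical toy data: two clusters around (1, 1) and (-1, -1).
offsets = [-0.3, 0.0, 0.3]
data = [([1 + dx, 1 + dy], 1) for dx in offsets for dy in offsets]
data += [([-1 + dx, -1 + dy], 0) for dx in offsets for dy in offsets]

epsilon = 0.3
w, b = train_adversarially(data, epsilon)
attacked = [(fgsm(w, b, x, y, epsilon), y) for x, y in data]
clean_acc = accuracy(w, b, data)       # accuracy on unmodified samples
robust_acc = accuracy(w, b, attacked)  # accuracy under the same attack
```

Here clean_acc loosely corresponds to ordinary accuracy and robust_acc to accuracy under attack. On this toy data the clusters are separated by more than epsilon, so both should reach 1.0; real models and datasets typically show a gap between the two.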

For developers and maintainers of machine learning models, the ultimate goal of incorporating adversarial methods is to train a model to accommodate and process inputs which may be malicious or otherwise differ from a narrow set of expected inputs. For malicious actors, the goal is to identify a vulnerability in the system which allows them to destroy, invalidate, or subvert a machine learning model.

Taxonomy of attacks, defenses, and consequences

In October 2019, the National Institute of Standards and Technology (NIST) released a draft taxonomy and terminology guide for adversarial machine learning.

Figure: Taxonomy of Attacks, Defenses, and Consequences in Adversarial Machine Learning

Adversarial examples

Adversarial examples are intentionally manipulated data which are fed into a neural network with the intent of deceiving it. An adversarial example is generated by introducing a small perturbation to a sample of known-good training data, such that the newly generated adversarial example reliably causes undesired behaviors or outputs (e.g., consistently misclassifying images) from a machine learning model.
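As a minimal sketch of how such a perturbation can be computed, the example below applies the fast gradient sign method (FGSM) to a small linear logistic classifier. The choice of FGSM, the model weights, the sample and the value of epsilon are all assumptions made for illustration; image models work on the same principle with far more features.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    """Probability that x belongs to class 1 under a linear logistic model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def sign(v):
    return (v > 0) - (v < 0)

def fgsm_perturb(weights, bias, x, label, epsilon):
    """Return x shifted by epsilon per feature in the loss-increasing direction.

    For logistic regression with cross-entropy loss, the gradient of the
    loss with respect to the input is (p - label) * weights.
    """
    p = predict_proba(weights, bias, x)
    grad = [(p - label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

# Hypothetical model and known-good sample (true label 1).
weights = [2.0, -3.0, 1.0]
bias = 0.0
x_clean = [0.3, 0.1, 0.2]

x_adv = fgsm_perturb(weights, bias, x_clean, label=1, epsilon=0.1)

p_clean = predict_proba(weights, bias, x_clean)  # above 0.5: classified as 1
p_adv = predict_proba(weights, bias, x_adv)      # below 0.5: classified as 0
```

No feature moves by more than 0.1, yet the predicted class changes, because every feature is nudged in the direction that hurts the model most.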

To simulate real-world malicious behavior against a neural network, adversarial examples often appear indistinguishable from legitimate samples from the training data. Adversarial examples of image or audio data, for example, may look or sound nearly identical to legitimate samples to avoid detection by human observers of the input stream.
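One common way to make "nearly identical" measurable is to bound the size of the perturbation under a norm. The sketch below is an illustration under that assumption: the helper names, the pixel values and the budget epsilon are hypothetical and do not come from any particular tool.

```python
def linf_distance(a, b):
    """Largest change applied to any single feature (e.g. one pixel)."""
    return max(abs(x - y) for x, y in zip(a, b))

def l2_distance(a, b):
    """Euclidean size of the overall change."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def within_budget(original, candidate, epsilon):
    """True if no feature differs from the original by more than epsilon."""
    return linf_distance(original, candidate) <= epsilon

# Hypothetical 8-bit pixel intensities before and after perturbation.
original = [120, 64, 200, 33]
candidate = [122, 62, 201, 33]

max_change = linf_distance(original, candidate)
total_change = l2_distance(original, candidate)
subtle = within_budget(original, candidate, epsilon=2)
```

A small maximum per-pixel change, such as 2 out of 255 intensity levels here, is the kind of difference a human observer is unlikely to notice.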

Figure: An example of adversarial example generation applied to GoogLeNet.

Adversarial examples of image data can also be generated by printing an image on paper and then photographing the printout. In addition to these real-world methods, there are open source software tools which can be used to generate adversarial examples.




Ian Goodfellow

Credited with inventing generative adversarial networks (GANs)

Further reading


A taxonomy and terminology of adversarial machine learning

Elham Tabassi, Kevin J. Burns, Michael Hadjimichael, Andres D. Molina-Markham, Julian T. Sexton

October 30, 2019

Adversarial Machine Learning -- Industry Perspectives

Ram Shankar Siva Kumar, Magnus Nyström, John Lambert, Andrew Marshall, Mario Goertzel, Andi Comissoneru, Matt Swann, Sharon Xia


February 4, 2020

Adversarial Machine Learning at Scale - Google Research

Alexey Kurakin, Ian J. Goodfellow, Samy Bengio



Attacking Machine Learning with Adversarial Examples

Ian Goodfellow


February 24, 2017

Explaining and Harnessing Adversarial Examples

Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy

December 20, 2014

Introduction to Adversarial Machine Learning

Arunava Chakraborty


October 16, 2019

Is Supervised Learning With Adversarial Features Provably Better Than Sole Supervision?

Litu Rout


October 30, 2019

Documentaries, videos and podcasts


'How neural networks learn' - Part II: Adversarial Examples

January 11, 2018

Adversarial Attacks on Neural Networks - Bug or Feature?

September 10, 2019

Adversarial Machine Learning

November 20, 2019

Generative Adversarial Networks (GANs) - Computerphile

October 25, 2017

Lecture 16 | Adversarial Examples and Adversarial Training

August 11, 2017



