Machine learning is a branch of artificial intelligence (AI) and computer science that deals with the design of programs capable of learning rules from data, adapting to changes, and improving performance with experience. Most deployed AI programs use machine learning, leading to the two phrases (machine learning and AI) being used interchangeably and sometimes ambiguously. The phrase was coined in 1952 by AI pioneer Arthur Samuel, who defined machine learning as:
"The field of study that gives computers the ability to learn without being explicitly programmed."
Machine learning can also be broadly defined as the capability of a machine to imitate human behavior, using AI systems to perform complex tasks in a manner similar to how humans solve problems. Real-world examples of machine learning include speech recognition, computer vision, and recommendation engines.
Traditional methods of programming computers require inputting detailed instructions for the machine to follow. This approach can be time-consuming or even impossible for certain use cases. Machine learning takes the approach of writing algorithms that let computers program themselves through experience. An important tool within data science, machine learning algorithms use statistical methods to make classifications or predictions, as well as uncover new insights in data mining projects. Machine learning algorithms are regularly used in decision-making processes within applications and businesses, sorting through data and optimizing business operations, including marketing, sales, logistics, and more.
Machine learning models require datasets on which to train, learning the relationships and parameters that govern the data they are fed. The algorithms are designed to optimize a given metric, learning to adapt and improve their performance as they are given more data. While the machine "learns" by itself, developers can tweak the model to push it toward more accurate results. Evaluating the performance of a machine learning algorithm requires additional data (separate from the training data) to test how accurately the model responds to new information. Evaluation methods depend on the goal of the algorithm. Machine learning systems can be:
- Descriptive—uses data to explain what happened
- Predictive—uses data to predict what will happen
- Prescriptive—uses data to make suggestions about future actions to take
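The held-out evaluation described above can be sketched in a few lines. In this illustrative example, a toy classifier (a single threshold rule, chosen here purely for simplicity) is fit on a training split and then scored on a separate test split; all data values are made up.

```python
# A minimal sketch of held-out evaluation: a toy threshold classifier is
# "trained" (its threshold is fit) on one split and scored on another.
# The data and the threshold rule are illustrative, not from a real dataset.

def fit_threshold(xs, ys):
    """Pick the threshold on x that maximizes training accuracy."""
    best_t, best_acc = 0.0, -1.0
    for t in xs:
        acc = sum((x >= t) == y for x, y in zip(xs, ys)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def accuracy(t, xs, ys):
    return sum((x >= t) == y for x, y in zip(xs, ys)) / len(xs)

# Separate training and test splits: the test split measures how the
# model responds to data it has never seen.
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [False, False, True, True]
test_x,  test_y  = [1.5, 3.5],           [False, True]

t = fit_threshold(train_x, train_y)
print(accuracy(t, test_x, test_y))
```

Only the accuracy on the unseen test split counts as evaluation; the training accuracy alone would overstate how well the model generalizes.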
Machine learning theory, also referred to as computational learning theory, aims to understand the principles of learning as a computational process: what capabilities and information are fundamentally needed to learn different kinds of tasks successfully, and what principles are involved in getting computers to learn from data and improve with experience. Machine learning theory draws elements from both the theory of computation and statistics, involving tasks such as:
- Building mathematical models to capture key aspects of machine learning, in which one can analyze the inherent ease or difficulty of different types of learning problems.
- Proving guarantees for algorithms (under what conditions will they succeed, how much data and computation time is needed) and developing algorithms that provably meet desired criteria.
The basic concepts of machine learning involve using statistical learning and optimization methods so computers can analyze data and identify patterns. Machine learning techniques leverage data mining to identify historical trends and inform future models. A typical supervised machine learning algorithm consists of roughly three components:
- A decision process—A series of calculations or other steps based on the input data (labeled or unlabeled) that the algorithm uses to produce an estimate about a pattern in the data.
- An error function—A method of evaluating the pattern output from step one. If there are known examples, an error function can make a comparison to assess the accuracy of the decision process.
- An updating or optimization process—Adapting the estimate to better fit data points in the training set, adjusting weights to reduce the discrepancy between the known example and the model estimate. Machine learning algorithms repeat this process, adjusting weights autonomously until a defined level of accuracy is met.
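The three components above can be sketched together on the simplest possible model: a one-parameter line fit by gradient descent. The data, learning rate, and iteration count below are illustrative choices, not prescribed values.

```python
# A sketch of the three components: a decision process (predict), an error
# function (mse), and an updating process (gradient descent) on a
# one-parameter linear model y ≈ w * x. All values are illustrative.

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # underlying relationship: y = 2x
w = 0.0          # model parameter (weight), starts uninformed
lr = 0.05        # learning rate

def predict(w, x):          # decision process: produce an estimate
    return w * x

def mse(w):                 # error function: evaluate the estimates
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)

for _ in range(200):        # updating process: adjust w to reduce the error
    grad = sum(2 * (predict(w, x) - y) * x for x, y in data) / len(data)
    w -= lr * grad

print(round(w, 3))  # converges toward 2.0
```

Each pass repeats the decide–evaluate–update cycle, and the weight settles where the error function is minimized, matching the description of weights being adjusted until a defined level of accuracy is met.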
Supervised learning is a type of machine learning in which data is fully labeled and algorithms learn to approximate a mapping function well enough that they can accurately predict output variables given new input data. Supervised learning uses classification and regression techniques to develop machine learning models. Classification techniques predict discrete responses by classifying input data into categories. Typical applications include medical imaging, speech recognition, and credit scoring. Regression techniques predict continuous responses. Typical applications include virtual sensing, electricity load forecasting, and algorithmic trading.
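As a minimal illustration of classification (the discrete-response case), the sketch below uses a nearest-centroid rule: each category's labeled training values are averaged, and a new input is assigned to the category with the closest mean. The category names and feature values are invented for the example.

```python
# A minimal classification sketch: a nearest-centroid classifier assigns
# each input to the category whose training mean is closest. The labels
# ("low"/"high") and feature values are made up for illustration.

def fit_centroids(samples):
    """samples: {label: [feature values]} -> {label: mean feature value}"""
    return {label: sum(xs) / len(xs) for label, xs in samples.items()}

def classify(centroids, x):
    return min(centroids, key=lambda label: abs(centroids[label] - x))

centroids = fit_centroids({"low": [1.0, 2.0, 3.0], "high": [8.0, 9.0, 10.0]})
print(classify(centroids, 2.5))   # discrete response: a category label
print(classify(centroids, 8.5))
```

A regression model would differ only in its output: it would return a continuous value (for example, a predicted load in kilowatts) rather than a category label.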
In unsupervised machine learning, a program looks for patterns in unlabeled data. Unsupervised machine learning can uncover new patterns or trends in data that people aren’t explicitly looking for. Clustering is the most common unsupervised learning technique. It finds hidden patterns or groupings in data using exploratory data analysis. Applications for cluster analysis include gene sequence analysis, market research, and object recognition.
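Clustering can be sketched with a tiny one-dimensional k-means: note that no labels appear anywhere; the algorithm discovers the two groupings on its own. The points, the choice of k=2, and the naive initialization are all illustrative.

```python
# A sketch of clustering: 1-D k-means with k=2 groups unlabeled points
# around two centers. The points and initialization are illustrative.

points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
centers = [points[0], points[-1]]   # naive initialization: first and last point

for _ in range(10):
    # assignment step: each point joins its nearest center's cluster
    clusters = [[], []]
    for p in points:
        i = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
        clusters[i].append(p)
    # update step: each center moves to the mean of its cluster
    centers = [sum(c) / len(c) for c in clusters]

print(centers)
```

The two alternating steps (assign, then update) repeat until the centers stop moving, revealing the hidden grouping in the data.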
Semi-supervised learning offers a middle ground between supervised and unsupervised learning. During training, semi-supervised algorithms use a smaller labeled dataset to guide classification and feature extraction from a larger, unlabeled data set. Semi-supervised learning is beneficial when limited labeled data is available.
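One common semi-supervised pattern, self-training, can be sketched as follows: a small labeled set assigns pseudo-labels to a larger unlabeled set (here via a 1-nearest-neighbor rule, chosen only for brevity), and the enlarged set then serves as training data. All values and labels are invented for the example.

```python
# A sketch of semi-supervised self-training: a small labeled dataset
# pseudo-labels a larger unlabeled one, and the combined set trains the
# final model. The data, labels, and 1-NN rule are illustrative.

labeled = [(1.0, "a"), (9.0, "b")]            # small labeled dataset
unlabeled = [1.5, 2.0, 8.0, 8.5]              # larger unlabeled dataset

def nearest_label(examples, x):
    return min(examples, key=lambda e: abs(e[0] - x))[1]

# Step 1: pseudo-label the unlabeled points with the current model.
pseudo = [(x, nearest_label(labeled, x)) for x in unlabeled]

# Step 2: train on the combined labeled + pseudo-labeled data.
combined = labeled + pseudo
print(nearest_label(combined, 2.4))
```

The small labeled set guides the process, while the bulk of the training signal comes from data that was never hand-labeled.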
Reinforcement machine learning trains algorithms through trial and error to take the best action by establishing a reward system. Reinforcement learning can train models to play games or train autonomous vehicles to drive by telling the machine when it made the right decisions, helping it learn over time and determine the optimal actions to take.
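The trial-and-error loop can be sketched with a two-action agent: it tries each action, tracks a running average reward per action, and then exploits the better one. The action names and reward values are made up, and real tasks would typically have stochastic rewards and ongoing exploration.

```python
# A sketch of reward-driven trial and error: the agent estimates each
# action's reward by trying it, then prefers the best-known action.
# Actions and rewards are illustrative; real rewards are usually noisy.

rewards = {"left": 0.2, "right": 1.0}    # environment, hidden from the agent
value = {"left": 0.0, "right": 0.0}      # agent's reward estimates
counts = {"left": 0, "right": 0}
actions = ["left", "right"]

for step in range(100):
    # try every action a few times first (exploration), then pick the
    # action with the best estimated reward (exploitation)
    if step < 10:
        action = actions[step % 2]
    else:
        action = max(value, key=value.get)
    r = rewards[action]                  # environment returns a reward
    counts[action] += 1
    value[action] += (r - value[action]) / counts[action]  # running average

print(max(value, key=value.get))  # → right
```

The reward signal alone, with no labeled examples, is enough for the agent to determine the optimal action over time.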
Machine learning, deep learning, and neural networks are all sub-fields of artificial intelligence. However, neural networks are themselves a sub-field of machine learning, and deep learning is a sub-field of neural networks.
Neural networks are a class of machine learning algorithms modeled on the brain, in which thousands or millions of nodes are interconnected and organized into layers. In an artificial neural network, each node processes its inputs and produces an output. Each connection has an associated weight, and each node has a threshold. If a node's output is above the threshold, the node is activated and sends data to the next layer in the network; below the threshold, the node passes no data to the next layer.
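The node behavior described above can be sketched directly: inputs are multiplied by weights and summed, and the node "fires" (passes data onward) only if the sum exceeds its threshold. The particular weights, inputs, and threshold below are illustrative.

```python
# A sketch of a single neural network node: weighted inputs are summed,
# and the node passes its output on only above the threshold.
# All numeric values are illustrative.

def node_output(inputs, weights, threshold):
    total = sum(x * w for x, w in zip(inputs, weights))
    return total if total > threshold else None  # None: nothing passed on

print(node_output([1.0, 0.5], [0.8, 0.4], threshold=0.9))  # fires
print(node_output([1.0, 0.5], [0.2, 0.4], threshold=0.9))  # stays silent
```

A full network simply chains many such nodes, with each layer's activated outputs becoming the next layer's inputs.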
Deep learning refers to neural networks consisting of more than three layers (inclusive of the input and output layers); a three-layer neural network is just a basic neural network. Deep learning and machine learning differ in how the algorithm "learns." Deep learning can train on both labeled (supervised learning) and unlabeled data. It ingests unstructured data in its raw form, automatically determining the internal features that distinguish different categories of data. This approach eliminates some of the human intervention otherwise required and enables the use of larger datasets.
A number of machine learning algorithms are commonly used by modern technology companies. Each of these machine learning algorithms can be applied to numerous applications in a variety of educational and business settings.
A decision tree is a model for supervised learning that can construct a non-linear decision boundary over the feature space. Decision trees are represented as hierarchical models of "decisions" over the feature space, making them powerful models that are also easily interpretable.
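The hierarchical "decisions" of a tree can be read directly as nested conditionals. The sketch below hand-builds a tiny tree over two invented features; each branch point carves the feature space into rectangular regions, which is what gives the tree its non-linear decision boundary.

```python
# A sketch of a hand-built decision tree over two features. The nested
# decisions partition the feature space into rectangular regions.
# Feature names, split values, and labels are all illustrative.

def predict(humidity, temperature):
    if humidity > 0.7:            # first decision
        if temperature > 25:      # second decision, only in the humid branch
            return "storm"
        return "rain"
    return "clear"

print(predict(0.9, 30))  # → storm
print(predict(0.9, 20))  # → rain
print(predict(0.3, 30))  # → clear
```

Reading a prediction is just following one path of decisions from the root to a leaf, which is why decision trees are considered easily interpretable; a learning algorithm would choose the split features and values from data rather than by hand.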
In unsupervised machine learning, clustering is the process of grouping similar data points together in order to uncover structure in the data.
Ensemble methods are meta-algorithms that combine several machine learning techniques into one predictive model. The purpose is to decrease variance (bagging), bias (boosting), or improve predictions (stacking).
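The combining step can be sketched with majority voting, the aggregation used in bagging. The three toy "models" below are illustrative threshold rules standing in for independently trained learners; because their errors differ, the majority vote can be right even when one model is wrong.

```python
# A sketch of the ensemble idea: several models vote and the majority wins.
# The three toy "models" are illustrative threshold rules; in bagging they
# would each be trained on a different bootstrap sample of the data.
from collections import Counter

models = [
    lambda x: "spam" if x > 0.4 else "ham",
    lambda x: "spam" if x > 0.5 else "ham",
    lambda x: "spam" if x > 0.9 else "ham",   # this one errs on mid-range inputs
]

def ensemble_predict(x):
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

print(ensemble_predict(0.6))  # two of three vote "spam": majority wins
```

Boosting and stacking differ in how the individual models are trained and weighted, but the core move is the same: one predictive model built from several.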
Machine learning classification becomes difficult when there are too many factors or variables, also called features. When many of the features are correlated or redundant, dimensionality reduction algorithms are used to reduce the number of variables under consideration, either by selecting a subset of the original features (feature selection) or by deriving new ones (feature extraction).
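Feature selection, one simple form of dimensionality reduction, can be sketched with a variance filter: a feature whose values barely change carries little information, so only high-variance columns are kept. The dataset and the variance cutoff below are invented for illustration.

```python
# A sketch of variance-based feature selection: near-constant features are
# dropped, reducing the dimensionality of each data point.
# The rows and the 0.1 cutoff are illustrative.

def variance(col):
    m = sum(col) / len(col)
    return sum((v - m) ** 2 for v in col) / len(col)

# rows of (feature0, feature1, feature2); feature1 is nearly constant
rows = [(1.0, 5.0, 2.0), (3.0, 5.0, 6.0), (5.0, 5.1, 10.0)]
columns = list(zip(*rows))

kept = [i for i, col in enumerate(columns) if variance(col) > 0.1]
reduced = [tuple(row[i] for i in kept) for row in rows]
print(kept)      # feature 1 is dropped
```

Feature extraction methods such as principal component analysis go further, deriving entirely new variables as combinations of the originals rather than keeping a subset.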
Machine learning models are parameterized to tune their behavior for a given problem. Noise contrastive estimation (NCE) is an estimation principle for parameterized statistical models. NCE is a way of learning a data distribution by comparing it against a defined noise distribution. The technique is used to cast an unsupervised problem as a supervised logistic regression problem. NCE is often used to train neural language models in place of Maximum Likelihood Estimation.
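The recasting NCE performs can be sketched conceptually: samples drawn from the data distribution get label 1, samples from a known noise distribution get label 0, and a logistic classifier learns to tell them apart. The Gaussian data and noise below, and the single-feature logistic model, are illustrative simplifications; this shows only the supervised recasting, not a full NCE language-model setup.

```python
# A conceptual sketch of the NCE recasting: real-vs-noise discrimination
# as supervised logistic regression. Distributions and hyperparameters
# are illustrative, not a faithful NCE training objective.
import math
import random

random.seed(1)
data_samples  = [random.gauss(2.0, 0.5) for _ in range(200)]  # "real" data
noise_samples = [random.gauss(0.0, 0.5) for _ in range(200)]  # known noise

examples = [(x, 1) for x in data_samples] + [(x, 0) for x in noise_samples]
w, b, lr = 0.0, 0.0, 0.1

for _ in range(50):                       # logistic regression via gradient steps
    for x, y in examples:
        p = 1 / (1 + math.exp(-(w * x + b)))
        w += lr * (y - p) * x
        b += lr * (y - p)

# the trained classifier scores data-like inputs higher than noise-like ones
score = lambda x: 1 / (1 + math.exp(-(w * x + b)))
print(score(2.0) > score(0.0))
```

The key idea is that learning to separate data from noise forces the model's parameters to reflect the data distribution, without ever computing the normalization term that makes Maximum Likelihood Estimation expensive for large vocabularies.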
Ideas used in machine learning were first conceived from the mathematical modeling of neural networks. A paper by logician Walter Pitts and neuroscientist Warren McCulloch, published in 1943, attempted to mathematically map out thought processes and decision-making in human cognition. Machine learning is partially based on a model of brain-cell interaction created in 1949 by Donald Hebb in his book The Organization of Behavior. The book presents Hebb’s theories on neuron excitement and communication between neurons. Hebb writes:
"When one cell repeatedly assists in firing another, the axon of the first cell develops synaptic knobs (or enlarges them if they already exist) in contact with the soma of the second cell."
Hebb's model can be translated to artificial neurons (or nodes). The relationship between two neurons/nodes strengthens if they are activated at the same time and weakens if they are activated separately. The word “weight” is used to describe these relationships. Nodes/neurons tending to be both positive or both negative are described as having strong positive weights. Those nodes with opposite weights develop strong negative weights.
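The rule described above can be sketched as a one-line weight update: the weight between two nodes grows when their activations agree in sign and shrinks when they disagree. The activation pairs and learning rate below are illustrative.

```python
# A sketch of the Hebbian rule: the connection strengthens when the two
# nodes are co-active and weakens when their activations oppose.
# Activation values and the learning rate are illustrative.

def hebbian_update(w, a, b, lr=0.1):
    return w + lr * a * b   # "fire together, wire together"

w = 0.0
for a, b in [(1, 1), (1, 1), (1, 1)]:     # repeatedly co-active nodes
    w = hebbian_update(w, a, b)
print(round(w, 2))   # strengthened: 0.3

w = 0.0
for a, b in [(1, -1), (1, -1), (1, -1)]:  # consistently opposite activations
    w = hebbian_update(w, a, b)
print(round(w, 2))   # weakened into a negative weight: -0.3
```

Repeated agreement drives a strong positive weight and repeated opposition a strong negative one, matching the relationships described above.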
In 1950, Alan Turing proposed the Turing Test, a measure of machine intelligence. The test's criterion is that an "intelligent" machine is one that can convince a human being that it is also a human being. Arthur Samuel began developing a computer program for playing checkers in the 1950s while working at IBM. With only a small amount of memory available, Samuel developed what he called alpha-beta pruning. His program included a scoring function, based on the positions of the pieces on the board, that attempted to measure each side's chances of winning. The program chose moves using a minimax strategy, an approach that would later evolve into the minimax algorithm. Samuel also designed a number of mechanisms to improve his program, including "rote learning": recording positions it had previously seen along with the values of the reward function. It was Arthur Samuel who coined the phrase "machine learning" in 1952, and the phrase was popularized by a paper Samuel published in 1959 titled "Some Studies in Machine Learning Using the Game of Checkers."
In 1957, Frank Rosenblatt, working at the Cornell Aeronautical Laboratory, combined Donald Hebb's model of brain-cell interaction with Arthur Samuel's machine learning ideas to create the perceptron. The perceptron was initially planned as a machine rather than a program: the software, designed for the IBM 704, was installed in a custom-built machine called the Mark 1 Perceptron, constructed for image recognition. The software implementation, however, made the algorithms transferable and available to other machines. Although early results were promising, the perceptron could not recognize many kinds of visual patterns (such as faces).
In 1967, the nearest neighbor algorithm was conceived, marking the beginning of basic pattern recognition. The algorithm would go on to be used for mapping routes and finding the most efficient solutions for traveling salespeople. The introduction of the nearest neighbor rule has been credited to Marcello Pelillo, who, in turn, credits a 1967 paper by Cover and Hart. During the 1960s, the use of multiple layers also opened a new path in neural network research: using two or more layers in the perceptron offered significantly more processing power, and multilayer networks led to feedforward neural networks and backpropagation (developed during the 1970s).
During the 1980s, Gerald Dejong introduced explanation-based learning (1981), in which a computer analyzes training data and creates general rules to follow, and Terry Sejnowski invented NetTalk (1985), a program that learns to pronounce words the same way a baby does. During the 1990s, work on machine learning shifted to a data-driven approach, with scientists creating programs to analyze vast datasets and draw conclusions, or "learn," from the results. In 1997, IBM's Deep Blue program beat the world champion at chess. The phrase "deep learning" was coined in 2006 by Geoffrey Hinton to describe new algorithms capable of distinguishing objects and text in images and videos. In 2010, the Microsoft Kinect used machine learning to track twenty human features at a rate of thirty times per second, letting users interact with the computer through movements and gestures.
The 2010s saw a number of advancements in the field of machine learning from large tech companies, including the following:
- IBM’s Watson beat human competitors at Jeopardy (2011)
- The development of Google Brain and its deep neural network that can learn to discover and categorize objects (2011)
- Google's X Lab's machine learning algorithm, which can autonomously identify YouTube videos containing a cat (2012)
- Facebook's DeepFace software, which can recognize or verify individuals in photos (2014)
- Google's AI, AlphaGo, beat a professional player at the Chinese board game Go (2016)
In 2017, a team of Google researchers released a paper titled "Attention Is All You Need," describing a new machine learning architecture based on attention mechanisms: the transformer. Transformers demonstrate better results while requiring less compute to train than previous architectures, and they have become the leading machine learning architecture in AI, powering cutting-edge large language models (LLMs).
Applications of machine learning include:
- Automated theorem proving
- Adaptive websites
- Affective computing
- Brain–machine interfaces
- Classifying DNA sequences
- Computational anatomy
- Computer networks
- Computer vision, including object recognition
- Detecting credit card fraud
- Financial market analysis
- General game playing
- Information retrieval
- Internet fraud detection
- Machine learning control
- Machine perception
- Medical diagnosis
- Natural language processing
- Natural language understanding
- Optimization and metaheuristics
- Online advertising
- Recommender systems
- Robot locomotion
- Search engines
- Sentiment analysis (or opinion mining)
- Sequence mining
- Software engineering
- Speech recognition
- Handwriting recognition
- Structural health monitoring
- Syntactic pattern recognition
- Time series forecasting
- User behavior analytics