Unsupervised learning is a branch of machine learning that takes unlabeled data, which has not been previously classified or categorized, and tries to extract features and patterns from it on its own. Where supervised learning is analogous to taking a multiple-choice test with a predetermined answer key, unsupervised learning is analogous to taking an open-ended test in which the questions have no answer key and no objective means of determining a grade.
The general goal of unsupervised learning is to gain insight into a given data set by modeling the underlying structure or distribution of the data. Unsupervised learning algorithms aren't searching for concrete correct answers or specific outputs. Rather, they are handed a data set without explicit instructions on what to do with it, and they are left alone to find interesting structure in the data.
The different unsupervised learning models that exist can be categorized based on the ways in which they organize data.
- Clustering - Identifying and grouping similar data points together. Variations include k-means, k-means++, hierarchical clustering, density clustering, spectral clustering, and more.
- Data compression / dimensionality reduction - Identifying and removing redundant data from a data set so that most of the important information can be represented with only a fraction of the actual content, saving on computing power and storage costs. These methods include nonlinear dimensionality reduction (NDR), non-negative matrix factorization (NMF), singular value decomposition (SVD), as well as principal component analysis (PCA) and variations of PCA such as kernel PCA and sparse PCA.
- Anomaly detection - Identifying unusual patterns that do not conform to expected behavior. Several types of anomaly detection exist for different purposes, including clustering-based methods such as k-means, support vector machine-based methods, and density-based methods such as k-NN or local outlier factor (LOF).
- Association - Discovering interesting relationships between variables in large data sets. Well-known association algorithms include Apriori and Eclat.
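As a rough illustration of the clustering category above, the following is a minimal pure-Python sketch of k-means (Lloyd's algorithm). The function name `kmeans`, the helper `squared_distance`, the toy points, and the hand-picked starting centroids are all assumptions made for this example; practical implementations usually choose starting centroids randomly or with k-means++ seeding.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, max_iterations=100):
    """Group points around centroids; returns (centroids, labels).

    Illustrative sketch only: the starting centroids are supplied by the
    caller (an assumption of this example) instead of being sampled.
    """
    centroids = [tuple(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the result is stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # leave a centroid in place if it has no members
                centroids[j] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return centroids, labels


# Toy, unlabeled data: two visually separate groups (made up for this sketch).
points = [(1, 1), (1, 3), (3, 1), (3, 3),
          (8, 8), (8, 10), (10, 8), (10, 10)]
centroids, labels = kmeans(points, initial_centroids=[(0, 0), (10, 10)])
print(centroids)  # [(2.0, 2.0), (9.0, 9.0)]
print(labels)     # [0, 0, 0, 0, 1, 1, 1, 1]
```

No labels are supplied at any point; the two groups emerge solely from the distances between the points.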
- Artificial neural network - An artificial neural network is a computer system that is modeled after the way the human brain analyzes and processes information.
- Clustering - A process of unsupervised learning in which similar data points are identified and grouped together in order to help profile the attributes of different groups.
- Dimensionality reduction - Process of finding a low-dimensional representation of higher-dimensional data that retains as much important information from the data as possible.
- Principal component analysis (PCA) - Technique used to find the most valuable parts of all of the variables in a dataset and to then transform the original variables into a smaller set of linear combinations.
- Non-negative matrix factorization (NMF) - A matrix factorization method where all of the values in matrices are constrained to be non-negative so that they are easier to inspect. It is useful in data mining because it has the effect of clustering the input data.
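To make the PCA entry above more concrete, here is a small pure-Python sketch that finds the first principal component of a toy data set by power iteration on the sample covariance matrix. The function name `first_principal_component`, the toy data, and the choice of power iteration are assumptions made for illustration; numerical libraries typically compute PCA through an SVD instead.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return (direction, scores) for the first principal component.

    Illustrative sketch only. Power iteration is used here for brevity;
    it would fail if the starting vector happened to be orthogonal to
    the leading eigenvector of the covariance matrix.
    """
    n = len(rows)
    dims = len(rows[0])
    # Center each variable on its mean.
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    centered = [[r[d] - means[d] for d in range(dims)] for r in rows]
    # Sample covariance matrix (divides by n - 1).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]
    # Power iteration: repeatedly multiply by the matrix and renormalize.
    v = [1.0 / math.sqrt(dims)] * dims
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # all variables are constant; no direction to find
        v = [x / norm for x in w]
    # Each score is one linear combination of the centered variables.
    scores = [sum(c[d] * v[d] for d in range(dims)) for c in centered]
    return v, scores


# Toy data (made up): the second variable is exactly twice the first,
# so a single linear combination captures all of the variation.
data = [(-2, -4), (-1, -2), (0, 0), (1, 2), (2, 4)]
component, scores = first_principal_component(data)
print([round(x, 3) for x in component])  # [0.447, 0.894]
```

Here the two original variables collapse into one score per data point, which is the "smaller set of linear combinations" the definition describes.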