Dimensionality reduction, or dimension reduction, is the general process of mapping a set of high-dimensional vectors into a lower-dimensional space while preserving, as far as possible, the relationships among them, such as pairwise distances. In other words, dimensionality reduction aims to shrink high-dimensional data so that it can be represented in a low-dimensional space without losing the important information it carries.
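One way to make this idea precise is sketched below. It is an illustrative formalization rather than the only definition: the symbols f, D, d and x_i are notation introduced here, and Euclidean distance is assumed as the relationship to preserve. Some methods preserve other quantities instead, such as variance or local neighborhoods.

```latex
f : \mathbb{R}^{D} \to \mathbb{R}^{d}, \quad d < D, \qquad
\lVert f(x_i) - f(x_j) \rVert \approx \lVert x_i - x_j \rVert
\quad \text{for all pairs } i, j .
```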
There are several reasons why dimensionality reduction can be useful:
- Data visualization - It's difficult or even impossible for humans to visualize high-dimensional data. Dimensionality reduction can map that data to 2D or 3D, where it can be plotted.
- Data compression - Storage space and computing power are costly resources. Reducing the number of dimensions shrinks the data, making it cheaper to store and faster to retrieve.
- Noise removal - Data can be corrupted or distorted to the point that it's difficult or impossible to interpret. Dimensionality reduction can filter out some of that noise, which can improve query accuracy.
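As a toy illustration of the compression and noise-removal points above, the sketch below reduces 2D points to a single coordinate each using principal component analysis, computed here with power iteration in plain Python. The data, the function name `first_principal_component`, and the noise level are all assumptions made for the example; a real project would more likely use a numerical library.

```python
import math
import random


def first_principal_component(points, iterations=200):
    """Return (mean, direction) for the leading principal component.

    Uses power iteration on the sample covariance matrix. This is a
    minimal sketch for small, dense data, not an optimized routine.
    """
    n, dim = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(dim)]
    centered = [[p[j] - mean[j] for j in range(dim)] for p in points]
    # Sample covariance matrix (dim x dim).
    cov = [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]
    direction = [1.0] * dim  # arbitrary starting vector
    for _ in range(iterations):
        product = [sum(cov[i][j] * direction[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in product))
        direction = [x / norm for x in product]
    return mean, direction


random.seed(0)
# Hypothetical toy data: 2D points close to the line y = 2x, plus small noise.
points = [[x / 10, 2 * x / 10 + random.gauss(0, 0.1)] for x in range(100)]

mean, direction = first_principal_component(points)

# Compress: keep one number per point instead of two.
scores = [sum((p[j] - mean[j]) * direction[j] for j in range(2)) for p in points]

# Decompress: map each score back to 2D. The result lies exactly on a line,
# so the noise perpendicular to that line has been discarded.
reconstructed = [[mean[j] + t * direction[j] for j in range(2)] for t in scores]

mse = sum(
    (p[0] - r[0]) ** 2 + (p[1] - r[1]) ** 2 for p, r in zip(points, reconstructed)
) / len(points)
print("direction:", direction)
print("mean squared reconstruction error:", mse)
```

The recovered direction should be close to the slope-2 line the data was generated from, and the reconstruction error should be small compared with the spread of the data, which is what makes the one-number-per-point representation a reasonable compressed form in this case.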
Dimensionality Reduction Techniques
Many techniques from data mining and machine learning can be categorized as forms of dimensionality reduction:
- Non-negative matrix factorization (NMF)
- Principal component analysis (PCA)
- Kernel PCA
- Independent component analysis (ICA)
- Nonlinear dimensionality reduction (NDR or NLDR)
- Linear discriminant analysis (LDA)
- Factor analysis
- Many others
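To give a feel for how one of the listed techniques works, here is a minimal sketch of two-class linear discriminant analysis in its Fisher form, which picks the 1D direction that best separates two labeled groups. The points, the function name `fisher_direction`, and the restriction to 2D inputs (so the 2x2 matrix inverse can be written out by hand) are assumptions made for illustration only.

```python
import math


def fisher_direction(class_a, class_b):
    """Unit vector w proportional to Sw^{-1} (mean_b - mean_a) for 2D points."""

    def mean(points):
        n = len(points)
        return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

    def scatter(points, m):
        # 2x2 within-class scatter: sum of outer products of centered points.
        s = [[0.0, 0.0], [0.0, 0.0]]
        for p in points:
            dx, dy = p[0] - m[0], p[1] - m[1]
            s[0][0] += dx * dx
            s[0][1] += dx * dy
            s[1][0] += dy * dx
            s[1][1] += dy * dy
        return s

    m_a, m_b = mean(class_a), mean(class_b)
    s_a, s_b = scatter(class_a, m_a), scatter(class_b, m_b)
    sw = [[s_a[i][j] + s_b[i][j] for j in range(2)] for i in range(2)]

    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if abs(det) < 1e-12:
        raise ValueError("within-class scatter matrix is singular")

    diff = [m_b[0] - m_a[0], m_b[1] - m_a[1]]
    # Closed-form 2x2 inverse applied to the difference of class means.
    w = [
        (sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
        (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det,
    ]
    norm = math.hypot(w[0], w[1])
    return [w[0] / norm, w[1] / norm]


# Hypothetical labeled data: both classes stretch along a diagonal and
# overlap completely on the x-axis, but are offset vertically.
class_a = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (4.0, 5.0), (5.0, 5.0)]
class_b = [(1.0, 0.0), (2.0, 1.0), (3.0, 1.0), (4.0, 3.0), (5.0, 3.0)]

w = fisher_direction(class_a, class_b)

# Reduce each 2D point to a single number by projecting onto w.
proj_a = [p[0] * w[0] + p[1] * w[1] for p in class_a]
proj_b = [p[0] * w[0] + p[1] * w[1] for p in class_b]

print("direction:", w)
print("class A projections:", proj_a)
print("class B projections:", proj_b)
```

Unlike PCA, which ignores labels and keeps the direction of greatest overall variance, this projection uses the class labels, so the two groups end up in non-overlapping ranges on the single remaining axis.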
Related topics
- Unsupervised learning: A branch of machine learning that tries to make sense of data that has not been labeled, classified, or categorized by extracting features and patterns on its own.
- Nonlinear dimensionality reduction (NDR or NLDR): A process of mapping high-dimensional data onto a lower-dimensional nonlinear manifold embedded in the higher-dimensional space, so that the data can be more easily visualized and interpreted.
- Principal component analysis (PCA): A technique that finds the directions of greatest variance in a dataset and transforms the original variables into a smaller set of uncorrelated linear combinations.
- Data mining
- Factor analysis: A method for modeling observed variables and their variance/covariance structures in terms of a smaller number of underlying, unobservable factors.