Dimensionality Reduction

Process of finding a low-dimensional representation of higher-dimensional data that retains as much important information from the data as possible.

Dimensionality reduction, or dimension reduction, is the general process of projecting a set of high-dimensional vectors into a lower-dimensional space while preserving the relationships among them, such as their pairwise distances. In other words, dimensionality reduction shrinks high-dimensional data so that it can be represented in a low-dimensional space without losing important information.
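Projecting vectors into fewer dimensions while preserving distances can be illustrated with a Gaussian random projection, one of the simplest techniques of this kind. The following is a minimal Python sketch, assuming Euclidean distance is the relationship to preserve; `random_projection` and `pairwise_distances` are hypothetical helper names, and random projection is only one technique chosen for illustration, not one prescribed by this text.

```python
import math
import random


def random_projection(vectors, k, seed=0):
    """Project d-dimensional vectors to k dimensions with a Gaussian random matrix.

    Entries are drawn from N(0, 1) and scaled by 1/sqrt(k), so squared
    Euclidean distances are preserved in expectation.
    """
    rng = random.Random(seed)
    d = len(vectors[0])
    # k x d projection matrix with i.i.d. N(0, 1/k) entries.
    R = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(d)] for _ in range(k)]
    # Each output coordinate is the dot product of one matrix row with the input vector.
    return [[sum(r_j * x_j for r_j, x_j in zip(row, x)) for row in R] for x in vectors]


def pairwise_distances(vectors):
    """Euclidean distance for every unordered pair (i, j) with i < j."""
    n = len(vectors)
    return [math.dist(vectors[i], vectors[j]) for i in range(n) for j in range(i + 1, n)]


if __name__ == "__main__":
    rng = random.Random(42)
    # 20 synthetic points in 1,000 dimensions, for illustration only.
    high = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(20)]
    low = random_projection(high, k=200)
    ratios = [lo / hi for lo, hi in zip(pairwise_distances(low), pairwise_distances(high))]
    print(f"{len(high[0])} -> {len(low[0])} dims")
    print(f"distance ratio range: {min(ratios):.2f} .. {max(ratios):.2f}")
```

Here 1,000 dimensions shrink to 200 while each pairwise distance stays within a modest factor of its original value; how tight that range is depends on the seed and on `k` (larger `k` gives less distortion).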

There are several reasons why dimensionality reduction can be useful:

  • Data visualization - Humans find it difficult or even impossible to visualize high-dimensional data directly. Dimensionality reduction can represent that data in 2D or 3D so that it can be plotted.
  • Data compression - Storage space and computing power are costly resources. Representing each vector with fewer dimensions makes data more compact to store and more efficient to retrieve.
  • Noise removal - Data is often corrupted or distorted to the point that it is difficult or impossible to interpret. Dimensionality reduction can reduce noise in the data, which can improve query accuracy.

Dimensionality Reduction Techniques

Many data mining and machine learning techniques can be viewed as forms of dimensionality reduction.
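Principal component analysis (PCA) is one widely used technique of this kind: it finds the orthogonal directions along which the data varies most and keeps only the top few. The sketch below is a minimal, dependency-free Python illustration, assuming small dense inputs; it computes components by power iteration with deflation on the covariance matrix, which is only one way to compute PCA (production libraries typically use a singular value decomposition instead). The function name `pca` and its parameters are hypothetical, not a specific library's API.

```python
import math
import random


def pca(data, k, iters=200, seed=0):
    """Return (components, projected) for the top-k principal components.

    Uses power iteration with deflation on the sample covariance matrix;
    suitable for small illustrative inputs, not a production implementation.
    """
    n, d = len(data), len(data[0])
    # Center the data: PCA works on deviations from the mean.
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    X = [[row[j] - mean[j] for j in range(d)] for row in data]
    # Sample covariance matrix C = X^T X / (n - 1), shape d x d.
    C = [[sum(x[i] * x[j] for x in X) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        # Start from a random unit vector.
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        n0 = math.sqrt(sum(t * t for t in v))
        v = [t / n0 for t in v]
        # Repeatedly multiply by C and renormalize; v converges to the top eigenvector.
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0.0:  # no variance left in any direction
                break
            v = [t / norm for t in w]
        # Eigenvalue via the Rayleigh quotient, then deflate C to expose the next component.
        lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(d)) for i in range(d))
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        components.append(v)
    # Low-dimensional representation: coordinates of each centered point on the components.
    projected = [[sum(c[j] * x[j] for j in range(d)) for c in components] for x in X]
    return components, projected


if __name__ == "__main__":
    rng = random.Random(7)
    # Synthetic 3-D points that vary mostly along the direction (1, 2, 2), plus small noise.
    pts = []
    for _ in range(100):
        t = rng.gauss(0.0, 3.0)
        pts.append([t + rng.gauss(0, 0.1), 2 * t + rng.gauss(0, 0.1), 2 * t + rng.gauss(0, 0.1)])
    comps, proj = pca(pts, k=1)
    # Expected to be roughly +/-(0.33, 0.67, 0.67), the normalized (1, 2, 2) direction.
    print([round(c, 2) for c in comps[0]])
```

In this usage each 3-D point is reduced to a single coordinate along the dominant direction, and the small noise in the other two directions is discarded, which connects PCA to both the compression and noise-removal uses listed earlier.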


