Principal component analysis (PCA) is a factor-analysis technique used for feature extraction. Unlike common factor analysis, which models only the variance shared among variables, PCA considers the total variance in the data.

PCA finds the most informative parts of each variable in a dataset and transforms the original variables into a smaller set of linear combinations, with each "new" variable being uncorrelated with the others. However, these new variables are generally less interpretable than the originals, so PCA is less suitable when interpretability is important.
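The transformation described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: it assumes the eigendecomposition-of-the-covariance-matrix formulation of PCA, and the function name `pca` and the random test data are hypothetical, chosen only for the example.

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    # Center each variable so the covariance matrix describes spread around the mean.
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    # eigh suits symmetric matrices; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the directions with the largest variance.
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]       # each column holds the weights of one linear combination
    scores = X_centered @ components     # the "new" variables
    return scores, components, eigvals[order]

# Hypothetical data: four correlated variables built by mixing random noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))

scores, components, variances = pca(X, n_components=2)

# The covariance matrix of the new variables is diagonal up to rounding error,
# which is what "uncorrelated" means here.
score_cov = np.cov(scores, rowvar=False)
```

Each column of `components` gives the weights of one linear combination of the original variables, which is also why the new variables are harder to interpret: every one of them mixes all of the originals.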

A common application of PCA is making data easier to explore and visualize by emphasizing variation and bringing out strong patterns in the data. It can also reduce the number of variables when it is difficult to decide which variables to drop from consideration entirely.

For example, consider a dataset whose variables overlap in the information they provide. If a dataset contains three predictor variables, "temperature", "rainfall", and "humidity", they are likely to be correlated with one another, so part of the variance in each is redundant. By applying PCA, one or two "new" variables, or principal components, could be created that capture the valuable parts of the original three variables and explain most of the variance in the data.
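To make the weather example concrete, the sketch below builds a synthetic dataset and measures how much variance each principal component explains. Everything about the data is an assumption made for illustration: the three variables are generated from a single hypothetical shared driver (`latent`) plus noise, and the coefficients are arbitrary. The variables are standardized first because they are on different scales, and the components are obtained from a singular value decomposition.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Hypothetical shared driver that makes the three variables overlap.
latent = rng.normal(size=n)
temperature = 20 - 5 * latent + rng.normal(scale=1.0, size=n)
rainfall = 50 + 20 * latent + rng.normal(scale=5.0, size=n)
humidity = 60 + 10 * latent + rng.normal(scale=3.0, size=n)
X = np.column_stack([temperature, rainfall, humidity])

# Standardize so that no variable dominates just because of its units.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Rows of Vt are the principal directions; squared singular values are
# proportional to the variance captured along each direction.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
explained_variance_ratio = s**2 / np.sum(s**2)

# Keep two components in place of the three original variables.
pc_scores = Z @ Vt[:2].T

print(explained_variance_ratio)
```

Because the synthetic variables share one strong driver, the first component accounts for most of the total variance. Real weather data would overlap less cleanly, so the share explained by each component has to be checked case by case.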

Using PCA in this way is a form of dimensionality reduction.
