Principal component analysis (PCA) is a technique, often grouped with factor analysis, that is used for feature extraction. Unlike common factor analysis, which models only the variance that variables share, PCA accounts for the total variance in the data.
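The difference between total and common variance can be written out with the standard covariance-matrix formulation. The notation below is introduced here for illustration and is not part of the original text:

- a column-centered data matrix X with n rows and p columns;
- its covariance matrix Σ, with diagonal entries σ_j²;
- eigenvectors V and eigenvalues λ_j;
- factor loadings L and a diagonal matrix Ψ of variable-specific variances.

```latex
% X: n-by-p data matrix, each column centered to mean zero
\Sigma = \frac{1}{n-1} X^\top X = V \Lambda V^\top,
\qquad \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_p),
\quad \lambda_1 \ge \dots \ge \lambda_p \ge 0

% principal component scores: each new variable is a linear combination of the old ones,
% and their covariance is diagonal, so the components are mutually uncorrelated
T = X V, \qquad \operatorname{Cov}(T) = V^\top \Sigma V = \Lambda

% PCA redistributes the TOTAL variance of the original variables across the components
\operatorname{tr}(\Sigma) = \sum_{j=1}^{p} \sigma_j^2 = \sum_{j=1}^{p} \lambda_j,
\qquad \text{share explained by the first } k \text{ components} = \frac{\sum_{j=1}^{k} \lambda_j}{\sum_{j=1}^{p} \lambda_j}

% common factor analysis instead splits the covariance into a shared part and a unique part
\Sigma = L L^\top + \Psi, \qquad \Psi \text{ diagonal (variance specific to each variable)}
```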
PCA transforms the original variables of a dataset into a smaller set of new variables. Each new variable is a linear combination of the originals. The combinations are chosen so that they capture as much of the variance as possible and are uncorrelated with one another. However, these new variables are generally harder to interpret than the originals, so PCA is less suitable when interpretability is important.
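The transformation described above can be sketched in NumPy by eigendecomposing the sample covariance matrix. This is one of several equivalent ways to compute PCA; an SVD of the centered data is another. The function name `pca` and the random example data are hypothetical. Strictly speaking, the new variables PCA produces are uncorrelated (orthogonal) rather than statistically independent, and the final check in the sketch tests exactly that property.

```python
import numpy as np


def pca(X, n_components):
    """Project the rows of X (n_samples x n_features) onto the top principal components.

    Returns (scores, components, variances), where variances are the eigenvalues
    of the covariance matrix for the kept components, largest first.
    """
    # Center each variable: PCA works on deviations from the mean.
    X_centered = X - X.mean(axis=0)
    # Sample covariance matrix of the variables (features x features).
    cov = np.cov(X_centered, rowvar=False)
    # eigh suits symmetric matrices; it returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    components = eigenvectors[:, :n_components]
    # Each score column is a linear combination of the original (centered) variables.
    scores = X_centered @ components
    return scores, components, eigenvalues[:n_components]


# Hypothetical example data: 200 samples of 3 variables, the third nearly a copy of the first.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=200)

scores, components, variances = pca(X, n_components=3)

# The covariance of the scores is diagonal up to round-off: the new variables are uncorrelated.
print(np.round(np.cov(scores, rowvar=False), 6))
print("variance per component:", np.round(variances, 4))
```

Because one column is almost a copy of another, the last component's variance comes out close to zero. That is the redundancy PCA lets you drop.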
A common application of PCA is making data easier to explore and visualize by emphasizing variation and bringing out strong patterns. It can also reduce the number of variables in situations where it is difficult to decide which variables to remove from consideration entirely.
For example, consider a dataset whose variables overlap in the information they provide. Suppose a dataset contains three explanatory variables: "temperature", "rainfall", and "humidity". These are likely correlated, so part of the variance they carry is redundant. Applying PCA may produce one or two new variables, or principal components, that capture most of the information in the original three and explain most of the variance in the data.
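The weather example can be imitated with synthetic data. The three variables here are hypothetical: they are generated from one shared latent driver plus noise, so the numbers illustrate the mechanism rather than real climate data. The variables are standardized first because temperature, rainfall, and humidity are measured in different units. The eigenvalues of the resulting correlation matrix then give the share of variance each principal component explains.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Hypothetical synthetic data: one latent "weather regime" drives all three variables.
regime = rng.normal(size=n)
temperature = 20 + 5 * regime + rng.normal(scale=1.0, size=n)
humidity = 60 - 10 * regime + rng.normal(scale=2.0, size=n)
rainfall = 5 - 2 * regime + rng.normal(scale=1.0, size=n)
X = np.column_stack([temperature, rainfall, humidity])

# Standardize so that variables in different units contribute comparably.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# The covariance of standardized data is the correlation matrix; its eigenvalues
# (sorted largest first) are the variances of the principal components.
eigenvalues = np.linalg.eigvalsh(np.cov(Z, rowvar=False))[::-1]
explained_ratio = eigenvalues / eigenvalues.sum()

# Smallest number of components whose cumulative share reaches 95%.
k = int(np.searchsorted(np.cumsum(explained_ratio), 0.95) + 1)

print("explained variance ratio:", np.round(explained_ratio, 3))
print("components needed for 95% of the variance:", k)
```

All three variables are driven by the same latent series here, so the first component should account for the large majority of the variance. With real data, the split depends on how strongly the variables are actually correlated.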
This process is commonly called dimensionality reduction.
- Factor analysis: A method for modeling observed variables and their variance/covariance structure in terms of a smaller number of underlying, unobservable factors.
- Kernel PCA: A non-linear form of principal component analysis (PCA). It applies PCA in a higher-dimensional feature space defined by a kernel function, which lets it capture non-linear structure in the data.
- Sparse PCA: An extension of classic PCA that constrains each component to have only a few non-zero loadings. This makes the components easier to interpret and can give better statistical properties than classic PCA, particularly with high-dimensional data.
- Machine learning: A field of computer science concerned with systems that learn from data rather than being explicitly programmed.
- Unsupervised learning: A branch of machine learning that extracts features and patterns from data that has not been labeled, classified, or categorized.
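Kernel PCA can be sketched in a few lines of NumPy. This is a minimal illustration assuming a Gaussian (RBF) kernel. The function name `rbf_kernel_pca`, the `gamma` value, and the concentric-circles toy data are all hypothetical choices. Practical implementations, such as scikit-learn's `KernelPCA`, add other kernels and the ability to project new points. The key steps are:

1. Build the kernel matrix.
2. Center it, which corresponds to centering the data in the implicit feature space.
3. Eigendecompose the centered matrix, as ordinary PCA does with the covariance matrix.

```python
import numpy as np


def rbf_kernel_pca(X, n_components, gamma=1.0):
    """Kernel PCA with the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2).

    Returns the projections of the training points onto the top n_components.
    """
    # Pairwise squared Euclidean distances between all samples.
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
    # Clip tiny negative values caused by floating-point round-off.
    K = np.exp(-gamma * np.maximum(sq_dists, 0.0))

    # Center the kernel matrix (equivalent to centering in the implicit feature space).
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n

    # Eigendecompose and keep the largest eigenvalues.
    eigenvalues, eigenvectors = np.linalg.eigh(K_c)
    order = np.argsort(eigenvalues)[::-1][:n_components]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

    # Projection of training point i onto component k is sqrt(lambda_k) * v_k[i].
    return eigenvectors * np.sqrt(np.maximum(eigenvalues, 0.0))


# Hypothetical toy data: two concentric circles, a structure linear PCA cannot unfold.
rng = np.random.default_rng(1)
angles = rng.uniform(0, 2 * np.pi, size=200)
radii = np.repeat([1.0, 3.0], 100)
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])

Z = rbf_kernel_pca(X, n_components=2, gamma=0.5)
print(Z.shape)
```

How well the components separate the two circles depends on `gamma`. In practice that parameter is usually tuned rather than fixed in advance.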