Principal component analysis (PCA)

Technique that finds the directions of greatest variance across the variables in a dataset and transforms the original variables into a smaller set of uncorrelated linear combinations.

Edit ID  10195541 

Daniel Frumkin: "Added description, article, categories, related topics, and resources."
Daniel Frumkin edited on 8 March, 2019 12:19 pm

Article

Principal component analysis (PCA) is a technique closely related to factor analysis that is used for feature extraction. Unlike common factor analysis, which models only the variance shared among the variables, PCA considers the total variance in the data.
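
One way to make "total variance" concrete is sketched below. The notation is an assumption introduced here, not taken from the article: p variables x_1, ..., x_p with sample covariance matrix Σ, whose eigenvalues are λ_1 ≥ ... ≥ λ_p. The total variance is the sum of the individual variances, which equals the sum of the eigenvalues, so the share of variance retained by the first k principal components can be read off directly.

```latex
\operatorname{tr}(\Sigma) \;=\; \sum_{j=1}^{p} \operatorname{Var}(x_j) \;=\; \sum_{i=1}^{p} \lambda_i ,
\qquad
\text{share explained by the first } k \text{ components} \;=\;
\frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{p} \lambda_i}.
```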

PCA finds the directions along which the data vary the most and transforms the original variables into a smaller set of linear combinations, with each "new" variable being uncorrelated with the others. However, because each new variable mixes several of the original ones, it is generally harder to interpret, so PCA is less suitable when interpretability is important.
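
The transformation described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `pca_eig` and the random test data are hypothetical, and it assumes the common approach of taking eigenvectors of the sample covariance matrix.

```python
import numpy as np

def pca_eig(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    # Center each column so variance is measured around the mean.
    X_centered = X - X.mean(axis=0)
    # Sample covariance matrix of the features (n_features x n_features).
    cov = np.cov(X_centered, rowvar=False)
    # eigh is meant for symmetric matrices; eigenvalues come back in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Each column of `components` holds the weights of one linear combination.
    components = eigvecs[:, :n_components]
    scores = X_centered @ components
    return scores, components, eigvals

rng = np.random.default_rng(0)
# Hypothetical data: 200 samples of 4 features driven by 2 hidden factors plus noise.
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(200, 4))

scores, components, eigvals = pca_eig(X, n_components=2)
# The off-diagonal entries are numerically zero: the new variables are uncorrelated.
score_cov = np.cov(scores, rowvar=False)
print(np.round(score_cov, 6))
```

The diagonal of `score_cov` holds the variance carried by each new variable, which matches the two largest eigenvalues of the covariance matrix.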

A common application of PCA is making data easier to explore and visualize by emphasizing variation and bringing out strong patterns in the data. It is also useful for reducing the number of variables when it is difficult to identify any single variable that can be removed from consideration entirely.

For example, take a situation in which the variables in a dataset overlap in the information they provide. If a dataset contains three predictor variables, "temperature", "rainfall", and "humidity", those variables are likely to be correlated with one another, so part of the information they carry is redundant. By applying PCA, it's possible that one or two "new" variables, or principal components, could be created that explain most of the variance in the original three variables.
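
The weather example can be illustrated with a small simulation. Everything numeric here is invented for the sketch: the data are synthetic, and the hidden `wetness` factor, the coefficients, and the noise levels are assumptions chosen only to make the three variables correlated. The variables are standardized first, a common choice when units differ, and the components are obtained from a singular value decomposition.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
# Synthetic, made-up weather data: one hidden "wetness" factor raises rainfall
# and humidity and lowers temperature, which makes the three variables correlated.
wetness = rng.normal(size=n)
temperature = 20.0 - 3.0 * wetness + rng.normal(scale=1.0, size=n)
rainfall = 5.0 + 2.0 * wetness + rng.normal(scale=0.5, size=n)
humidity = 60.0 + 10.0 * wetness + rng.normal(scale=3.0, size=n)
X = np.column_stack([temperature, rainfall, humidity])

# Standardize so that no variable dominates merely because of its units.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# SVD of the standardized data: the rows of Vt are the principal directions.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
explained_variance = S**2 / (n - 1)
explained_ratio = explained_variance / explained_variance.sum()

# Keep the first two components as the "new" variables.
scores = Z @ Vt[:2].T
print("explained variance ratio:", np.round(explained_ratio, 3))
```

With data built this way, the first component should account for the bulk of the variance, since all three variables are driven by the same hidden factor. Real weather measurements would not necessarily behave so neatly.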

This process is commonly called dimensionality reduction.

Further reading

Title: Principal Component Analysis - Unsupervised Learning
Author: Anish Singh Walia
Type: Web

Text is available under the Creative Commons Attribution-ShareAlike 4.0; additional terms apply. By using this site, you agree to our Terms & Conditions.