Sparse PCA

Sparse principal component analysis (Sparse PCA) is an extension of the classic principal component analysis (PCA) method that offers dimensionality reduction of data with better statistical properties and interpretability than classic PCA.



One disadvantage of classic PCA is that each principal component is a linear combination of all the original variables, so every component depends on every variable. Sparse PCA extends traditional PCA by finding linear combinations that involve only a few of the input variables.
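The difference can be stated formally for the first component. This is a common textbook formulation, not the exact objective used by any one paper listed on this page. Let Σ be the p × p sample covariance matrix of the variables and v the vector of loadings. Classic PCA maximizes the variance v^T Σ v over unit vectors, while sparse PCA also caps the number of nonzero loadings, ‖v‖₀, at a user-chosen k:

```latex
\begin{aligned}
\text{PCA:} \quad & \max_{v \in \mathbb{R}^p} \; v^\top \Sigma v
  \quad \text{subject to} \quad \|v\|_2 = 1 \\
\text{Sparse PCA:} \quad & \max_{v \in \mathbb{R}^p} \; v^\top \Sigma v
  \quad \text{subject to} \quad \|v\|_2 = 1, \; \|v\|_0 \le k
\end{aligned}
```

The cardinality constraint makes the problem NP-hard in general. Practical methods therefore either relax it, as in the semidefinite approach of d'Aspremont et al., or replace it with an ℓ1-type penalty, as in the elastic-net approach of Zou, Hastie and Tibshirani.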



For some problems, this means that Sparse PCA produces results similar to those of traditional PCA, but with simpler, more interpretable components.
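The claim that a component built from a few variables can capture nearly the same variance as a dense one can be checked on a toy example. The sketch below uses the truncated power method (Yuan and Zhang, 2013), a simple sparse PCA heuristic swapped in here for illustration. It is not the semidefinite method of d'Aspremont et al. or the elastic-net method of Zou, Hastie and Tibshirani. The 5 × 5 covariance matrix is made up: three strongly correlated variables and two mostly independent ones. The function names are hypothetical, and the code uses only the Python standard library.

```python
import math


def mat_vec(A, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def normalize(v):
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def truncate(v, k):
    """Keep the k largest-magnitude entries of v and set the rest to zero."""
    keep = set(sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:k])
    return [x if i in keep else 0.0 for i, x in enumerate(v)]


def leading_component(cov, k=None, iters=200):
    """Power iteration for the leading component of a covariance matrix.

    With k=None this is ordinary power iteration (dense PCA loadings).
    With an integer k, every step is truncated to k nonzero entries,
    which is the truncated power method heuristic for sparse PCA.
    Assumes the uniform starting vector is not orthogonal to the answer.
    """
    v = normalize([1.0] * len(cov))
    for _ in range(iters):
        w = mat_vec(cov, v)
        if k is not None:
            w = truncate(w, k)
        v = normalize(w)
    return v


def explained_variance(cov, v):
    """Variance captured by the unit vector v, i.e. v^T cov v."""
    return sum(x * y for x, y in zip(v, mat_vec(cov, v)))


# Hypothetical covariance: variables 0-2 are strongly correlated,
# variables 3-4 are mostly independent, with weak links to 0-2.
cov = [
    [3.0, 2.0, 2.0, 0.2, 0.2],
    [2.0, 3.0, 2.0, 0.2, 0.2],
    [2.0, 2.0, 3.0, 0.2, 0.2],
    [0.2, 0.2, 0.2, 1.0, 0.0],
    [0.2, 0.2, 0.2, 0.0, 1.0],
]

if __name__ == "__main__":
    dense = leading_component(cov)
    sparse = leading_component(cov, k=3)
    print("dense loadings: ", [round(x, 3) for x in dense])
    print("sparse loadings:", [round(x, 3) for x in sparse])
    print("variance (dense, sparse):",
          round(explained_variance(cov, dense), 3),
          round(explained_variance(cov, sparse), 3))
```

On this toy matrix, the dense component puts small but nonzero weights (about 0.057) on variables 3 and 4. The sparse component uses only variables 0 to 2 and captures a variance of 7.0, against roughly 7.04 for the dense component, or about 99% of it.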



For example, in "A Direct Formulation for Sparse PCA Using Semidefinite Programming" (d'Aspremont et al., 2007), 500 genes were measured across a large number of samples. With traditional PCA, each factor used all 500 genes, which made the results difficult to interpret. With Sparse PCA, the factors together involved only 14 genes, and the results were much easier to interpret.
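The semidefinite formulation named in that paper's title can be summarized as follows. It is written here with Σ for the sample covariance matrix and v for the loadings vector; the paper uses its own notation. Replacing v by the matrix X = vv^T turns the variance v^T Σ v into the linear function Tr(ΣX). For a unit vector with at most k nonzero entries, ‖v‖₁² ≤ k, so the cardinality constraint can be relaxed to 1^T |X| 1 ≤ k, where |X| takes entrywise absolute values. Dropping the requirement that X have rank one then yields a convex problem:

```latex
\begin{aligned}
\max_{X \in \mathbb{S}^p} \quad & \operatorname{Tr}(\Sigma X) \\
\text{subject to} \quad & \operatorname{Tr}(X) = 1, \quad
  \mathbf{1}^\top |X| \, \mathbf{1} \le k, \quad X \succeq 0
\end{aligned}
```

The paper also works with a penalized version, maximizing Tr(ΣX) − ρ 1^T |X| 1 subject to Tr(X) = 1 and X ⪰ 0, where ρ > 0 controls the sparsity. A sparse component is then read off from the leading eigenvector of the optimal X.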










People

Hui Zou (Creator)
Robert Tibshirani (Creator)
Trevor Hastie (Creator)



Further reading

Everything you did and didn't know about PCA · Its Neuronal. Alex Williams. Web.
Optimal Sparse Linear Auto-Encoders and Sparse PCA. Malik Magdon-Ismail, Christos Boutsidis. Academic paper.
Sparse Principal Component Analysis. Hui Zou, Trevor Hastie, Robert Tibshirani.
Sparse Principal Component Analysis: Algorithms and Applications. Youwei Zhang. Academic paper.



Documentaries, videos and podcasts

Sparse PCA in High Dimensions (December 18, 2013)
