Sparse principal component analysis (Sparse PCA) is an extension of the classic principal component analysis (PCA) method that offers dimensionality reduction of data with better statistical properties and interpretability than classic PCA.
One of the disadvantages of classic PCA is that each principal component is a linear combination of all of the original variables, so every component depends on every variable. Sparse PCA extends traditional PCA by finding linear combinations that contain only a few input variables.
For some problems, this means that Sparse PCA produces results similar to those of traditional PCA, but with simpler and more interpretable components.
For example, in "A Direct Formulation for Sparse PCA Using Semidefinite Programming" (D'Aspremont et al., 2007), 500 genes were measured for a large number of samples. With traditional PCA, each of the factors obtained uses all 500 genes, making the results difficult to interpret. With Sparse PCA, the factors together involved only 14 genes, and the results were easier to interpret.
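One common way to state this goal formally is as a variance-maximization problem with a limit on the number of nonzero loadings. The symbols below are assumptions introduced for illustration and do not come from the text above: Σ is the sample covariance matrix of the p variables, v is the loading vector of a single component, and k is the maximum number of variables the component may use.

```latex
\max_{v \in \mathbb{R}^{p}} \; v^{\top} \Sigma \, v
\quad \text{subject to} \quad
\lVert v \rVert_{2} = 1, \qquad \lVert v \rVert_{0} \le k
```

Here \lVert v \rVert_{0} counts the nonzero entries of v. Setting k = p removes the sparsity constraint and recovers the first principal component of classic PCA.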
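The contrast between dense and sparse loadings can be sketched on a small synthetic dataset. The example below is only an illustration under stated assumptions: it uses a truncated power iteration (keep the k largest loadings at each step), which is a simple heuristic and not the semidefinite programming method of the paper cited above. The data, the function names, and the choice k = 2 are all hypothetical.

```python
import math
import random


def covariance(rows):
    """Sample covariance matrix of a list of equal-length rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def normalize(v):
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def leading_component(cov, k=None, iters=200):
    """Power iteration for the leading component.

    If k is given, all but the k largest-magnitude loadings are set to
    zero at every step (truncated power iteration, a heuristic).
    """
    p = len(cov)
    v = normalize([1.0] * p)
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        if k is not None:
            order = sorted(range(p), key=lambda i: abs(w[i]), reverse=True)
            keep = set(order[:k])
            w = [w[i] if i in keep else 0.0 for i in range(p)]
        v = normalize(w)
    return v


def explained_variance(cov, v):
    """Variance of the data projected onto v, i.e. v^T cov v."""
    p = len(cov)
    return sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))


# Hypothetical data: variables 0 and 1 share a strong latent factor,
# variables 2 to 5 are independent noise.
random.seed(0)
rows = []
for _ in range(200):
    z = random.gauss(0, 3)
    rows.append([z + random.gauss(0, 0.5), z + random.gauss(0, 0.5)]
                + [random.gauss(0, 1) for _ in range(4)])

cov = covariance(rows)
dense = leading_component(cov)        # classic PCA: every loading nonzero
sparse = leading_component(cov, k=2)  # sparse: at most 2 nonzero loadings

print("dense loadings: ", [round(x, 3) for x in dense])
print("sparse loadings:", [round(x, 3) for x in sparse])
print("variance (dense, sparse):",
      round(explained_variance(cov, dense), 3),
      round(explained_variance(cov, sparse), 3))
```

In this setup the sparse component uses only the two variables that carry the shared factor, while the dense component assigns a small nonzero weight to every noise variable as well. For real work, a library implementation such as scikit-learn's `sklearn.decomposition.SparsePCA`, which uses an L1-penalized formulation rather than a hard cardinality limit, is likely a better starting point than this sketch.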
- Principal component analysis (PCA): Technique used to find the most valuable parts of all of the variables in a dataset and to then transform the original variables into a smaller set of linear combinations.
- Kernel PCA: Non-linear form of principal component analysis (PCA) that better exploits the complicated spatial structure of high-dimensional features.
- Nonlinear dimensionality reduction (NDR or NLDR): A process of mapping higher-dimensional data into a lower-dimensional non-linear manifold within higher-dimensional space so that the data can be more easily visualized and interpreted.
- Unsupervised learning: A branch of machine learning that tries to make sense of data that has not been labeled, classified, or categorized by extracting features and patterns on its own.
- Data science: Interdisciplinary field about processes and systems to extract knowledge or insights from data.