Sparse PCA

Sparse principal component analysis (Sparse PCA) is an extension of the classic principal component analysis (PCA) method that reduces the dimensionality of data while producing components with better statistical properties and interpretability than those of classic PCA.

One disadvantage of classic PCA is that each principal component is a linear combination of all the input variables; in other words, every principal component depends on every original variable. Sparse PCA extends traditional PCA by finding linear combinations that contain only a few input variables.
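The difference can be stated as an optimization problem. Writing Σ for the p × p covariance matrix of the data, v for a vector of loadings, and k for the maximum number of variables a component may use (notation introduced here for illustration), the leading classic component maximizes the variance v^T Σ v over unit vectors, and sparse PCA adds a cardinality constraint on v, as in the formulation studied by d'Aspremont et al.:

```latex
\begin{aligned}
\text{PCA:}\quad & \max_{v \in \mathbb{R}^p} \; v^\top \Sigma v
  \quad \text{subject to } \|v\|_2 = 1, \\
\text{Sparse PCA:}\quad & \max_{v \in \mathbb{R}^p} \; v^\top \Sigma v
  \quad \text{subject to } \|v\|_2 = 1,\; \operatorname{card}(v) \le k,
\end{aligned}
```

Here card(v) is the number of nonzero entries of v, and k < p. The cardinality constraint makes the problem combinatorial and hard to solve exactly in general, which is why practical methods approximate or relax it; d'Aspremont et al., for example, use a semidefinite programming relaxation.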

For some problems, this means that Sparse PCA produces results similar to those of traditional PCA, but with simpler and more interpretable components.
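A minimal sketch of this trade-off in Python with NumPy, on hypothetical synthetic data in which only 4 of 20 variables share a common factor. It compares the leading classic principal component with a sparse one computed by the truncated power method, a simple iterative heuristic swapped in here for illustration (it is not the semidefinite programming method of d'Aspremont et al.). The helper names `leading_pc`, `truncated_power_pc` and `explained_variance`, the data, and the cardinality `k = 4` are all assumptions made for this example.

```python
import numpy as np

def leading_pc(cov):
    """Classic PCA: unit eigenvector for the largest eigenvalue (dense in general)."""
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
    return eigvecs[:, -1]

def truncated_power_pc(cov, k, n_iter=100):
    """Sparse leading component with at most k nonzero loadings (truncated power method)."""
    assert 1 <= k <= cov.shape[0]
    # Deterministic start: the unit vector on the highest-variance variable.
    v = np.zeros(cov.shape[0])
    v[np.argmax(np.diag(cov))] = 1.0
    for _ in range(n_iter):
        w = cov @ v                          # one power-iteration step
        w[np.argsort(np.abs(w))[:-k]] = 0.0  # keep only the k largest |entries|
        v = w / np.linalg.norm(w)            # renormalize to unit length
    return v

def explained_variance(cov, v):
    """Variance captured along the unit vector v, i.e. v^T cov v."""
    return float(v @ cov @ v)

# Hypothetical synthetic data: 20 variables, only the first 4 share a latent factor.
rng = np.random.default_rng(0)
n, p = 500, 20
latent = rng.standard_normal(n)
X = 0.5 * rng.standard_normal((n, p))
X[:, :4] += latent[:, None]
cov = np.cov(X, rowvar=False)

v_pca = leading_pc(cov)
v_sparse = truncated_power_pc(cov, k=4)

print("PCA nonzero loadings:   ", np.count_nonzero(v_pca))
print("Sparse nonzero loadings:", np.count_nonzero(v_sparse))
print("Variance ratio (sparse / PCA):",
      round(explained_variance(cov, v_sparse) / explained_variance(cov, v_pca), 3))
```

On this data the classic component assigns a nonzero weight to every one of the 20 variables, while the sparse component uses only the 4 informative ones and still captures nearly the same variance. In practice k is a tuning parameter: smaller values give simpler components at the cost of some explained variance.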

For example, in "A Direct Formulation for Sparse PCA Using Semidefinite Programming" (d'Aspremont et al., 2007), 500 genes were measured for a large number of samples. With traditional PCA, each factor obtained uses all 500 genes, making the results difficult to interpret. With Sparse PCA, the factors together involved only 14 genes, and the results were much easier to interpret.