Archetypal analysis (AA) is a methodology in statistics and unsupervised learning that represents each "individual" in a data set as a mixture of "individuals of pure type", or "archetypes." Computing the archetypes is a nonlinear least squares problem, which is solved using an alternating minimization algorithm.
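One common way to write this least squares problem is sketched below. The notation is an assumption introduced here for illustration and does not come from the article: X is the n × d data matrix, B is a k × n matrix whose rows build the k archetypes Z = BX from the data, and A is an n × k matrix whose rows hold the mixture weights of each data point.

```latex
\min_{A,\,B}\ \lVert X - A B X \rVert_F^2
\quad \text{subject to} \quad
a_{ij} \ge 0,\ \ \sum_{j=1}^{k} a_{ij} = 1,
\qquad
b_{jl} \ge 0,\ \ \sum_{l=1}^{n} b_{jl} = 1 .
```

With B held fixed the problem is a constrained linear least squares problem in A, and with A held fixed it is one in B, which is what makes an alternating scheme natural.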
Archetypal analysis was originally proposed by Adele Cutler and Leo Breiman as an alternative to principal component analysis (PCA) for discovering latent factors in high-dimensional data. AA estimates the principal convex hull of a data set, and each "archetype" (i.e. factor) is constrained to be a convex combination of extremal points of the data. The associations between archetypes and data points contribute to AA's results being easily interpretable.
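The alternating idea can be illustrated with a minimal NumPy sketch. It is not the algorithm of Cutler and Breiman: instead of solving each constrained least squares subproblem exactly, it takes one projected-gradient step per block, which is a simpler swapped-in technique. The function names, the matrix names `A`, `B` and `Z`, and all parameters are hypothetical choices made for this example.

```python
import numpy as np


def project_rows_to_simplex(M):
    """Euclidean projection of every row of M onto the probability simplex."""
    n, k = M.shape
    U = np.sort(M, axis=1)[:, ::-1]           # each row sorted in descending order
    css = np.cumsum(U, axis=1) - 1.0
    idx = np.arange(1, k + 1)
    cond = U - css / idx > 0
    # Number of entries that stay positive: position of the last True per row.
    rho = k - np.argmax(cond[:, ::-1], axis=1)
    theta = css[np.arange(n), rho - 1] / rho
    return np.maximum(M - theta[:, None], 0.0)


def archetypal_analysis(X, k, n_iter=200, seed=0):
    """Sketch of AA: X (n x d) is approximated by A @ B @ X.

    Rows of A (n x k) are the mixture weights of each data point.
    Rows of B (k x n) express each archetype as a convex combination of data points.
    """
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    A = project_rows_to_simplex(rng.random((n, k)))
    B = project_rows_to_simplex(rng.random((k, n)))
    x_norm_sq = np.linalg.norm(X, 2) ** 2     # squared spectral norm of X
    losses = []
    for _ in range(n_iter):
        Z = B @ X                              # current archetypes (k x d)
        # Projected-gradient step on A with B fixed (step size 1 / Lipschitz constant).
        lip_A = 2.0 * np.linalg.norm(Z, 2) ** 2 + 1e-12
        grad_A = 2.0 * (A @ Z - X) @ Z.T
        A = project_rows_to_simplex(A - grad_A / lip_A)
        # Projected-gradient step on B with the updated A fixed.
        lip_B = 2.0 * np.linalg.norm(A, 2) ** 2 * x_norm_sq + 1e-12
        grad_B = 2.0 * (A.T @ (A @ Z - X)) @ X.T
        B = project_rows_to_simplex(B - grad_B / lip_B)
        losses.append(float(np.sum((X - A @ B @ X) ** 2)))
    return A, B, B @ X, losses
```

Calling `archetypal_analysis(X, k=3)` on an `(n, d)` array returns the weights, the archetype coefficients, the archetypes themselves, and the reconstruction error per iteration. Because every row of `A` and `B` is kept on the simplex, each archetype stays inside the convex hull of the data and each data point is approximated by a convex mixture of archetypes.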
The archetypal analysis methodology allows for dimensionality reduction and clustering. The disadvantage of AA is that its computational cost increases quadratically with the number of data points, which can make it impractical for large data sets. However, robust and efficient algorithms have been developed, with practical applications in physics, genetics and phytomedicine, market research and marketing, performance evaluation, behavior analysis, and computer vision.
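As a small illustration of how mixture weights might be used for clustering, the sketch below assigns each data point to the archetype with the largest weight. The weight matrix is made-up example data, and the helper name `hard_assignments` is hypothetical; taking the largest weight is only one simple convention, not a method prescribed by the article.

```python
import numpy as np

# Made-up mixture weights for 4 data points over 3 archetypes.
# Each row is non-negative and sums to 1.
weights = np.array([
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.20, 0.30, 0.50],
    [0.50, 0.40, 0.10],
])


def hard_assignments(w):
    """Label each data point with the index of its dominant archetype."""
    return np.argmax(w, axis=1)


labels = hard_assignments(weights)
```

The rows of `weights` also serve as a low-dimensional representation: each data point is described by 3 numbers regardless of the dimension of the original data.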
- Archetypal Analysis. Adele Cutler, Leo Breiman
- Archetypal analysis for machine learning and data mining. Morten Mørup, Lars Kai Hansen
- Making Archetypal Analysis Practical. Christian Bauckhage, Christian Thurau
- Principal component analysis (PCA): A technique that transforms the original variables of a data set into a smaller set of linear combinations that capture most of the variance.
- Non-negative matrix factorization (NMF): A matrix factorization method in which all matrix entries are constrained to be non-negative, which makes the factors easier to inspect. It is useful in data mining because it has the effect of clustering the input data.
- Non-negative matrix factorization via archetypal analysis: An approach to non-negative matrix factorization that does not require data to be separable and provides a generally unique decomposition.
- Computer vision: A field of computing concerned with recognizing information in images and videos.
- Machine learning: A field of computer science that enables computers to learn from data.