Patent attributes
A system for outlier detection and removal comprises an interface and a processor. The interface is configured to receive a data set. The processor is configured to determine a cleaned data set by removing outliers, wherein determining the cleaned data set comprises: determining a type of distribution; in response to the type of distribution being normal, determining the outliers using covariance estimation; in response to the type of distribution not being normal, determining the outliers using density-based clustering; and determining the cleaned data set by removing the outliers from the data set. The processor is further configured to determine a coefficient of variation of the cleaned data set, determine whether the coefficient of variation is greater than a threshold coefficient of variation, and, in response to the coefficient of variation being greater than the threshold coefficient of variation, determine a new cleaned data set by removing a new set of outliers from the cleaned data set.
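The flow described above can be illustrated with a minimal one-dimensional Python sketch. It is not the claimed implementation, and several parts are assumptions: the abstract does not name a normality test, so a Jarque-Bera check stands in for "determining a type of distribution"; in one dimension, covariance estimation reduces to the variance, so the Mahalanobis distance becomes a z-score; and the density step is a simplified DBSCAN-style noise rule. All function names, parameters, and threshold values (`jb_critical`, `max_distance`, `eps`, `min_pts`, `cv_threshold`) are hypothetical.

```python
import math
import statistics


def looks_normal(data, jb_critical=5.99):
    """Jarque-Bera check (an assumption; the source names no specific test)."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n
    if m2 == 0:
        return True
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    return jb < jb_critical


def covariance_outliers(data, max_distance=3.0):
    """1-D covariance estimate: flag indices whose z-score exceeds max_distance."""
    mean = statistics.fmean(data)
    std = statistics.pstdev(data)
    if std == 0:
        return set()
    return {i for i, x in enumerate(data) if abs(x - mean) / std > max_distance}


def density_outliers(data, eps=1.0, min_pts=3):
    """Simplified DBSCAN-style rule: a point is noise if it is neither a core
    point nor within eps of a core point."""
    core = [sum(abs(x - y) <= eps for y in data) >= min_pts for x in data]
    noise = set()
    for i, x in enumerate(data):
        near_core = any(core[j] and abs(x - y) <= eps for j, y in enumerate(data))
        if not core[i] and not near_core:
            noise.add(i)
    return noise


def find_outliers(data, eps, min_pts):
    """Pick the detection method from the (assumed) distribution check."""
    if looks_normal(data):
        return covariance_outliers(data)
    return density_outliers(data, eps, min_pts)


def remove_indices(data, indices):
    return [x for i, x in enumerate(data) if i not in indices]


def coefficient_of_variation(data):
    """Population standard deviation divided by the absolute mean."""
    mean = statistics.fmean(data)
    if mean == 0:
        return math.inf
    return statistics.pstdev(data) / abs(mean)


def clean_data_set(data, cv_threshold=0.1, eps=1.0, min_pts=3):
    """One cleaning pass, then a second pass only if the CV is still too high."""
    cleaned = remove_indices(data, find_outliers(data, eps, min_pts))
    if coefficient_of_variation(cleaned) > cv_threshold:
        cleaned = remove_indices(cleaned, find_outliers(cleaned, eps, min_pts))
    return cleaned


readings = [10.0, 10.2, 9.9, 10.1, 9.8, 10.3, 10.0, 9.7, 10.1, 9.9, 50.0]
cleaned_readings = clean_data_set(readings)
```

With the hypothetical `readings` above, the single extreme value makes the data fail the normality check, so the density rule is used and the value 50.0 is removed; the remaining values have a coefficient of variation below the illustrative threshold, so no second pass runs. A multivariate version would more plausibly rely on a library covariance estimator and clustering routine instead of these hand-written helpers.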