Anomalies or outliers in a set of data records are identified using a distance or similarity measure between features of record pairs, where the measure depends upon the frequencies of the feature values in the set. Feature distances may be combined into a total distance between the records of a pair. A record is indicated as an outlier when its score, which may be based upon the pairwise distances, meets a certain criterion. Identified outliers may be employed to detect intrusions in computer networks.
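The steps above can be illustrated with a minimal sketch. The specific choices here are assumptions made for illustration only and are not taken from the source: a mismatch between two feature values costs 1, a match costs the relative frequency of the shared value (so agreeing on a rare value counts as closer than agreeing on a common one), feature distances are summed, and the score is the mean distance to the `k` nearest records. The function names, the sample records, and the `threshold` value are all hypothetical.

```python
from collections import Counter


def feature_frequencies(records):
    """Count how often each value occurs in each feature position."""
    n_features = len(records[0])
    return [Counter(rec[j] for rec in records) for j in range(n_features)]


def feature_distance(a, b, counts, n):
    """Assumed frequency-dependent distance for one feature.

    Differing values cost 1. Equal values cost the relative frequency
    of that value in the set, so a match on a rare value is 'closer'.
    """
    if a != b:
        return 1.0
    return counts[a] / n


def record_distance(r1, r2, freqs, n):
    """Combine feature distances into a total distance (here: a sum)."""
    return sum(feature_distance(a, b, c, n) for a, b, c in zip(r1, r2, freqs))


def outlier_scores(records, k=2):
    """Score each record by its mean distance to its k nearest records."""
    n = len(records)
    freqs = feature_frequencies(records)
    scores = []
    for i, rec in enumerate(records):
        dists = sorted(
            record_distance(rec, other, freqs, n)
            for j, other in enumerate(records)
            if j != i
        )
        nearest = dists[:k]
        scores.append(sum(nearest) / len(nearest))
    return scores


# Hypothetical connection records: (protocol, service, flag).
records = [
    ("tcp", "http", "SF"),
    ("tcp", "http", "SF"),
    ("tcp", "http", "SF"),
    ("tcp", "smtp", "SF"),
    ("udp", "dns", "REJ"),
]

scores = outlier_scores(records, k=2)
threshold = 2.8  # assumed cut-off, chosen by hand for this toy data
outliers = [i for i, s in enumerate(scores) if s > threshold]
print(scores, outliers)
```

In this toy data the last record shares no feature value with any other record, so every one of its feature distances is a mismatch and it receives the highest score. In practice the per-feature measure, the way distances are combined, and the scoring criterion could each be chosen differently.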