Patent attributes
The technology disclosed includes a system to perform multi-label support vector machine (SVM) classification of a document. The system creates document features representing frequencies or semantics of words in the document. Trained SVM classification parameters for a plurality of labels are applied to the document features for the document. The system determines positive and negative distances between SVM hyperplanes for the labels and the feature vector. Labels with positive distance to the feature vector are harvested. When the distribution of negative distances is characterized by a mean and standard deviation, the system further harvests those labels whose negative distance lies between the mean negative distance and zero and is separated from the mean negative distance by a predetermined first number of standard deviations.
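The harvesting rule described above can be illustrated with a minimal sketch. It assumes the signed hyperplane distances for each label have already been computed by some trained SVM; the function name `harvest_labels` and the parameter `num_std` (standing in for the "predetermined first number of standard deviations") are hypothetical. It also assumes that "separated from the mean" means lying more than `num_std` standard deviations above the mean negative distance, and that the population standard deviation is used; the abstract does not specify either choice.

```python
import numpy as np

def harvest_labels(distances, labels, num_std=1.0):
    """Select labels by the positive-distance rule plus the negative-tail rule.

    distances: signed distance from each label's hyperplane to the feature vector.
    num_std:   hypothetical name for the predetermined number of standard deviations.
    """
    d = np.asarray(distances, dtype=float)
    positive = d > 0
    negative = d < 0

    # Labels with a positive distance are always harvested.
    harvested = positive.copy()

    if negative.any():
        # Characterize the distribution of negative distances.
        mean_neg = d[negative].mean()
        std_neg = d[negative].std()  # population standard deviation (assumption)

        # Assumed reading: keep negative distances that lie between the mean and
        # zero and are more than num_std standard deviations above the mean.
        threshold = mean_neg + num_std * std_neg
        harvested |= negative & (d > threshold)

    return [label for label, keep in zip(labels, harvested) if keep]

# Illustrative values only; they are not taken from the patent.
example_distances = [0.8, -0.1, -1.0, -1.2, -0.9]
example_labels = ["a", "b", "c", "d", "e"]
print(harvest_labels(example_distances, example_labels))
```

In this illustrative input, label "a" is harvested for its positive distance, and label "b" is harvested because its negative distance sits well above the mean of the negative distances. Raising `num_std` tightens the rule and harvests fewer negative-distance labels.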