Patent attributes
Subsets of training data are selected for each iteration, or pass, of a training process for a statistical model. Selecting only the training data likely to have significant training value for a given pass can reduce the amount of data to be processed. The selection can include using a metric such as the loss or certainty to sample the data, such that easy-to-classify instances are used for training less frequently than harder-to-classify instances. A cutoff value or threshold can also, or alternatively, be used such that harder-to-classify instances are not selected for training until later in the process, when the model may be more likely to benefit from training on those instances. Sampling can vary between passes for variety, and the cutoff value can also change such that all data instances are eligible for training selection by the last iteration at the latest.
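
The selection described above can be illustrated with a minimal sketch. This is one possible reading, not the claimed implementation. It assumes a non-negative per-instance loss as the metric, a linear schedule for the cutoff, and weighted sampling without replacement (the Efraimidis-Spirakis key method) so that higher-loss instances are drawn more often. The names `cutoff_for_pass`, `select_subset`, `start_loss`, and `subset_size` are hypothetical and do not come from the source.

```python
import random


def cutoff_for_pass(pass_idx, num_passes, start_loss, max_loss):
    """Assumed linear schedule: the cutoff rises from start_loss to max_loss,
    so every instance is eligible by the final pass."""
    if num_passes <= 1:
        return max_loss
    frac = pass_idx / (num_passes - 1)
    return start_loss + frac * (max_loss - start_loss)


def select_subset(losses, pass_idx, num_passes, subset_size, start_loss, rng):
    """Return sorted indices of the instances chosen for this pass.

    losses: non-negative per-instance loss from the current model (assumption).
    rng:    a random.Random instance; passing a different state per pass
            varies the sample between passes.
    """
    max_loss = max(losses)
    cutoff = cutoff_for_pass(
        pass_idx, num_passes, min(start_loss, max_loss), max_loss
    )

    # Instances above the cutoff are held back until a later pass.
    eligible = [i for i, loss in enumerate(losses) if loss <= cutoff]

    # Weighted sampling without replacement: each eligible instance gets the
    # key u ** (1 / weight) and the largest keys win. The weight is the loss
    # plus a small floor, so easy instances are rarely, but not never, drawn.
    floor = 1e-3
    keyed = [(rng.random() ** (1.0 / (losses[i] + floor)), i) for i in eligible]
    keyed.sort(reverse=True)
    return sorted(i for _, i in keyed[:subset_size])


if __name__ == "__main__":
    rng = random.Random(0)
    losses = [0.05, 0.1, 0.2, 0.9, 1.5, 3.0]
    for p in range(3):
        chosen = select_subset(losses, p, 3, subset_size=3, start_loss=1.0, rng=rng)
        print(f"pass {p}: selected indices {chosen}")
```

In this sketch the early passes draw only from instances whose loss is at or below the cutoff, and the final pass draws from the whole data set. In practice the losses would be recomputed as the model changes between passes, and a certainty score could be substituted for the loss.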