Patent attributes
A determination is made at a machine learning service that a training data set comprising a majority category of observation records and one or more minority categories of observation records meets a criterion for automated sampling. A sampling ratio is identified for a particular category selected from among the majority category and the one or more minority categories. A selected sampling methodology is applied to the particular category to obtain a sample in accordance with the sampling ratio. A particular machine learning model is trained using a result of applying at least the selected sampling methodology to the particular category.
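
The four steps described above (criterion check, sampling-ratio identification, sampling, training) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the source: the imbalance threshold used as the criterion, the choice of random undersampling of the majority category as the sampling methodology, the nearest-centroid stand-in for the trained model, and all function and parameter names.

```python
import random
from collections import Counter


def meets_sampling_criterion(labels, max_imbalance=4.0):
    """Hypothetical criterion: the largest category outnumbers the
    smallest category by more than max_imbalance to one."""
    counts = Counter(labels)
    if len(counts) < 2:
        return False
    return max(counts.values()) / min(counts.values()) > max_imbalance


def undersampling_ratio(labels, target_imbalance=1.0):
    """Return the majority category and the fraction of it to keep so that
    it is about target_imbalance times the largest minority category."""
    ranked = Counter(labels).most_common()
    majority_label, majority_count = ranked[0]
    largest_minority_count = ranked[1][1]
    ratio = min(1.0, target_imbalance * largest_minority_count / majority_count)
    return majority_label, ratio


def sample_category(records, labels, category, ratio, seed=0):
    """Random undersampling without replacement of one category;
    records of all other categories are kept unchanged."""
    rng = random.Random(seed)
    indices = [i for i, y in enumerate(labels) if y == category]
    keep = set(rng.sample(indices, round(len(indices) * ratio)))
    kept = [i for i, y in enumerate(labels) if y != category or i in keep]
    return [records[i] for i in kept], [labels[i] for i in kept]


def train_nearest_centroid(records, labels):
    """Stand-in model: the per-category mean of the feature vectors."""
    sums, counts = {}, Counter(labels)
    for x, y in zip(records, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


# Toy data set (illustrative only): 90 majority and 10 minority records.
records = [[float(i)] for i in range(100)]
labels = ["neg"] * 90 + ["pos"] * 10

if meets_sampling_criterion(labels):
    category, ratio = undersampling_ratio(labels)
    sampled_records, sampled_labels = sample_category(records, labels, category, ratio)
    model = train_nearest_centroid(sampled_records, sampled_labels)
```

In this sketch only the majority category is sampled. Oversampling a minority category, or applying a different methodology to each category, would fit the same structure by swapping out `sample_category`.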