Patent 10268684 was granted and assigned to Amazon in April 2019 by the United States Patent and Trademark Office.
Technologies are disclosed herein for statistical machine translation. In particular, the disclosed technologies include extensions to conventional machine translation pipelines: the use of multiple domain-specific and non-domain-specific dynamic language translation models and language models; cluster-based language models; and large-scale discriminative training. Incremental update technologies are also disclosed for use in updating a machine translation system in four areas: word alignment; translation modeling; language modeling; and parameter estimation. A mechanism is also disclosed for training and utilizing a runtime machine translation quality classifier for estimating the quality of machine translations without the benefit of reference translations. The runtime machine translation quality classifier is generated in a manner to offset imbalances in the number of training instances in various classes, and to assign a greater penalty to the misclassification of lower-quality translations as higher-quality translations than to misclassification of higher-quality translations as lower-quality translations.
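The final sentence combines two ideas: reweighting training instances so rare quality classes are not swamped by common ones, and using an asymmetric cost so that overrating a translation is worse than underrating it. The sketch below illustrates both under stated assumptions. The abstract does not name a classifier, a weighting formula, or penalty values, so the following are all hypothetical choices for illustration: quality classes are integers ordered from lowest to highest quality, class weights use the common inverse-frequency heuristic n / (k * count), and costs grow linearly with the class gap, with an arbitrary 3:1 ratio of over-estimation to under-estimation penalty. Every function name is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Hypothetical helper: weight each class by n / (k * count_c), so every
    class contributes the same total weight (n / k) regardless of its size."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def asymmetric_cost(true_cls, pred_cls, over_penalty=3.0, under_penalty=1.0):
    """Hypothetical cost of predicting pred_cls when the truth is true_cls.
    Classes are integers ordered by quality (higher = better). Predicting a
    higher quality than the truth is penalised more heavily (assumed 3:1)."""
    gap = pred_cls - true_cls
    if gap > 0:
        return over_penalty * gap      # lower-quality output rated as higher quality
    return under_penalty * -gap        # higher-quality output rated as lower (or correct: 0)


def weighted_expected_cost(labels, probs, weights):
    """Average class-weighted expected misclassification cost, where
    probs[i][j] is the model's probability that example i belongs to class j.
    A training procedure could minimise this quantity."""
    total = 0.0
    for y, p in zip(labels, probs):
        expected = sum(p_j * asymmetric_cost(y, j) for j, p_j in enumerate(p))
        total += weights[y] * expected
    return total / len(labels)


def min_cost_prediction(p):
    """Pick the class with the lowest expected cost under probabilities p,
    rather than simply the most probable class."""
    k = len(p)
    return min(
        range(k),
        key=lambda j: sum(p[t] * asymmetric_cost(t, j) for t in range(k)),
    )


if __name__ == "__main__":
    labels = [0, 2, 2, 2, 1, 2]            # class 2 (best quality) dominates
    print(balanced_class_weights(labels))  # rare classes 0 and 1 get weight 2.0, class 2 gets 0.5
    print(min_cost_prediction([0.3, 0.3, 0.4]))
```

With probabilities `[0.3, 0.3, 0.4]` the most probable class is 2, but `min_cost_prediction` returns 0: the expected cost of guessing the lowest class is 1.1, versus 1.3 for class 1 and 2.7 for class 2, because overrating is penalised three times as heavily. This shows how an asymmetric penalty biases the classifier toward conservative quality estimates when it is uncertain.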