Patent 10133739 was granted and assigned to Google on November, 2018 by the United States Patent and Trademark Office.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for neural translation systems with rare word processing. One of the methods is a method training a neural network translation system to track the source in source sentences of unknown words in target sentences, in a source language and a target language, respectively and includes deriving alignment data from a parallel corpus, the alignment data identifying, in each pair of source and target language sentences in the parallel corpus, aligned source and target words; annotating the sentences in the parallel corpus according to the alignment data and a rare word model to generate a training dataset of paired source and target language sentences; and training a neural network translation model on the training dataset.