Systems and methods are provided for training a machine-learned model across a large number of devices, each of which acquires a local set of training data that is not shared with other devices. Each device trains the model on its own set of training data. Each device asynchronously communicates a parameter vector from its trained model to a parameter server. The parameter server updates a master parameter vector and transmits the updated master parameter vector back to the respective device.
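
The exchange described above can be illustrated with a minimal Python sketch. It is not the claimed implementation. The class names, the two-parameter linear model, the learning rate, and the blending rule used to update the master parameter vector are all assumptions made for illustration, since the text above does not specify how the master vector is updated. Asynchrony is approximated by letting devices contact the server one at a time in random order, so each device may be working from an out-of-date copy of the master vector.

```python
import random
from typing import List, Tuple

Vector = List[float]


class ParameterServer:
    """Holds the master parameter vector and merges device updates one at a time."""

    def __init__(self, dim: int, mix: float = 0.5) -> None:
        self.master: Vector = [0.0] * dim
        self.mix = mix  # hypothetical blending weight, not given in the source

    def push(self, device_params: Vector) -> Vector:
        # Assumed update rule: move the master vector toward the device's vector.
        self.master = [
            (1.0 - self.mix) * m + self.mix * p
            for m, p in zip(self.master, device_params)
        ]
        # A copy of the updated master vector is transmitted back to the device.
        return list(self.master)


class Device:
    """Fits y = w*x + b by gradient descent on data that never leaves the device."""

    def __init__(self, data: List[Tuple[float, float]], lr: float = 0.5) -> None:
        self.data = data            # local training set, never sent to the server
        self.lr = lr                # hypothetical learning rate
        self.params: Vector = [0.0, 0.0]  # [w, b]

    def train_locally(self, steps: int = 20) -> Vector:
        w, b = self.params
        n = len(self.data)
        for _ in range(steps):
            grad_w = grad_b = 0.0
            for x, y in self.data:
                err = (w * x + b) - y
                grad_w += err * x
                grad_b += err
            w -= self.lr * grad_w / n
            b -= self.lr * grad_b / n
        self.params = [w, b]
        return self.params

    def sync(self, server: ParameterServer) -> None:
        # Only the parameter vector is communicated, not the training data.
        self.params = server.push(self.train_locally())


def run_simulation(num_devices: int = 5, rounds: int = 200, seed: int = 0) -> Vector:
    rng = random.Random(seed)
    devices = []
    for _ in range(num_devices):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(20)]
        # Synthetic local data drawn from y = 2x + 1 (illustrative only).
        devices.append(Device([(x, 2.0 * x + 1.0) for x in xs]))
    server = ParameterServer(dim=2)
    for _ in range(rounds):
        # A randomly chosen device contacts the server, independent of the others.
        rng.choice(devices).sync(server)
    return server.master


if __name__ == "__main__":
    print(run_simulation())
```

In this sketch no device waits for any other device: a device trains, pushes its parameter vector, and receives the current master vector in a single exchange. A real deployment would replace the in-process `push` call with a network request and would likely use a different update rule than the simple blend assumed here.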