Patent attributes
A controller at a source site generates a set of tasks associated with a replication job. Each task involves a source worker node from among a set of source worker nodes at the source site, a destination worker node from among a set of destination worker nodes at the destination site, and includes one or more of copying an object from the source to destination site, or deleting an object from the destination site. Status update messages concerning the tasks are received at a message queue connected between the controller and the set of source worker nodes. The status update messages are logged into a persistent key-value store. Upon a failure to complete the replication job, the key-value store is accessed to identify tasks that were and were not completed before the failure. The tasks that were not completed are resent to the source worker nodes.