A node failure recovery tool includes an interface and one or more processors. The interface is configured to receive a first portion and a second portion of state information from a first node. The one or more processors are configured to determine a time that the first portion of state information was received and to store the first portion of state information together with that time. The one or more processors are further configured to determine a time that the second portion of state information was received and start a timer, to determine that the timer has expired and that a third portion of state information has not been received, to determine on that basis that the first node has crashed, and, after determining that the first node has crashed, to send a retrieved second portion of state information to the first node so that the first node can recover from the crash.
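
The flow described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the class name `RecoveryTool`, the `timeout` value, the `send` callback, the injectable `clock`, and the polling-style `check` method are all hypothetical names introduced for illustration, and the sketch assumes that every received portion is stored with its receipt time so that the most recent one can be retrieved later.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class RecoveryTool:
    """Illustrative sketch only; all names here are assumptions."""

    timeout: float                      # assumed wait (seconds) for the next portion
    send: Callable[[str, bytes], None]  # assumed transport back to the node
    clock: Callable[[], float] = time.monotonic
    _log: Dict[str, List[Tuple[float, bytes]]] = field(default_factory=dict)
    _deadline: Dict[str, float] = field(default_factory=dict)

    def receive(self, node_id: str, portion: bytes) -> None:
        """Record a portion with its receipt time and (re)start the timer."""
        received_at = self.clock()
        self._log.setdefault(node_id, []).append((received_at, portion))
        self._deadline[node_id] = received_at + self.timeout

    def check(self, node_id: str) -> bool:
        """Return True if the node was treated as crashed and state was resent."""
        deadline = self._deadline.get(node_id)
        if deadline is None or self.clock() < deadline:
            return False  # no timer running, or timer not yet expired
        # Timer expired with no newer portion: treat the node as crashed and
        # send back the most recently stored portion of state information.
        _received_at, portion = self._log[node_id][-1]
        self.send(node_id, portion)
        del self._deadline[node_id]  # stop the timer until the node reports again
        return True
```

In this sketch a caller would invoke `receive` whenever state information arrives and call `check` periodically; passing a fake `clock` makes the timeout behavior easy to exercise without real delays.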