Various systems, methods, and processes for performing recovery operations in a cluster based on exponential backoff models are disclosed. A failure of a node is detected. The node is one of multiple nodes in a cluster. In response to the node failure, an application executing on the node is failed over to another node in the cluster. Also in response to detecting the node failure, recovery operations are automatically performed to determine whether the node has recovered. Each subsequent recovery operation is performed after a prior recovery operation, at a frequency that decreases exponentially after the prior recovery operation is performed, so that the interval between successive recovery operations grows exponentially.
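
The recovery schedule described above can be illustrated with a minimal sketch. It is not the disclosed implementation: the names `backoff_intervals`, `run_recovery`, and `probe`, the base delay of one second, the doubling factor, and the attempt limit are all assumptions chosen for illustration. The sketch only shows how the wait before each recovery operation can grow exponentially, which is equivalent to the frequency of recovery operations decreasing exponentially.

```python
from typing import Callable, Iterator, Optional


def backoff_intervals(base_delay: float = 1.0,
                      factor: float = 2.0,
                      max_attempts: int = 6) -> Iterator[float]:
    """Yield the wait before each recovery operation.

    Each wait is `factor` times the previous one, so the frequency of
    recovery operations falls exponentially. All defaults are assumed
    values, not taken from the source.
    """
    delay = base_delay
    for _ in range(max_attempts):
        yield delay
        delay *= factor


def run_recovery(probe: Callable[[], bool],
                 sleep: Callable[[float], None],
                 base_delay: float = 1.0,
                 factor: float = 2.0,
                 max_attempts: int = 6) -> Optional[int]:
    """Run recovery operations until the failed node responds.

    `probe` is a hypothetical callable that returns True once the node
    has recovered. `sleep` is injected so the schedule can be exercised
    without real waiting (pass `time.sleep` in practice).
    Returns the 1-based attempt on which the node was found recovered,
    or None if every attempt failed.
    """
    schedule = backoff_intervals(base_delay, factor, max_attempts)
    for attempt, delay in enumerate(schedule, start=1):
        sleep(delay)      # wait grows exponentially between attempts
        if probe():       # recovery operation: is the node back?
            return attempt
    return None
```

With the assumed defaults, `list(backoff_intervals(1.0, 2.0, 5))` gives waits of 1, 2, 4, 8, and 16 seconds. A practical system would likely also cap the maximum wait and decide what to do once the attempt limit is reached; both choices are outside what the abstract states.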