Patent attributes
A distributed system may identify correlated events according to operational metrics. The distributed system may collect large numbers of operational metrics from multiple different sources. Some of the operational metrics may be monitored and analyzed for correlation with other operational metrics. The monitored operational metrics may be manually selected, or identified according to anomalous events detected for the operational metrics. Based on the monitoring, a correlated event may be detected. A response for the correlated event may be determined and performed. In some embodiments, a notification of the correlated event may be sent. In some embodiments, corrective actions may be performed at the distributed system.
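As an illustration only, the monitoring and detection steps described above might be sketched as follows. The abstract does not specify a correlation measure, a threshold, or a response format, so the use of the Pearson coefficient, the 0.9 threshold, the metric names, and the functions `pearson`, `find_correlated_events`, and `build_notification` are all assumptions made for this sketch, not the claimed method.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (assumed measure)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant series carries no correlation signal; treat it as 0.
        return 0.0
    return cov / sqrt(var_x * var_y)


def find_correlated_events(monitored_name, metrics, threshold=0.9):
    """Compare one monitored metric against every other collected metric.

    `metrics` maps a hypothetical metric name to its list of samples.
    Returns (monitored, other, coefficient) tuples whose absolute
    correlation meets the assumed threshold.
    """
    monitored_series = metrics[monitored_name]
    events = []
    for name, series in metrics.items():
        if name == monitored_name:
            continue
        r = pearson(monitored_series, series)
        if abs(r) >= threshold:
            events.append((monitored_name, name, r))
    return events


def build_notification(event):
    """One possible response: a notification message for the event."""
    monitored, other, r = event
    return f"correlated event: {monitored} ~ {other} (r={r:.2f})"


# Hypothetical samples collected from different sources.
metrics = {
    "request_latency_ms": [10, 12, 11, 30, 45, 50],
    "queue_depth": [1, 2, 1, 8, 12, 14],
    "fan_speed_rpm": [5, 5, 5, 5, 5, 5],
}

events = find_correlated_events("request_latency_ms", metrics)
notifications = [build_notification(e) for e in events]
```

In this sketch the monitored metric is chosen by name, which corresponds to the manual selection case; selecting it from a detected anomalous event would replace that argument with the output of an anomaly detector. A corrective action could be dispatched in place of, or alongside, the notification.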

