Patent attributes
The current document is directed to an analysis subsystem within a large distributed computing system, such as a virtual data center or cloud-computing facility, that monitors the operational states of a multi-tiered application and provides information for determining one or more causes of the failures and undesirable operational states that may arise while the multi-tiered application runs. In one implementation, the analysis subsystem collects metrics from various types of metrics sources within the distributed computing system and employs principal feature analysis to select a generally small subset of the collected metrics that is particularly relevant to monitoring the multi-tiered application and diagnosing the underlying causes of its operational states. The analysis subsystem then develops one or more conditional probability distributions with respect to the selected subset of metrics. These conditional probability distributions, in turn, allow the analysis subsystem to provide information for analyzing the causes of failures and undesirable system states associated with the multi-tiered application.
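As an illustration of the final step only, the following minimal sketch estimates a conditional probability distribution of an application state given binned metric values by simple counting. It assumes the metric subset has already been selected; the metric names, bin edges, state labels, and the helper functions `discretize` and `conditional_distribution` are hypothetical and are not taken from the document, which does not specify how the distributions are constructed.

```python
from collections import Counter, defaultdict


def discretize(value, thresholds):
    """Return the index of the bin that value falls into (0 = below all edges)."""
    return sum(value >= t for t in thresholds)


def conditional_distribution(samples, thresholds):
    """Estimate P(state | binned metrics) by counting co-occurrences.

    samples: iterable of (metric_values, state) pairs, where metric_values
             holds one reading per selected metric.
    thresholds: one list of bin edges per selected metric.
    """
    counts = defaultdict(Counter)
    for metric_values, state in samples:
        key = tuple(discretize(v, t) for v, t in zip(metric_values, thresholds))
        counts[key][state] += 1
    # Normalize each row of counts into relative frequencies.
    return {
        key: {state: n / sum(c.values()) for state, n in c.items()}
        for key, c in counts.items()
    }


# Hypothetical readings: (cpu_utilization, db_latency_ms) with an observed state.
samples = [
    ((0.95, 120.0), "degraded"),
    ((0.92, 110.0), "degraded"),
    ((0.91, 130.0), "degraded"),
    ((0.93, 125.0), "normal"),
    ((0.97, 40.0), "normal"),
    ((0.30, 35.0), "normal"),
    ((0.25, 30.0), "normal"),
]
thresholds = [[0.9], [100.0]]  # assumed bin edges, one list per metric

cpd = conditional_distribution(samples, thresholds)
print(cpd[(1, 1)])  # high CPU and high latency
print(cpd[(0, 0)])  # low CPU and low latency
```

With this toy data, three of the four samples in the high-CPU, high-latency bin are labeled degraded, so the estimated probability of the degraded state in that bin is 0.75. A table of this kind is one simple way such distributions could point an analyst toward the metrics most associated with an undesirable state.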