The technology described herein provides an automated software-testing platform that uses reinforcement learning to discover how to perform the tasks used in testing. The platform is able to perform quality testing even when prescribed paths for completing tasks are not provided. The reinforcement-learning agent is not directly supervised to take a particular action in any given situation; rather, it learns which sequences of actions generate the most reward from the states and rewards it observes in the environment. In the software-testing environment, a state can be defined by user interface features, and actions are interactions with user interface elements. The testing system may recognize when a sought-after state is achieved by comparing a new state to reward criteria.
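
The learning loop described above can be illustrated with a minimal sketch. It is not the platform's implementation. It assumes tabular Q-learning as the reinforcement-learning method, and a hypothetical toy application whose state is reduced to a screen name. The screen names, the action names, and the functions `step`, `meets_reward_criterion`, `train`, and `greedy_path` are all invented for illustration.

```python
import random

# Hypothetical toy application. A state stands in for the visible user
# interface features (collapsed here to a screen name); an action is an
# interaction with a user interface element.
TRANSITIONS = {
    ("home", "click_menu"): "menu",
    ("menu", "click_settings"): "settings",
    ("menu", "click_back"): "home",
    ("settings", "click_save"): "saved",
    ("settings", "click_back"): "menu",
}
ACTIONS = ["click_menu", "click_settings", "click_save", "click_back"]


def step(state, action):
    """Apply an interaction; unsupported interactions leave the UI unchanged."""
    return TRANSITIONS.get((state, action), state)


def meets_reward_criterion(state):
    """Assumed reward criterion: the 'saved' confirmation screen is showing."""
    return state == "saved"


def train(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=20, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (an assumed method)."""
    rng = random.Random(seed)
    q = {}  # maps (state, action) to an estimated return; missing entries are 0.0
    for _ in range(episodes):
        state = "home"
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))  # exploit
            next_state = step(state, action)
            done = meets_reward_criterion(next_state)
            reward = 1.0 if done else 0.0
            best_next = 0.0 if done else max(
                q.get((next_state, a), 0.0) for a in ACTIONS
            )
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = next_state
            if done:
                break
    return q


def greedy_path(q, start="home", max_steps=10):
    """Follow the highest-valued action from each state until the criterion is met."""
    state, path = start, []
    for _ in range(max_steps):
        action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
        path.append(action)
        state = step(state, action)
        if meets_reward_criterion(state):
            break
    return path
```

No path to the "saved" screen is supplied to the agent. Calling `greedy_path(train())` is expected to return the sequence `click_menu`, `click_settings`, `click_save`, which the agent learns only from observed states and the reward issued when the criterion is met.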