Methods and systems for a reinforcement learning system. A spatial and temporal representation of an observed state of an environment is encoded. A previous state is estimated from a given state and a size of a reward is adjusted based on a difference between the estimated previous state and the previous state.