SBIR/STTR Award attributes
As adversaries become more intelligent and adaptive, the U.S. military must accelerate its strategic and tactical decision making. The requirement to produce timely and accurate decisions in uncertain, dynamic environments necessitates artificially intelligent agents that can enhance the speed and accuracy of human decision makers; automate the generation, evaluation, and execution of mission plans; and improve the effectiveness and reduce the staffing needs of virtual environments used for training and mission rehearsals. Current artificial intelligence (AI) agents lack key cognitive skills: they do not exhibit human-like curiosity, biases, and errors; they cannot learn complex tasks quickly with limited feedback; they cannot coordinate with human or AI “teammates”; and they cannot continue functioning well during scenarios or operations that span long time horizons. Recently, deep reinforcement learning (DRL) models have allowed a wide range of these complex decision-making tasks to be solved, reaching performance levels comparable to or surpassing those of human experts. However, many state-of-the-art DRL methods have technical limitations that restrict their use in real-world applications, including high sample complexity, lack of meta-reasoning, limited coordination awareness, and narrow counterfactual reasoning. Aptima and Cubic Corp. propose to develop ENDGAME (Exploring Neuroscientifically Derived Gameplaying Agents for MMO Environments): a framework for, and an implementation of, adaptive, intelligent, cooperative agents. The ENDGAME solution is based on our adaptation of active inference theory, called Deep Active Inference. Our Deep Active Inference framework provides the objective functions and scalable computational mechanisms that enable ENDGAME agents to execute four processes fundamental to human cognition: learning, perception, planning, and simulation.
Further, it provides tractable Bayesian computations to learn causal environment representations and agent behavior models, estimate hidden states of the world, and construct long-term plans using forward (generative) models that simulate alternative, counterfactual versions of the past and future. Because active inference allows an agent to evaluate its own internal uncertainty, our agents will be able to seek novel experiences, learn about and interact with their teammates, and combine prior knowledge with reasoning even when feedback is sparse. Several recent computational studies have shown that active inference achieves superior performance in uncertain, dynamic environments compared to standard and deep reinforcement learning baselines, attaining higher utility earlier in the learning process and adapting faster to changing environments. By leveraging active inference, ENDGAME agents will offer more natural, human-like behaviors; increased planning depth and decision robustness; optimized learning and coordination; and a faster development and validation cycle.

