For this STTR effort, Intelligent Automation, Inc. teams with researchers from the University of Maryland, College Park to develop SLA, a self-learned agent system for detecting collective human activities and events in aerial videos. Aerial video analytics faces challenges such as low resolution, shadows, and varied spatio-temporal dynamics, and traditional methods that depend on object detection and tracking often fail under these conditions. Although deep learning has shown promise in recent years, it requires large volumes of ground-truthed data for training, which are not always available. We formulate SLA as a reinforcement learning agent that interacts with a video over time. Each agent is trained to perform a specific task defined by the user, and multiple agents can interact and communicate to detect more complex events. Compared with existing work [1], our solution has the following advantages: 1) SLA needs only simple webcast-style text as annotation instead of a detailed labeled dataset; 2) for an event, our solution detects not only temporal boundaries but also spatial attention; and 3) our solution captions the detected activities by generating short phrases describing the event.
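The abstract does not specify SLA's formulation, so the sketch below is only a rough illustration of the general idea it describes: a reinforcement learning agent that steps through a video and learns, from sparse feedback, to mark the temporal boundaries of an event. The environment, action set, reward, and tabular policy are all hypothetical stand-ins, and the spatial-attention and captioning components mentioned in the abstract are omitted for brevity.

```python
# Illustrative sketch only; not the SLA design. All names and parameters
# here are hypothetical.
import random
from dataclasses import dataclass, field

ACTIONS = ("ADVANCE", "MARK_START", "MARK_END")

@dataclass
class VideoEnv:
    """Toy stand-in for an aerial video: a frame sequence with one
    known event interval, used only to exercise the agent loop."""
    num_frames: int = 200
    event: tuple = (80, 120)          # hypothetical ground-truth interval
    t: int = 0
    marked: list = field(default_factory=list)

    def reset(self):
        self.t, self.marked = 0, []
        return self.t

    def step(self, action):
        reward = 0.0
        if action in ("MARK_START", "MARK_END"):
            self.marked.append((action, self.t))
            # Sparse reward: +1 when a boundary mark falls inside the event.
            if self.event[0] <= self.t <= self.event[1]:
                reward = 1.0
        self.t += 1
        done = self.t >= self.num_frames
        return self.t, reward, done

class TabularAgent:
    """Epsilon-greedy Q-learning over coarse time bins; a placeholder for
    the (unspecified) learned policy of a single task agent."""
    def __init__(self, bins=20, eps=0.1, lr=0.5, gamma=0.9):
        self.q = [[0.0] * len(ACTIONS) for _ in range(bins)]
        self.bins, self.eps, self.lr, self.gamma = bins, eps, lr, gamma

    def state(self, t, n):            # map a frame index to a time bin
        return min(self.bins - 1, t * self.bins // n)

    def act(self, s):
        if random.random() < self.eps:
            return random.randrange(len(ACTIONS))
        return max(range(len(ACTIONS)), key=lambda a: self.q[s][a])

    def update(self, s, a, r, s2):
        target = r + self.gamma * max(self.q[s2])
        self.q[s][a] += self.lr * (target - self.q[s][a])

env, agent = VideoEnv(), TabularAgent()
for episode in range(50):
    t, done = env.reset(), False
    while not done:
        s = agent.state(t, env.num_frames)
        a = agent.act(s)
        t, r, done = env.step(ACTIONS[a])
        agent.update(s, a, r, agent.state(t, env.num_frames))
```

In the full system described above, the frame index would be replaced by learned video features, the action space would include spatial-attention shifts and phrase generation, and multiple such agents would communicate to compose more complex event detections.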