Learning visual feature representations for video analysis is non-trivial: it typically requires a large number of training samples and a proper generalization framework to be effective. Many current state-of-the-art methods for video captioning or video action description rely on recurrent neural networks to encode the temporal visual information extracted from the video data. These methods are computationally expensive and cannot operate in real time or handle scenes with many object interactions. However, recent research by The Ohio State University has developed a methodology for extracting and analyzing visual information from video in real time and with high accuracy. In this proposal, Ubihere proposes to build upon that research, creating a commercial implementation of this artificial intelligence research as a high-impact solution for relevant Air Force operations. A large number of existing functions, operations, and actions within the DoD would benefit from a highly capable, easy-to-train, vision-based artificial intelligence platform. In fact, the Air Force has specifically published several Focus Areas seeking innovative technology such as the one proposed here by Ubihere.
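
For context, the recurrent encoding approach described above can be illustrated with a minimal sketch: per-frame features (assumed here to come from a pretrained CNN backbone) are passed through a GRU whose final hidden state summarizes the clip. The module name `FrameSequenceEncoder` and all dimensions are hypothetical; this is an illustration of the general technique, not the OSU or Ubihere method.

```python
# Minimal, hypothetical sketch of recurrent temporal encoding for video;
# illustrative only, not the implementation discussed in this proposal.
import torch
import torch.nn as nn

class FrameSequenceEncoder(nn.Module):
    """Encodes a sequence of per-frame feature vectors with a GRU.

    Assumes frame features have already been extracted (e.g., by a CNN
    backbone) with shape (batch, num_frames, feat_dim).
    """
    def __init__(self, feat_dim=2048, hidden_dim=512):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden_dim, batch_first=True)

    def forward(self, frame_feats):
        # outputs: hidden state at every time step;
        # h_n: final hidden state summarizing the clip's temporal dynamics.
        outputs, h_n = self.gru(frame_feats)
        return h_n.squeeze(0)  # (batch, hidden_dim) clip-level representation

# Usage: a batch of 4 clips, 16 frames each, 2048-dim frame features.
encoder = FrameSequenceEncoder()
clip_repr = encoder(torch.randn(4, 16, 2048))
print(clip_repr.shape)  # torch.Size([4, 512])
```

The GRU's sequential dependence (each step must wait on the previous one) is one reason such encoders are computationally expensive and struggle to run in real time, which is the limitation the proposed approach aims to avoid.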