SBIR/STTR Award attributes
Navy air traffic controllers must learn to tune out background conversations and focus on a single target speaker. Charles River Analytics, working with OWT Global, is developing a prototype of Communications with Operational Context and Knowledge for Target Audio Identification Learning (COCKTAIL), an example of a training module that requires accurate speech recognition. COCKTAIL is built on our innovative grammar-assisted speech processing (GASP) approach, which uses linguistic grammars to model the language of each training scenario and to build speech models tailored to that scenario. Because air traffic control language is highly idiosyncratic, a state-of-the-art general-purpose speech-to-text system such as Facebook's Wav2Vec2 produces high word error rates (WERs) even on extremely clean synthetic speech data; GASP enables COCKTAIL to reduce the WER drastically. COCKTAIL also uses GASP to generate diverse synthetic speech by varying the content, phrasing, voice, speaking rate, pitch, and accent. Its human-machine interface works with the instructor and trainee to create the speech a given training exercise needs. For example, the instructor may vary the number of background conversations and raise the pitch, volume, and speed of one of them to make it more distracting.
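The abstract does not say how GASP's grammars or COCKTAIL's synthesis controls are implemented, so the following is only a minimal Python sketch under stated assumptions. A tiny hand-written context-free grammar (`GRAMMAR`, hypothetical) stands in for a scenario grammar and is expanded at random to produce in-domain phrases. A `SpeechSpec` record (hypothetical) holds the parameters the abstract says are varied (voice, speaking rate, pitch, accent, plus volume from the instructor example), and `build_scenario` mirrors that example by making one background conversation louder, faster, and higher-pitched. `word_error_rate` computes the standard WER metric as word-level edit distance. Voice IDs, accent labels, and parameter ranges are invented for illustration, and no text-to-speech engine is called.

```python
import random
from dataclasses import dataclass, replace

# Hypothetical toy grammar covering a tiny slice of ATC phraseology.
# Keys are nonterminals; each value lists alternative expansions.
DIGITS = [["zero"], ["one"], ["two"], ["three"], ["four"],
          ["five"], ["six"], ["seven"], ["eight"], ["niner"]]
GRAMMAR = {
    "<utterance>": [["<callsign>", "<instruction>"]],
    "<callsign>": [["<airline>", "<digit>", "<digit>", "<digit>"]],
    "<airline>": [["united"], ["delta"], ["speedbird"]],
    "<instruction>": [
        ["climb", "and", "maintain", "flight", "level", "<digit>", "<digit>", "zero"],
        ["turn", "<direction>", "heading", "<digit>", "<digit>", "zero"],
        ["contact", "departure", "one", "two", "four", "decimal", "<digit>"],
    ],
    "<direction>": [["left"], ["right"]],
    "<digit>": DIGITS,
}


def expand(symbol, rng):
    """Recursively expand a symbol into terminal words, picking alternatives at random."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal word
    words = []
    for part in rng.choice(GRAMMAR[symbol]):
        words.extend(expand(part, rng))
    return words


@dataclass(frozen=True)
class SpeechSpec:
    role: str               # "target" or "background"
    text: str               # utterance generated from the grammar
    voice: str              # hypothetical TTS voice ID
    accent: str             # hypothetical accent label
    rate: float             # relative speaking rate (1.0 = normal)
    pitch_semitones: float  # pitch shift relative to the voice's default
    volume_db: float        # gain relative to the target speaker


VOICES = ["voice_a", "voice_b", "voice_c"]  # hypothetical
ACCENTS = ["us", "uk", "in"]                 # hypothetical


def random_spec(role, rng):
    """Sample content and delivery parameters for one conversation."""
    return SpeechSpec(
        role=role,
        text=" ".join(expand("<utterance>", rng)),
        voice=rng.choice(VOICES),
        accent=rng.choice(ACCENTS),
        rate=round(rng.uniform(0.9, 1.2), 2),
        pitch_semitones=round(rng.uniform(-2.0, 2.0), 1),
        volume_db=0.0 if role == "target" else round(rng.uniform(-12.0, -6.0), 1),
    )


def build_scenario(num_background, rng, boost_distractor=False):
    """One target conversation plus num_background background conversations.
    If boost_distractor is set, the first background conversation is made
    faster, higher-pitched, and louder so it is more distracting."""
    specs = [random_spec("target", rng)]
    specs += [random_spec("background", rng) for _ in range(num_background)]
    if boost_distractor and num_background > 0:
        d = specs[1]
        specs[1] = replace(d, rate=d.rate * 1.25,
                           pitch_semitones=d.pitch_semitones + 3.0,
                           volume_db=d.volume_db + 6.0)
    return specs


def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # d[i][j] = word-level edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)
```

For instance, `build_scenario(2, random.Random(7), boost_distractor=True)` returns one target and two background specs, and `word_error_rate("turn left heading two seven zero", "turn left heading to seven zero")` counts the homophone "to" for "two" as one substitution out of six reference words. Because the grammar enumerates only phrases a scenario allows, the same structure could in principle drive both synthetic-speech generation and a scenario-specific recognizer, but how GASP actually does this is not described in the source.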