Whisper is an automatic speech recognition (ASR) system that approaches human-level accuracy on English speech. The model is trained on 680,000 hours of multilingual and multitask supervised data collected from the internet. Training on such a large and varied dataset improves Whisper's robustness to accents, background noise, and technical language, and enables it to transcribe speech in multiple languages as well as translate speech from various languages into English.
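As an illustration of these capabilities, the minimal sketch below uses the open-source `whisper` Python package to transcribe an audio file and then translate it into English; the file name `audio.mp3` and the `base` model size are placeholder choices for the example, not part of the original description.

```python
import whisper

# Load one of the released checkpoints; "base" is a small, fast option
# (larger checkpoints such as "medium" or "large" trade speed for accuracy).
model = whisper.load_model("base")

# Transcribe speech in its original language.
# "audio.mp3" is a placeholder path for any local audio file.
transcription = model.transcribe("audio.mp3")
print(transcription["language"])  # detected language code, e.g. "fr"
print(transcription["text"])      # transcript in the source language

# Translate speech from the source language into English.
translation = model.transcribe("audio.mp3", task="translate")
print(translation["text"])        # English translation of the audio
```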
Whisper's architecture is an end-to-end encoder-decoder transformer. Audio is split into 30-second chunks, converted into a log-Mel spectrogram, and passed to the encoder. The decoder is trained to predict the corresponding text caption and to perform specific tasks such as language identification, phrase-level timestamps, multilingual speech transcription, and to-English speech translation. While models specialized for a single benchmark can achieve better speech recognition performance on that benchmark, Whisper's large and diverse training dataset makes it more robust across varied audio, allowing it to handle a wide range of transcription tasks with fewer errors overall. Roughly a third of Whisper's audio dataset is non-English. OpenAI has open-sourced the Whisper model weights and inference code.
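The sketch below mirrors that pipeline step by step using the lower-level functions exposed by the open-source inference code; it assumes the same `whisper` package and a placeholder file `audio.mp3`.

```python
import whisper

model = whisper.load_model("base")

# Load audio and pad/trim it to the fixed 30-second window the model expects.
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)

# Convert the 30-second chunk into a log-Mel spectrogram, the encoder's input.
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# Language identification: the model scores language tokens for this chunk.
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")

# Decode the spectrogram into text (transcription by default).
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)
print(result.text)
```

In practice, the high-level `transcribe` helper shown earlier runs this same chunking and decoding process over audio of arbitrary length by sliding the 30-second window across the file.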