Text-to-speech (TTS) is a form of speech synthesis: a system that converts written text into spoken output. TTS systems were initially used as reading aids for blind users, reading text from a book aloud by converting it into speech. Other TTS applications include voice-enabled e-mail and spoken prompts in voice response systems. TTS is often used together with voice recognition programs.
A TTS system is built by creating a database of recorded speech, with units ranging from whole sentences down to syllables. The recordings are stored, sorted, labeled, and segmented by phones, syllables, morphemes, words, phrases, and sentences. To reproduce words from a text, the system first performs linguistic analysis and natural language processing to understand the structure of each sentence and to determine the context of each word, which governs its pronunciation. After this step, the system matches the text against the database of speech units to produce speech fitted to the text input.
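The matching step described above can be illustrated with a small sketch. It assumes a toy unit database that maps text units to recording labels and picks the longest available unit at each position, so a recorded phrase is preferred over its individual words. The names `UNIT_DB`, `normalize`, and `select_units`, and the file labels, are hypothetical and chosen only for illustration. A real system would also handle pronunciation, prosody, and smooth joins between units.

```python
import re

# Hypothetical unit database: text unit -> label of a stored recording.
UNIT_DB = {
    "good morning": "phrase_good_morning.wav",
    "good": "word_good.wav",
    "morning": "word_morning.wav",
    "everyone": "word_everyone.wav",
}


def normalize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def select_units(text, unit_db=UNIT_DB):
    """Greedily match the text against the unit database.

    At each position the longest span of words found in the database
    is chosen; words with no recording are reported as missing.
    """
    words = normalize(text)
    selected = []
    i = 0
    while i < len(words):
        # Try the longest span first, shrinking until a unit matches.
        for j in range(len(words), i, -1):
            key = " ".join(words[i:j])
            if key in unit_db:
                selected.append(unit_db[key])
                i = j
                break
        else:
            # No unit covers this word; flag it instead of guessing.
            selected.append(f"<missing:{words[i]}>")
            i += 1
    return selected


if __name__ == "__main__":
    print(select_units("Good morning, everyone!"))
```

Preferring longer units reduces the number of joins between recordings, which is one plausible reason to store units at several levels (phrases, words, syllables) rather than at a single level.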
The Main Principles of Text-to-Speech Synthesis System
U.R. Aida–Zade, C. Ardil and A.M. Sharifova
- Speech synthesis: Artificial simulation of human speech using computers or other devices.
- Natural language processing (NLP): A branch of artificial intelligence concerned with giving computers the ability to comprehend spoken words and text in the same way humans can.
- Machine translation: A sub-field of computational linguistics that aims to translate documents or sentences using a machine learning model.