Speech synthesis

Artificial simulation of human speech using computers or other devices

Other attributes

Wikidata ID
Q16346

Speech synthesis is the artificial simulation of human speech using computers or other devices (a speech computer or speech synthesizer). It is the counterpart of speech recognition. Text-to-speech (TTS) systems convert written text into audio, and the technology is used in voice-enabled services and unified messaging.

It is also used in assistive technology to help visually impaired users read text content: the contents of the display are automatically read aloud to the user.

Christian Kratzenstein, a German-born professor and physicist, was a pioneer in speech synthesis. In 1779 he built an apparatus, modeled on the human vocal tract, that produced five long vowel sounds.

The VODER (Voice Operating Demonstrator), developed by Homer Dudley, was the first fully functional voice synthesizer and was shown at the 1939 World's Fair. It was based on Bell Laboratories' vocoder (voice coder) research of the mid-1930s.

The simplest method of speech synthesis analyzes the words of an input phrase and groups letters that commonly occur together. Each letter group is then matched to a specific sound in the machine's database, and the matched sounds are joined to create the synthesized audio. Because the machine merely converts letter groups into the sounds they most commonly make, simpler systems produce uneven, robotic tones and odd mispronunciations.
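The letter-grouping lookup described above can be sketched in a few lines of Python. This is only an illustration: the table `SOUND_TABLE`, its sound labels, and the function `letters_to_sounds` are hypothetical and do not come from any real synthesizer. It uses a greedy longest-match rule, which is one plausible way to "group letters based on common usage".

```python
# Hypothetical toy table: letter groups mapped to rough sound labels.
SOUND_TABLE = {
    "ough": "AH F",  # most common guess: right for "tough", wrong for "though"
    "th": "TH",
    "sh": "SH",
    "ee": "IY",
    "t": "T", "o": "OW", "u": "AH", "g": "G",
    "h": "HH", "s": "S", "e": "EH", "p": "P",
}


def letters_to_sounds(word, table=SOUND_TABLE):
    """Convert a word to sound labels by greedy longest-match lookup."""
    word = word.lower()
    max_len = max(len(key) for key in table)
    sounds = []
    i = 0
    while i < len(word):
        # Try the longest letter group first, then progressively shorter ones.
        for size in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in table:
                sounds.extend(table[chunk].split())
                i += size
                break
        else:
            # No entry for this letter: skip it (a real system would need a fallback).
            i += 1
    return sounds


print(letters_to_sounds("tough"))   # ['T', 'AH', 'F']
print(letters_to_sounds("though"))  # ['TH', 'AH', 'F'], a mispronunciation
```

Because "ough" always maps to the same sound, "though" comes out rhyming with "tough", which is the kind of odd mispronunciation this approach is prone to.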

To produce smoother, more natural speech patterns, modern speech synthesis systems use hidden Markov models (HMMs) to determine the most likely phrase that needs to be "spoken" by the synthesizer. A hidden Markov model is a probabilistic finite-state machine that can be used to analyze text broken down into a time-ordered series of segments. Using phonetic analysis and probability, the state machine determines which word has actually been typed and its place within the typed phrase. This allows the machine to string the sounds together at a more natural pace, so that the audio matches the intent of the text.
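As a rough illustration of how a hidden Markov model picks the most probable reading of a phrase, the sketch below runs the standard Viterbi algorithm on a toy model. Everything specific here is an assumption made for the example: the three states (`FUNC`, `NOUN`, `VERB`), all probabilities, and the pronunciation strings are invented, and real HMM-based synthesizers model acoustic parameters rather than word classes. The point is only that the same typed word ("record") can receive a different pronunciation depending on its most likely place in the phrase.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state sequence for the observations."""
    # Each column maps state -> (probability of best path ending there, previous state).
    best = [{s: (start_p[s] * emit_p[s].get(observations[0], 0.0), None)
             for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p][0] * trans_p[p][s])
            prob = best[-1][prev][0] * trans_p[prev][s] * emit_p[s].get(obs, 0.0)
            column[s] = (prob, prev)
        best.append(column)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    path.reverse()
    return path


# Hypothetical toy model; all numbers are made up for illustration.
STATES = ["FUNC", "NOUN", "VERB"]
START = {"FUNC": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {
    "FUNC": {"FUNC": 0.1, "NOUN": 0.5, "VERB": 0.4},
    "NOUN": {"FUNC": 0.3, "NOUN": 0.2, "VERB": 0.5},
    "VERB": {"FUNC": 0.6, "NOUN": 0.3, "VERB": 0.1},
}
EMIT = {
    "FUNC": {"we": 0.5, "the": 0.5},
    "NOUN": {"record": 1.0},
    "VERB": {"record": 1.0},
}
# Hypothetical pronunciations that depend on the decoded state.
PRONUNCIATION = {("record", "NOUN"): "REH-kord", ("record", "VERB"): "rih-KORD"}

words = ["we", "record", "the", "record"]
tags = viterbi(words, STATES, START, TRANS, EMIT)
print(tags)  # ['FUNC', 'VERB', 'FUNC', 'NOUN']
print([PRONUNCIATION.get((w, t), w) for w, t in zip(words, tags)])
```

Plain probabilities are used instead of log-probabilities to keep the sketch short; for longer inputs, summing logarithms avoids numerical underflow.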

HMM-Based Speech Synthesis: Fundamentals and Its Recent Advances

The four stages of analysis used to produce audio with hidden Markov models are text, phonetic, prosodic, and speech. Text analysis converts the text into a form usable by the machine and uses probability to determine the linguistic meaning and context of the text. Phonetic analysis converts the literal typed letters into phonetic symbols that the machine can relate to specific sounds. Prosodic analysis uses the linguistic meaning, together with the context and the phonetic sounds, to determine the most probable rhythms, stress patterns, and intonation. Speech analysis combines the results of the previous stages to generate the speech signal.
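The flow of data through these four steps can be sketched as a small pipeline. This is a deliberately simplified stand-in, not an HMM-based implementation: the lexicon `LEXICON`, the fixed 80 ms duration, the pitch values, and the sine-tone "voice" are all assumptions chosen to keep the example self-contained.

```python
import math
import re

# Hypothetical mini-lexicon mapping words to phonetic symbols.
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}


def text_analysis(text):
    """Normalize the text and record simple context (is it a question?)."""
    words = re.findall(r"[a-z']+", text.lower())
    return {"words": words, "is_question": text.strip().endswith("?")}


def phonetic_analysis(analysis):
    """Look up phonetic symbols for each word; unknown words are skipped."""
    return [p for word in analysis["words"] for p in LEXICON.get(word, [])]


def prosodic_analysis(phonemes, is_question):
    """Attach a duration (ms) and pitch (Hz) to each phoneme.

    Assumed contour: pitch rises toward the end of a question, falls otherwise.
    """
    slope = 40.0 if is_question else -20.0
    last = max(len(phonemes) - 1, 1)
    return [(p, 80, 120.0 + slope * (i / last)) for i, p in enumerate(phonemes)]


def speech_generation(plan, sample_rate=8000):
    """Render each phoneme as a sine tone; a stand-in for a real vocoder."""
    samples = []
    for _, duration_ms, pitch in plan:
        count = sample_rate * duration_ms // 1000
        samples.extend(
            math.sin(2 * math.pi * pitch * k / sample_rate) for k in range(count)
        )
    return samples


def synthesize(text):
    """Run the four stages in order and return raw audio samples."""
    analysis = text_analysis(text)
    phonemes = phonetic_analysis(analysis)
    plan = prosodic_analysis(phonemes, analysis["is_question"])
    return speech_generation(plan)


samples = synthesize("Hello world?")
print(len(samples))  # 8 phonemes x 640 samples each = 5120
```

Each stage consumes only the output of the stages before it, which mirrors the description above: the prosodic step needs both the context from text analysis and the symbols from phonetic analysis before any signal is generated.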

Further Resources

Title: A Flexible Rule Compiler for Speech Synthesis
Author: Wojciech Skut, Stefan Ulrich, Kathrine Hammervold
Link: http://arxiv.org/abs/cs/0403039v1
Type: Academic paper

Title: A Short Introduction to Text-to-Speech Synthesis
Author: Thierry Dutoit
Link: http://tcts.fpms.ac.be/synthesis/introtts_old.html
Type: Academic paper

Title: Hidden Markov Model based Speech Synthesis: A Review
Author: Sangramsing Kayte, Monica Mundada, Jayesh Gujrathi
Link: https://www.researchgate.net/publication/284139182_Hidden_Markov_Model_based_Speech_Synthesis_A_Review
Type: Academic paper

Title: History of Speech Synthesis (Wolfgang von Kempelen's speaking machine and its successors)
Author: Hartmut Traunmüller
Link: http://www2.ling.su.se/staff/hartmut/kemplne.htm
Type: Academic paper

Title: Progress in animation of an EMA-controlled tongue model for acoustic-visual speech synthesis
Author: Ingmar Steiner (INRIA Lorraine - LORIA), Slim Ouni (INRIA Lorraine - LORIA)
Link: http://arxiv.org/abs/1201.4080v1
Type: Academic paper
