Patent 9280972 was granted and assigned to Microsoft on March, 2016 by the United States Patent and Trademark Office.
Embodiments that relate to converting audio inputs from an environment into text are disclosed. For example, in one disclosed embodiment a speech conversion program receives audio inputs from a microphone array of a head-mounted display device. Image data is captured from the environment, and one or more possible faces are detected from image data. Eye-tracking data is used to determine a target face on which a user is focused. A beamforming technique is applied to at least a portion of the audio inputs to identify target audio inputs that are associated with the target face. The target audio inputs are converted into text that is displayed via a transparent display of the head-mounted display device.