Patent attributes
Embodiments that relate to converting audio inputs from an environment into text are disclosed. For example, in one disclosed embodiment a speech conversion program receives audio inputs from a microphone array of a head-mounted display device. Image data is captured from the environment, and one or more possible faces are detected from image data. Eye-tracking data is used to determine a target face on which a user is focused. A beamforming technique is applied to at least a portion of the audio inputs to identify target audio inputs that are associated with the target face. The target audio inputs are converted into text that is displayed via a transparent display of the head-mounted display device.