Described is a technology by which speech is recognized in a hybrid manner, using both a local recognizer and a remote recognizer. Speech is input and recognized locally, and remote recognition is invoked if any part of the locally recognized speech data is not confidently recognized. The part of the speech that was not confidently recognized is sent to the remote recognizer, along with any confidently recognized text, which the remote recognizer may use as context data in interpreting the part of the speech data that was sent. Alternative text candidates may be sent to the remote recognizer instead of the corresponding speech.
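The flow described above can be illustrated with a minimal Python sketch. It is not the described implementation: the names `LocalResult`, `RemoteRequest`, `hybrid_recognize`, and `stub_remote`, the per-segment confidence score, and the 0.8 threshold are all assumptions introduced for illustration, and the remote recognizer is modeled as a plain callable rather than a network service.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class LocalResult:
    """One locally recognized segment (hypothetical structure)."""
    audio: bytes                 # speech data for this segment
    text: str                    # best local hypothesis
    confidence: float            # assumed score in [0, 1]
    alternatives: List[str] = field(default_factory=list)


@dataclass
class RemoteRequest:
    """Payload sent to the remote recognizer (hypothetical structure)."""
    context: List[str]                       # confidently recognized text
    audio: Optional[bytes] = None            # the uncertain speech, or
    candidates: Optional[List[str]] = None   # alternative text candidates instead


RemoteRecognizer = Callable[[RemoteRequest], str]


def hybrid_recognize(
    segments: List[LocalResult],
    remote: RemoteRecognizer,
    threshold: float = 0.8,        # assumed confidence cutoff
    send_candidates: bool = False,
) -> str:
    """Keep confident local text; defer uncertain segments to the remote side."""
    # Confidently recognized text accompanies every remote request as context.
    context = [s.text for s in segments if s.confidence >= threshold]
    pieces = []
    for s in segments:
        if s.confidence >= threshold:
            pieces.append(s.text)
            continue
        if send_candidates and s.alternatives:
            # Send alternative text candidates instead of the speech itself.
            request = RemoteRequest(context=context, candidates=s.alternatives)
        else:
            # Send only the part of the speech that was not confidently recognized.
            request = RemoteRequest(context=context, audio=s.audio)
        pieces.append(remote(request))
    return " ".join(pieces)


def stub_remote(request: RemoteRequest) -> str:
    """Stand-in for a remote recognizer; a real one would use the context."""
    if request.candidates:
        return request.candidates[0]
    return "weather"
```

In this sketch, a call such as `hybrid_recognize(segments, stub_remote)` invokes the remote callable only for segments whose confidence falls below the threshold, and passing `send_candidates=True` switches the payload from audio to alternative text candidates when a segment has any.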