Embodiments of the present disclosure relate to a method and apparatus for speech recognition. The method includes: determining, based on an acoustic score of a speech frame in a speech signal, a non-silence frame in the speech signal; determining, based on the acoustic score of the speech frame, a buffer frame between adjacent non-silence frames, where the modeling unit corresponding to the buffer frame characterizes a beginning or an end of a sentence; and decoding the speech frames that remain after the buffer frame is removed from the speech signal, to obtain a speech recognition result.
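
The frame-selection steps of the method can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the unit indices `SILENCE`, `SOS`, and `EOS`, the helper names, and the rule that a frame is classified by its highest-scoring modeling unit are all assumptions made for illustration, since the abstract does not specify how the acoustic score is evaluated. The sketch stops at selecting the frames that would be passed to a decoder and does not perform decoding.

```python
from typing import List, Sequence

# Hypothetical modeling-unit indices; the source does not specify them.
SILENCE, SOS, EOS = 0, 1, 2
BOUNDARY_UNITS = {SOS, EOS}                  # sentence-begin / sentence-end units
NON_SPEECH_UNITS = BOUNDARY_UNITS | {SILENCE}


def top_unit(frame_scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring modeling unit for one frame."""
    return max(range(len(frame_scores)), key=lambda i: frame_scores[i])


def non_silence_frames(scores: Sequence[Sequence[float]]) -> List[int]:
    """Frames whose best unit is an ordinary speech unit (assumed criterion)."""
    return [t for t, row in enumerate(scores)
            if top_unit(row) not in NON_SPEECH_UNITS]


def buffer_frames(scores: Sequence[Sequence[float]],
                  non_silence: Sequence[int]) -> List[int]:
    """Frames strictly between adjacent non-silence frames whose best unit
    marks a sentence beginning or end (assumed criterion)."""
    found = []
    for left, right in zip(non_silence, non_silence[1:]):
        for t in range(left + 1, right):
            if top_unit(scores[t]) in BOUNDARY_UNITS:
                found.append(t)
    return found


def frames_to_decode(scores: Sequence[Sequence[float]]) -> List[int]:
    """Indices of the frames left after the buffer frames are removed."""
    drop = set(buffer_frames(scores, non_silence_frames(scores)))
    return [t for t in range(len(scores)) if t not in drop]


# Toy acoustic scores: one row per frame, columns = [SILENCE, SOS, EOS, unit 3].
example_scores = [
    [0.1, 0.1, 0.1, 0.7],  # frame 0: speech unit
    [0.1, 0.1, 0.7, 0.1],  # frame 1: sentence end, between speech frames
    [0.7, 0.1, 0.1, 0.1],  # frame 2: silence
    [0.1, 0.7, 0.1, 0.1],  # frame 3: sentence beginning, between speech frames
    [0.1, 0.1, 0.1, 0.7],  # frame 4: speech unit
    [0.1, 0.1, 0.7, 0.1],  # frame 5: sentence end, but after the last speech frame
]

kept = frames_to_decode(example_scores)
```

In this toy input, frames 1 and 3 lie between the two non-silence frames and score highest on a boundary unit, so they are treated as buffer frames and dropped. Frame 5 also scores highest on the sentence-end unit, but it does not lie between adjacent non-silence frames, so this sketch keeps it. Whether silence frames such as frame 2 are also skipped before decoding is not stated in the abstract, so they are retained here.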