A GRU has two gates: a reset gate and an update gate. The reset gate determines how the new input is combined with the previous memory, while the update gate determines how much of the previous memory is kept.
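The two gates can be sketched as a single time step in plain Python. This is a minimal illustration under one common formulation, not a reference implementation: the parameter names (`Wz`, `Uz`, `bz`, and so on), the helper `zero_params`, and the convention that the update gate `z` weights the previous state are all assumptions made here for the example.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector (list of floats).
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def affine(W, U, b, x, h):
    # Computes W x + U h + b element by element.
    return [a + c + d for a, c, d in zip(matvec(W, x), matvec(U, h), b)]


def gru_step(x, h_prev, p):
    """One GRU time step; p is a dict of weights (hypothetical names)."""
    # Update gate: how much of the previous memory to keep.
    z = [sigmoid(a) for a in affine(p["Wz"], p["Uz"], p["bz"], x, h_prev)]
    # Reset gate: how much of the previous memory enters the candidate.
    r = [sigmoid(a) for a in affine(p["Wr"], p["Ur"], p["br"], x, h_prev)]
    # The reset gate is applied directly to the previous hidden state.
    reset_h = [ri * hi for ri, hi in zip(r, h_prev)]
    # Candidate state built from the new input and the reset memory.
    h_cand = [math.tanh(a) for a in affine(p["Wh"], p["Uh"], p["bh"], x, reset_h)]
    # New hidden state: blend of old memory and candidate, controlled by z.
    return [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h_prev, h_cand)]


def zero_params(n_in, n_hid):
    # Hypothetical helper: all-zero weights, handy for sanity checks.
    p = {}
    for g in "zrh":
        p["W" + g] = [[0.0] * n_in for _ in range(n_hid)]
        p["U" + g] = [[0.0] * n_hid for _ in range(n_hid)]
        p["b" + g] = [0.0] * n_hid
    return p
```

With all-zero weights both gates evaluate to 0.5 and the candidate state is 0, so the step simply halves the previous hidden state. Setting the update-gate bias `bz` to a large positive value pushes `z` toward 1 and the previous memory passes through almost unchanged.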
Like the LSTM, the GRU uses a gating mechanism to learn long-term dependencies. The key differences are: a GRU has two gates, whereas an LSTM has three; a GRU has no internal memory separate from the exposed hidden state; it has no output gate; the LSTM's input and forget gates are coupled into a single update gate; and the reset gate is applied directly to the previous hidden state.
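The coupling is easiest to see by placing the two state updates side by side. The notation below is an assumption introduced for illustration (one common formulation; some papers swap the roles of $z_t$ and $1 - z_t$):

```latex
\begin{align*}
\text{LSTM:}\quad & c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t,
  \qquad h_t = o_t \odot \tanh(c_t) \\
\text{GRU:}\quad & \tilde{h}_t = \tanh\!\left(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\right),
  \qquad h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t
\end{align*}
```

Here $z_t$ plays the role of the LSTM forget gate $f_t$ and $1 - z_t$ the role of the input gate $i_t$, so one gate does the work of two. There is no output gate $o_t$ and no cell state $c_t$ separate from $h_t$, and $r_t$ multiplies $h_{t-1}$ directly inside the candidate.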
Gated Recurrent Unit neural networks have shown success in various applications involving sequential or temporal data. They have been applied extensively in speech recognition, natural language processing, and machine translation, among others.
Gate-Variants of Gated Recurrent Unit (GRU) Neural Networks
Rahul Dey and Fathi M. Salem
Improving speech recognition by revising gated recurrent units
Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, Yoshua Bengio