The present disclosure relates to a method and device for audio generation. The method includes: obtaining a target rhythm, a target verse melody, and a target chorus melody; configuring the target rhythm as a first audio track, the target verse melody as a second audio track, and the target chorus melody as a third audio track; and generating target audio by aligning the playback start times of the first audio track, the second audio track, and the third audio track with the occurrence times of a first beat, a second beat, and a third beat, respectively, in first metronome data.
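The alignment step can be illustrated with a minimal sketch. It is not the disclosed implementation: the names `Track`, `beat_times`, and `align_tracks`, the constant-tempo metronome model, and the choice of beat indices are all assumptions made for illustration. The sketch only computes a playback start time for each track from the metronome beat times; it does not mix audio samples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    """Hypothetical stand-in for an audio track (name and length only)."""
    name: str
    duration_s: float


def beat_times(bpm: float, n_beats: int, offset_s: float = 0.0) -> list[float]:
    """Assumed metronome model: evenly spaced beats at a constant tempo."""
    if bpm <= 0 or n_beats < 0:
        raise ValueError("bpm must be positive and n_beats non-negative")
    period_s = 60.0 / bpm
    return [offset_s + i * period_s for i in range(n_beats)]


def align_tracks(
    tracks: list[Track], beats: list[float], beat_indices: list[int]
) -> dict[str, float]:
    """Map each track to the occurrence time of its assigned beat."""
    if len(tracks) != len(beat_indices):
        raise ValueError("need exactly one beat index per track")
    schedule: dict[str, float] = {}
    for track, idx in zip(tracks, beat_indices):
        if not 0 <= idx < len(beats):
            raise IndexError(f"beat index {idx} is outside the metronome data")
        schedule[track.name] = beats[idx]  # start playing on that beat
    return schedule


def total_duration(tracks: list[Track], schedule: dict[str, float]) -> float:
    """Length of the combined audio: the latest track end time."""
    return max(schedule[t.name] + t.duration_s for t in tracks)


if __name__ == "__main__":
    # Assumed example values: 120 BPM, rhythm and verse start on beat 0,
    # chorus starts on beat 8.
    tracks = [
        Track("rhythm", 8.0),
        Track("verse", 4.0),
        Track("chorus", 4.0),
    ]
    beats = beat_times(bpm=120, n_beats=16)
    schedule = align_tracks(tracks, beats, beat_indices=[0, 0, 8])
    print(schedule)
    print(total_duration(tracks, schedule))
```

In this sketch the first, second, and third beats are selected by index, so two tracks may share a beat (for example, the rhythm and the verse both starting on the first beat) while the chorus starts on a later one.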