Patent attributes
A loop instruction, at least one target instruction, and an associated trigger address are cached during loop entry. During each loop iteration, the processor predicts whether the loop will be taken or not taken in a subsequent iteration. When pre-fetch of the cached loop instruction is subsequently detected (i.e., by comparing the trigger address with the current program counter value), the taken/not-taken prediction is used to fetch either loop body instructions (when predicted taken) or fall-through instructions (when predicted not taken). The cached loop instruction is then executed, and the taken/not-taken prediction is verified by a dedicated loop execution circuit while a penultimate loop body instruction is executed in the processor execution stage (pipeline). When a loop-taken prediction is verified, the cached target instruction is executed, followed by the fetched loop body instructions. When a loop-not-taken prediction is verified, the fetched fall-through instructions are executed.
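The flow described above can be illustrated with a small behavioral model. This is a minimal sketch under stated assumptions, not the claimed circuit: it assumes a counter-based loop whose loop instruction sits at the end of the body, it uses "remaining iterations greater than one" as the prediction rule, and it collapses pre-fetch, execution, and verification into a single step rather than modeling pipeline timing. The names `LoopCacheEntry`, `run`, `remaining`, and `predict_taken` are hypothetical and do not come from the source.

```python
from dataclasses import dataclass


@dataclass
class LoopCacheEntry:
    """Hypothetical cache entry filled at loop entry."""
    trigger_addr: int    # PC value that signals pre-fetch of the loop instruction
    loop_instr: str      # cached loop instruction
    target_addr: int     # address of the first loop body instruction
    target_instr: str    # cached copy of that first body instruction
    remaining: int       # iterations still to run (assumed counter-based loop)
    predict_taken: bool  # prediction for the next encounter of the loop instruction


def run(program, loop_addr, target_addr, count):
    """Return the trace of executed instructions; the body runs `count` times."""
    trace = []
    entry = None
    pc = 0
    while pc < len(program):
        if entry is not None and pc == entry.trigger_addr:
            # Trigger address matches the program counter: the cached loop
            # instruction is handled here and the earlier prediction is checked.
            entry.remaining -= 1
            actual_taken = entry.remaining > 0
            assert actual_taken == entry.predict_taken  # always holds for a counter
            if actual_taken:
                trace.append(entry.target_instr)   # target comes from the cache
                pc = entry.target_addr + 1         # rest of the body is fetched
                entry.predict_taken = entry.remaining > 1  # predict next iteration
            else:
                entry = None                       # loop done: fall through
                pc += 1
            continue
        if pc == loop_addr:
            # Loop entry: first encounter of the loop instruction, so cache it.
            remaining = count - 1
            if remaining > 0:
                entry = LoopCacheEntry(loop_addr, program[loop_addr], target_addr,
                                       program[target_addr], remaining,
                                       remaining > 1)
                pc = target_addr
            else:
                pc += 1
            continue
        trace.append(program[pc])
        pc += 1
    return trace


program = ["init", "body_a", "body_b", "loop", "fall_a", "fall_b"]
print(run(program, loop_addr=3, target_addr=1, count=3))
```

In this model the loop instruction never appears in the trace after loop entry, which mirrors the idea that its handling overlaps with loop body execution instead of occupying its own slot. With a pure counter the prediction is always confirmed; a real design would also need a recovery path for a failed verification, which the sketch omits.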