The present invention relates to a system and method for bootstrapping templates for use in natural language sentence generation. More specifically, the present invention relates to identifying a set of candidate sentences from a large corpus based on a set of original templates by using a similarity measure. The set of candidate sentences are then processed or cleaned to generate a set of templates for use in natural language sentence generation.