Various techniques are described for automatically suggesting variation parameters used to generate a tailored synthetic dataset for training a particular machine learning model. A seeding taxonomy associates a plurality of machine learning scenarios with corresponding subsets of variation parameters. When a machine learning scenario is selected, the subset of variation parameters that the seeding taxonomy associates with that scenario is retrieved. The seeding taxonomy may be adaptable using a feedback loop that tracks which variation parameters are selected and updates the seeding taxonomy accordingly. The retrieved variation parameters are presented as suggestions to help users identify and select relevant variation parameters more quickly and efficiently. Further embodiments relate to pre-packaging synthetic datasets for common or anticipated machine learning scenarios. A user interface may present available packages of synthetic data for a selected industry sector and/or scenario, and a selected package may be made available for download.
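
By way of illustration only, the seeding taxonomy and its feedback loop might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the class name `SeedingTaxonomy`, the `promote_after` threshold, the promotion rule, and the example scenario and parameter names are all hypothetical and do not appear in the description above.

```python
from collections import Counter, defaultdict


class SeedingTaxonomy:
    """Hypothetical mapping from machine learning scenarios to suggested variation parameters."""

    def __init__(self, seed, promote_after=2):
        # scenario -> ordered list of suggested variation parameters
        self._suggestions = {s: list(params) for s, params in seed.items()}
        # scenario -> how many times each parameter was actually selected
        self._selections = defaultdict(Counter)
        # Assumed rule: a parameter is added to the suggestions once it has
        # been selected this many times for the scenario.
        self._promote_after = promote_after

    def suggest(self, scenario):
        """Retrieve the subset of variation parameters associated with a scenario."""
        return list(self._suggestions.get(scenario, []))

    def record_selection(self, scenario, selected):
        """Feedback loop: track the selected parameters and update the taxonomy."""
        counts = self._selections[scenario]
        # Count each parameter at most once per selection event, keeping order.
        for param in dict.fromkeys(selected):
            counts[param] += 1
        suggestions = self._suggestions.setdefault(scenario, [])
        for param, count in counts.items():
            if count >= self._promote_after and param not in suggestions:
                suggestions.append(param)


# Hypothetical seed data, for illustration only.
taxonomy = SeedingTaxonomy({
    "object detection": ["lighting", "camera angle", "occlusion"],
    "facial recognition": ["head pose", "lighting"],
})

suggested = taxonomy.suggest("object detection")
# The user accepts one suggestion and adds a parameter that was not suggested.
taxonomy.record_selection("object detection", ["lighting", "motion blur"])
taxonomy.record_selection("object detection", ["motion blur"])
updated = taxonomy.suggest("object detection")
```

In this sketch, a parameter that users repeatedly select for a scenario is appended to that scenario's suggestions. A fuller version might also demote suggestions that are rarely selected, but that behavior is an assumption beyond what the description states.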