Embodiments can provide a computer-implemented method for simulating transaction data using a reinforcement learning model including an intelligent agent, a policy engine, and an environment, the method including: providing, by a processor, standard customer transaction data representing a group of customers having similar transaction characteristics as a goal; conducting, by the intelligent agent, an action including a plurality of simulated transactions; comparing, by the environment, the action with the goal; providing, by the environment, feedback on the action based on a degree of similarity relative to the goal; and adjusting, by the policy engine, a policy based on the feedback; wherein the steps from conducting an action through adjusting a policy are repeated until the degree of similarity is higher than a first predefined threshold.
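The iterative loop described above (act, compare, provide feedback, adjust the policy, repeat until a threshold is met) can be illustrated with a minimal Python sketch. It is not the claimed implementation. The class and function names, the reduction of each transaction to an amount, the use of mean and standard deviation as the "similar transaction characteristics", the similarity formula, and the random hill-climbing update used in place of a full reinforcement learning algorithm are all hypothetical assumptions chosen only for illustration.

```python
import random
import statistics

# Hypothetical sketch: transactions are reduced to amounts, and "similar
# transaction characteristics" is approximated by mean and standard deviation.

def summarize(amounts):
    """Return (mean, population std) of a list of transaction amounts."""
    return statistics.mean(amounts), statistics.pstdev(amounts)

class Environment:
    """Holds the goal (standard customer transaction data) and scores actions."""
    def __init__(self, goal_amounts):
        self.goal_mean, self.goal_std = summarize(goal_amounts)

    def feedback(self, action):
        # Hypothetical similarity in (0, 1]; 1.0 means the summaries match exactly.
        mean, std = summarize(action)
        gap = abs(mean - self.goal_mean) + abs(std - self.goal_std)
        return 1.0 / (1.0 + gap / max(abs(self.goal_mean), 1e-9))

class Agent:
    """Conducts an action: a batch of simulated transaction amounts."""
    def __init__(self, rng, n=300):
        self.rng, self.n = rng, n

    def act(self, policy):
        mu, sigma = policy
        return [self.rng.gauss(mu, sigma) for _ in range(self.n)]

class PolicyEngine:
    """Adjusts the policy (mu, sigma) by random hill climbing, an assumed
    stand-in for a full reinforcement learning update."""
    def __init__(self, rng, start=(10.0, 1.0), step=5.0):
        self.rng, self.step = rng, step
        self.policy, self.candidate, self.best = start, start, 0.0

    def propose(self):
        # Perturb the current policy; clamp sigma so it stays positive.
        mu, sigma = self.policy
        self.candidate = (mu + self.rng.uniform(-self.step, self.step),
                          max(0.1, sigma + self.rng.uniform(-self.step, self.step)))
        return self.candidate

    def adjust(self, feedback):
        # Keep the candidate policy only if its feedback beats the best so far.
        if feedback > self.best:
            self.best, self.policy = feedback, self.candidate

def simulate(goal_amounts, threshold=0.95, max_iters=5000, seed=0):
    """Repeat act -> compare/feedback -> adjust until similarity > threshold."""
    rng = random.Random(seed)
    env, agent, engine = Environment(goal_amounts), Agent(rng), PolicyEngine(rng)
    for _ in range(max_iters):
        action = agent.act(engine.propose())
        score = env.feedback(action)  # environment compares action with goal
        engine.adjust(score)          # policy engine updates on the feedback
        if score > threshold:
            return action, score
    return None, engine.best
```

As a usage example, goal data drawn from a distribution with mean 50 and standard deviation 8 can be passed to `simulate(goal)`, which returns a batch of simulated amounts once the similarity exceeds the (hypothetical) threshold of 0.95, or `None` if `max_iters` is exhausted. Keeping only improving candidates makes the policy update easy to follow; a practical system would more likely use a gradient-based reinforcement learning method and a richer similarity measure.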