Synthetic data

Algorithmically generated information that imitates real data and can substitute for datasets used for testing and training in artificial intelligence.

Synthetic data is algorithmically generated information that imitates real data. It can substitute for the real datasets used to test and train artificial intelligence (AI) and machine learning models. To generate synthetic data, an algorithm is fed a smaller set of real-world data and produces new data with similar characteristics.
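As a minimal illustration of feeding an algorithm a small real sample and drawing similar data from it, the sketch below fits an independent normal distribution to each numeric column and samples new rows. The function names, the example measurements, and the choice of a per-column Gaussian are assumptions made for illustration; practical generators typically use richer models that also capture relationships between columns.

```python
import random
import statistics

def fit_gaussian_columns(rows):
    """Estimate a (mean, standard deviation) pair for each numeric column."""
    columns = list(zip(*rows))
    return [(statistics.mean(col), statistics.stdev(col)) for col in columns]

def sample_synthetic(params, n, seed=0):
    """Draw n synthetic rows, sampling every column independently.

    Independence is a simplifying assumption: correlations between
    columns in the real data are not reproduced.
    """
    rng = random.Random(seed)
    return [[rng.gauss(mu, sigma) for mu, sigma in params] for _ in range(n)]

# Hypothetical "real" sample: (height in cm, weight in kg).
real_rows = [
    [170.0, 65.0],
    [182.0, 80.0],
    [165.0, 58.0],
    [175.0, 72.0],
    [160.0, 55.0],
]

params = fit_gaussian_columns(real_rows)
synthetic_rows = sample_synthetic(params, n=1000)
```

Five real rows become a thousand synthetic ones whose per-column mean and spread approximate the originals. Because each column is sampled on its own, the link between height and weight is lost, which is one reason a model this simple is rarely enough in practice.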

Synthetic data is one approach to problems in AI that stem from insufficient data: artificial data is either produced from scratch or derived from existing examples with data manipulation techniques that yield novel and diverse training examples. It can provide a solution when datasets are too small or when the cost of manually labeling data is prohibitively high. Synthetic datasets are cheaper to produce than traditional ones.
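A data manipulation technique of this kind can be sketched as simple image augmentation: one labeled example is turned into several variants that keep the original label, so no additional manual labeling is needed. The helper names, the tiny grayscale image, and the specific transformations (a horizontal flip and small uniform noise) are illustrative assumptions, not a prescribed method.

```python
import random

def horizontal_flip(image):
    """Mirror a 2D grayscale image (a list of rows) left to right."""
    return [list(reversed(row)) for row in image]

def add_noise(image, scale, rng):
    """Add uniform noise in [-scale, scale] to each pixel, clamped to [0, 1]."""
    return [
        [min(1.0, max(0.0, px + rng.uniform(-scale, scale))) for px in row]
        for row in image
    ]

def augment(image, label, copies, seed=0):
    """Produce `copies` labeled variants of a single labeled example."""
    rng = random.Random(seed)
    variants = []
    for _ in range(copies):
        # Flip roughly half of the variants; copy the rest unchanged.
        if rng.random() < 0.5:
            base = horizontal_flip(image)
        else:
            base = [row[:] for row in image]
        variants.append((add_noise(base, scale=0.05, rng=rng), label))
    return variants

# Hypothetical 2x3 grayscale image with pixel values in [0, 1].
image = [
    [0.0, 0.2, 0.9],
    [0.1, 0.5, 1.0],
]
augmented = augment(image, label="edge", copies=4)
```

Whether a transformation preserves the label depends on the task. A flip is harmless for many object categories but would change the meaning of, for example, an image of text.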

Synthetically generated datasets can be used to train machine learning models, particularly in computer vision. Synthetic data may augment real datasets to cover parts of the data distribution that are not sufficiently represented, which helps alleviate dataset bias. Synthetic data may also be useful when real data is impossible or prohibitively difficult to acquire because of privacy or legal issues. Synthetic data in the form of driving simulations has been used to train the self-driving systems of Google's Waymo. Facebook was reported to use synthetic data to train algorithms that detect bullying language.
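One way to cover an underrepresented part of a distribution is to create new points between existing minority-class samples. The sketch below is a simplified, SMOTE-style interpolation that picks random pairs rather than nearest neighbours; the technique, the function name, and the toy two-dimensional data are assumptions chosen for illustration and are not taken from the sources above.

```python
import random

def interpolate_minority(minority, n_new, seed=0):
    """Create n_new points, each on the segment between two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)  # two distinct real points
        t = rng.random()                # position along the segment, in [0, 1)
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic

# Hypothetical imbalanced dataset: eight majority points, three minority points.
majority = [
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [1.1, 1.3],
    [0.8, 0.9], [1.3, 1.0], [1.0, 0.7], [0.7, 1.2],
]
minority = [[5.0, 5.0], [6.0, 5.5], [5.5, 6.5]]

# Add just enough synthetic minority points to match the majority class size.
n_needed = len(majority) - len(minority)
balanced_minority = minority + interpolate_minority(minority, n_needed)
```

Interpolated points stay inside the region already spanned by the real minority samples, so this approach fills in sparse areas but cannot invent genuinely new modes of the distribution.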

Further reading

| Title | Author | Link | Type | Date |
| --- | --- | --- | --- | --- |
| Deep learning with synthetic data will democratize the tech industry | Evan Nisselson | | Web | May 11, 2018 |
| Synthetic Data for Deep Learning | Sergey I. Nikolenko | | | September 25, 2019 |

Companies

| Company | CEO | Location | Products/Services |
| --- | --- | --- | --- |
| AI.Reverie | Daeil Kim | New York, New York | Synthetic data suites/APIs |
| Anyverse | Victor Gonzalez | Madrid, Spain | Synthetic data datasets |
| | Michael P. Gregoire | New York, New York | Software |
| | Ofir Chakon | Tel Aviv | Photorealistic synthetic data |
| Deep Vision Data | Agustin Caverzasi | Cincinnati | Synthetic training data |
| Gen Rocket | Garth Rose | Ojai, California | Test data |
| | Harry Keen | London | Synthetic data |
| | Amit Walia | Redwood City, California | Software development |
| | Leslie Oliver Karpas | Brooklyn, New York | B2B API solutions |
| MDClone | Ziv Ofek | Be'er Sheva, HaDarom, Israel | Synthetic healthcare data |
| | Michael Platzer | Vienna | Software/artificial intelligence |
| | Yashar Behzadi | San Francisco, California | Enterprise AI solutions |
| Statice.ai | Sebastian Weyer | Berlin, Germany | Data analysis |
| Synthesis AI | Matthew Moore | San Francisco, California | Data generation platform |
| | Ian Coe | San Francisco, California | Synthetic data for information security and privacy |
| | Felix Marx | Dublin, Ireland | Data anonymization and analytics |
| | Gonçalo Martins Ribeiro | Lisbon, Portugal | Data privacy AI |
