Lilac is an open-source product that allows users to analyze, structure, and clean unstructured data with artificial intelligence (AI). It is both a visual tool and a Python API, so it can be used from the product's UI or from Python. Lilac enables users to:
- Browse datasets with unstructured data
- Enrich unstructured fields with structured metadata using Lilac Signals; for example, near-duplicate detection and personal information detection
- Create and refine Lilac Concepts (customizable AI models) that can be used to find and score matching text
- Remove unwanted or problematic data based on the user's criteria
- Analyze patterns in data
- Download the results of the enrichment for downstream applications
Lilac aims to make unstructured data more visible, quantifiable, and useful, leading to higher-quality AI models, better actions when models fail, and better control and visibility of model bias.
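To make the idea of a Signal more concrete, the following is a minimal sketch of what "enriching an unstructured field with structured metadata" can look like for near-duplicate detection. It is not Lilac's implementation or API: the function names (`shingles`, `jaccard`, `near_duplicate_signal`), the `near_duplicate_of` key, and the 0.5 threshold are all hypothetical, and word-shingle Jaccard similarity is used here only as one common, simple technique.

```python
import re


def shingles(text: str, k: int = 3) -> set[str]:
    """Return the set of k-word shingles of a text (lowercased, punctuation dropped)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Short texts become a single shingle; empty texts have none.
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicate_signal(rows: list[dict], field: str = "text",
                          threshold: float = 0.5) -> list[dict]:
    """Return copies of rows with a hypothetical 'near_duplicate_of' key.

    The value is the index of the first earlier row whose shingle set is at
    least `threshold` similar, or None. This pairwise scan is O(n^2) and is
    only meant for illustration on small inputs.
    """
    shingle_sets = [shingles(row[field]) for row in rows]
    enriched = []
    for i, row in enumerate(rows):
        match = None
        for j in range(i):
            if jaccard(shingle_sets[i], shingle_sets[j]) >= threshold:
                match = j
                break
        enriched.append({**row, "near_duplicate_of": match})
    return enriched


rows = [
    {"text": "The quick brown fox jumps over the lazy dog."},
    {"text": "The quick brown fox jumps over the lazy dog!"},
    {"text": "Completely unrelated sentence about data cleaning."},
]
enriched = near_duplicate_signal(rows)
```

Each output row keeps its original text and gains one structured field, which can then be used to filter or remove rows, in the spirit of the browse, enrich, and remove workflow listed above.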
First released on August 21, 2023, Lilac was developed by Daniel Smilkov and Nikhil Thorat, who founded the product's parent company, Lilac AI. Smilkov and Thorat previously worked together at Google, collaborating with many teams to improve datasets that were used for building AI models. Many AI models rely on unstructured data (e.g., natural language or images) that lacks labels or useful metadata. Teams would often compute aggregate statistics to understand the composition of their data while overlooking the raw data. When the raw data was organized and visualized, bugs in the datasets could be identified. These bugs may have relatively simple fixes that produce higher-quality AI models. While working at Google, Smilkov and Thorat developed tools and processes to help teams understand their data. Lilac was built based on their experiences at Google.
Lilac allows users to annotate their data using customizable Concepts, which are AI-powered, embedding-based classifiers specific to an application. Concepts are created and refined through the UI and can be updated in real time with user feedback. Lilac is an out-of-the-box solution that comes with a set of generally useful Signals and Concepts, with plans to add more useful enrichments based on feedback from the open-source community.
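The general idea of an embedding-based classifier that is refined by user feedback can be sketched as follows. This is an illustration under stated assumptions, not Lilac's Concept implementation or API: the `Concept` class, its `add_example` and `score` methods, and the scoring rule are hypothetical, and `embed` is a toy bag-of-words stand-in for a real embedding model.

```python
import math
import re
from collections import Counter


def embed(text: str) -> dict[str, float]:
    """Toy stand-in for an embedding model: an L2-normalized bag-of-words vector."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()} if norm else {}


def dot(a: dict[str, float], b: dict[str, float]) -> float:
    """Dot product of two sparse vectors (cosine similarity, since both are normalized)."""
    return sum(value * b[word] for word, value in a.items() if word in b)


class Concept:
    """Hypothetical concept: labeled example embeddings plus a similarity-based score."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.positives: list[dict[str, float]] = []
        self.negatives: list[dict[str, float]] = []

    def add_example(self, text: str, label: bool) -> None:
        """Record user feedback: label=True means the text matches the concept."""
        (self.positives if label else self.negatives).append(embed(text))

    def score(self, text: str) -> float:
        """Score in (0, 1): mean similarity to positives minus mean similarity
        to negatives, passed through a sigmoid. 0.5 means no evidence either way."""
        vector = embed(text)

        def mean_similarity(examples: list[dict[str, float]]) -> float:
            if not examples:
                return 0.0
            return sum(dot(vector, e) for e in examples) / len(examples)

        margin = mean_similarity(self.positives) - mean_similarity(self.negatives)
        return 1.0 / (1.0 + math.exp(-4.0 * margin))


gratitude = Concept("gratitude")
gratitude.add_example("thank you so much for the help", True)
gratitude.add_example("great work, thank you", True)
gratitude.add_example("this is terrible and useless", False)

matching = gratitude.score("thank you for the great help")
non_matching = gratitude.score("this is useless")

# Feedback loop: a new negative example immediately changes later scores.
before = gratitude.score("great, thank you")
gratitude.add_example("great, thank you for nothing", False)
after = gratitude.score("great, thank you")
```

Because the score is computed directly from the stored examples, adding a labeled example takes effect on the next call to `score` with no separate training step, which is one simple way to get the real-time refinement described above. A production system would more likely fit a classifier on top of learned embeddings.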
A Hugging Face demo of Lilac is available with a number of popular datasets and curated Concepts. The demo allows users to browse pre-enriched datasets and create their own Concepts. The Hugging Face Space can be forked and made private to incorporate the user's own data.