Log in
Enquire now
‌

Generative Benchmark Creation for Table Union Search

OverviewStructured DataIssuesContributors

Contents

Is a
‌
Academic paper
0

Academic Paper attributes

arXiv ID
2308.038830
arXiv Classification
Computer science
Computer science
0
Publication URL
arxiv.org/pdf/2308.0...83.pdf0
Publisher
ArXiv
ArXiv
0
DOI
doi.org/10.48550/ar...08.038830
Paid/Free
Free0
Academic Discipline
Database
Database
0
Machine learning
Machine learning
0
Computer science
Computer science
0
Submission Date
August 7, 2023
0
Author Names
Aamod Khatiwada0
Roee Shraga0
Renée J. Miller0
Koyena Pal0
Paper abstract

Data management has traditionally relied on synthetic data generators to generate structured benchmarks, like the TPC suite, where we can control important parameters like data size and its distribution precisely. These benchmarks were central to the success and adoption of database management systems. But more and more, data management problems are of a semantic nature. An important example is finding tables that can be unioned. While any two tables with the same cardinality can be unioned, table union search is the problem of finding tables whose union is semantically coherent. Semantic problems cannot be benchmarked using synthetic data. Our current methods for creating benchmarks involve the manual curation and labeling of real data. These methods are not robust or scalable and perhaps more importantly, it is not clear how robust the created benchmarks are. We propose to use generative AI models to create structured data benchmarks for table union search. We present a novel method for using generative models to create tables with specified properties. Using this method, we create a new benchmark containing pairs of tables that are both unionable and non-unionable but related. We thoroughly evaluate recent existing table union search methods over existing benchmarks and our new benchmark. We also present and evaluate a new table search methods based on recent large language models over all benchmarks. We show that the new benchmark is more challenging for all methods than hand-curated benchmarks, specifically, the top-performing method achieves a Mean Average Precision of around 60%, over 30% less than its performance on existing manually created benchmarks. We examine why this is the case and show that the new benchmark permits more detailed analysis of methods, including a study of both false positives and false negatives that were not possible with existing benchmarks.

Timeline

No Timeline data yet.

Further Resources

Title
Author
Link
Type
Date
No Further Resources data yet.

References

Find more entities like Generative Benchmark Creation for Table Union Search

Use the Golden Query Tool to find similar entities by any field in the Knowledge Graph, including industry, location, and more.
Open Query Tool
Access by API
Golden Query Tool
Golden logo

Company

  • Home
  • Pricing
  • Become an Editor
  • Enterprise

Legal

  • Terms of Service
  • Enterprise Terms of Service
  • Privacy Policy

Help

  • Help center
  • API Documentation
  • Contact Us

Explore companies

  • Artificial Intelligence
  • Fintech
  • Biotechnology
  • Cybersecurity
  • Semiconductors
  • Electric Vehicles
  • Cloud Computing
  • Robotics
  • SaaS
  • Renewable Energy
  • Venture Capital
  • Blockchain
  • Browse all →
By using this site, you agree to our Terms of Service.