Towards Expert-Level Medical Question Answering with Large Language Models


Is a
Academic paper

Academic Paper attributes

arXiv ID
2305.09617
arXiv Classification
Computer science
Publication URL
arxiv.org/pdf/2305.0...17.pdf
Publisher
ArXiv
DOI
doi.org/10.48550/ar...05.09617
Paid/Free
Free
Academic Discipline
Computer science
Artificial Intelligence (AI)
Machine learning
Submission Date
May 16, 2023
Author Names
Rory Sayres
Stephen Pfohl
Sushant Prakash
Tao Tu
Vivek Natarajan
Yossi Matias
Yun Liu
Alan Karthikesalingam
...
Paper abstract

Recent artificial intelligence (AI) systems have reached milestones in "grand challenges" ranging from Go to protein-folding. The capability to retrieve medical knowledge, reason over it, and answer medical questions comparably to physicians has long been viewed as one such grand challenge. Large language models (LLMs) have catalyzed significant progress in medical question answering; Med-PaLM was the first model to exceed a "passing" score in US Medical Licensing Examination (USMLE) style questions with a score of 67.2% on the MedQA dataset. However, this and other prior work suggested significant room for improvement, especially when models' answers were compared to clinicians' answers. Here we present Med-PaLM 2, which bridges these gaps by leveraging a combination of base LLM improvements (PaLM 2), medical domain finetuning, and prompting strategies including a novel ensemble refinement approach. Med-PaLM 2 scored up to 86.5% on the MedQA dataset, improving upon Med-PaLM by over 19% and setting a new state-of-the-art. We also observed performance approaching or exceeding state-of-the-art across MedMCQA, PubMedQA, and MMLU clinical topics datasets. We performed detailed human evaluations on long-form questions along multiple axes relevant to clinical applications. In pairwise comparative ranking of 1066 consumer medical questions, physicians preferred Med-PaLM 2 answers to those produced by physicians on eight of nine axes pertaining to clinical utility (p < 0.001). We also observed significant improvements compared to Med-PaLM on every evaluation axis (p < 0.001) on newly introduced datasets of 240 long-form "adversarial" questions to probe LLM limitations. While further studies are necessary to validate the efficacy of these models in real-world settings, these results highlight rapid progress towards physician-level performance in medical question answering.
