A Vision-Language Foundation Model to Enhance Efficiency of Chest X-ray Interpretation

- Authors: Zhihong Chen, Maya Varma, Justin Xu, Magdalini Paschali, Dave Van Veen, Andrew Johnston, Alaa Youssef, Louis Blankemeier, Christian Bluethgen, Stephan Altmayer, Jeya Maria Jose Valanarasu, Mohamed Siddig Eltayeb Muneer, Eduardo Pontes Reis, Joseph Paul Cohen, Cameron Olsen, Tanishq Mathew Abraham, Emily B. Tsai, Christopher F. Beaulieu, Jenia Jitsev, Sergios Gatidis, Jean-Benoit Delbrouck, Akshay S. Chaudhari, Curtis P. Langlotz
Over 1.4 billion chest X-rays (CXRs) are performed annually, owing to their cost-effectiveness as an initial diagnostic test. This scale of imaging presents a significant opportunity to streamline CXR interpretation and documentation. While foundation models are a promising solution, the lack of publicly available large-scale datasets and benchmarks inhibits their iterative development and real-world evaluation. To overcome these challenges, we constructed a large-scale instruction dataset (CheXinstruct), which we used to train a vision-language foundation model (CheXagent). We systematically demonstrated competitive performance across eight distinct task types on our novel evaluation benchmark (CheXbench). Beyond technical validation, we assessed the real-world utility of CheXagent in directly drafting radiology reports. In a clinical assessment with eight radiologists, residents saved 36% of their report-writing time when starting from CheXagent-drafted reports, while attending radiologists showed no significant difference in the time spent editing resident-drafted versus CheXagent-drafted reports. CheXagent-drafted reports improved the writing efficiency of radiology residents and attending radiologists in 81% and 61% of cases, respectively, without loss of quality. Overall, we demonstrate that CheXagent can effectively perform a variety of CXR interpretation tasks and holds potential to assist radiologists in routine clinical workflows.
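For readers who want to try the report-drafting task themselves, below is a minimal sketch of prompting a CheXagent checkpoint through the Hugging Face `transformers` library. The model identifier (`StanfordAIMI/CheXagent-8b`), the input filename, and the exact prompt template are assumptions for illustration; consult the official model card for the released interface and prompt format.

```python
# Minimal sketch: drafting a findings section with a CheXagent checkpoint.
# Assumptions (not confirmed by this post): the checkpoint is published as
# "StanfordAIMI/CheXagent-8b" and exposes the standard Hugging Face
# processor/generate interface via trust_remote_code.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "StanfordAIMI/CheXagent-8b"  # assumed Hub identifier
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
    trust_remote_code=True,
).to(device)

# Hypothetical input: a single frontal chest X-ray saved locally.
image = Image.open("frontal_cxr.png").convert("RGB")
prompt = "Draft the findings section for this chest X-ray."

# Preprocess image + text, then generate a report draft.
inputs = processor(images=image, text=prompt, return_tensors="pt").to(device)
output_ids = model.generate(**inputs, max_new_tokens=256)
report_draft = processor.tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(report_draft)
```

In the workflow evaluated above, output like this would serve only as a starting draft: a resident or attending radiologist reviews and edits it before it enters the record.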