SimSUM: Simulated Benchmark with Structured and Unstructured Medical Records
- SimSUM is a benchmark dataset of 10,000 simulated patient records, focused on respiratory diseases, that links unstructured clinical notes to structured background variables. The tabular data are generated from a Bayesian network, and the clinical notes are written by a large language model, GPT-4o. The dataset is intended to support research on clinical information extraction.
- Existing open-source datasets lack explicit links between structured features and clinical concepts. SimSUM is meant to fill that gap, which could improve the accuracy and efficiency of clinical information extraction in healthcare.
— via World Pulse Now AI Editorial System
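The two-stage generation described above can be sketched as follows: first sample structured variables from a Bayesian network, then turn them into input for a note-writing model. This is a minimal illustration under stated assumptions. The four variables, their dependencies, and every probability are invented for the example and are not SimSUM's actual network. `build_note_prompt` is a hypothetical template. No GPT-4o call is made, and prompting the model with the sampled variables is itself an assumption about how such a pipeline could be wired.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy network (NOT SimSUM's): each entry is
# (node name, parent names, CPT mapping parent values -> P(node is True)).
# Entries are listed in topological order: parents always come first.
NETWORK: List[Tuple[str, Tuple[str, ...], Dict[Tuple[bool, ...], float]]] = [
    ("asthma", (), {(): 0.10}),
    ("pneumonia", (), {(): 0.05}),
    ("dyspnea", ("asthma", "pneumonia"), {
        (False, False): 0.05,
        (False, True): 0.60,
        (True, False): 0.70,
        (True, True): 0.90,
    }),
    ("fever", ("pneumonia",), {(False,): 0.03, (True,): 0.80}),
]


def sample_record(rng: random.Random) -> Dict[str, bool]:
    """Ancestral sampling: visit nodes in topological order and draw each
    one conditioned on the already-sampled values of its parents."""
    record: Dict[str, bool] = {}
    for name, parents, cpt in NETWORK:
        p_true = cpt[tuple(record[p] for p in parents)]
        record[name] = rng.random() < p_true
    return record


def build_note_prompt(record: Dict[str, bool]) -> str:
    """Hypothetical template turning a structured record into an LLM prompt.
    It only builds a string; no model is called."""
    present = [k for k, v in record.items() if v]
    absent = [k for k, v in record.items() if not v]
    return (
        "Write a short clinical note for a patient. "
        f"Present: {', '.join(present) or 'none'}. "
        f"Absent: {', '.join(absent) or 'none'}."
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sample is reproducible
    for record in (sample_record(rng) for _ in range(3)):
        print(record)
        print(build_note_prompt(record))
```

Sampling in topological order guarantees that every parent is drawn before its children, so one forward pass over the nodes yields a full record whose joint distribution matches the network. Because each note would be generated from a known record, the structured values can later serve as ground-truth labels for extraction.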
