Synthetic Clinical Notes for Rare ICD Codes: A Data-Centric Framework for Long-Tail Medical Coding

arXiv — cs.CL
Wednesday, November 19, 2025 at 5:00:00 AM
  • A new framework has been proposed to generate synthetic clinical notes for rare ICD codes, addressing the significant underrepresentation of these codes in existing datasets such as MIMIC.
  • The framework aims to improve the accuracy of automatic ICD coding, which is vital for effective medical data processing. By expanding the representation of rare codes, this approach could lead to better healthcare analytics and improved patient outcomes.
— via World Pulse Now AI Editorial System

Continue Reading
Large Language Model-Based Generation of Discharge Summaries
Positive | Artificial Intelligence
Recent research has demonstrated the potential of Large Language Models (LLMs) in automating the generation of discharge summaries, which are critical documents in patient care. The study evaluated five models, including proprietary systems like GPT-4 and Gemini 1.5 Pro, and found that Gemini, particularly with one-shot prompting, produced summaries most similar to gold standards. This advancement could significantly reduce the workload of healthcare professionals and enhance the accuracy of patient information.