Fairness in Multi-modal Medical Diagnosis with Demonstration Selection

arXiv — cs.LG · Tuesday, November 25, 2025 at 5:00:00 AM
  • Recent advancements in multimodal large language models (MLLMs) highlight the importance of fairness in medical image reasoning, as demonstrated by the introduction of FADS, a fairness-aware demonstration-selection framework (a minimal sketch of the general idea appears after this list).
  • FADS is significant because it addresses critical disparities related to gender, race, and ethnicity in medical diagnostics, helping MLLMs deliver equitable outcomes across diverse demographic groups while maintaining accuracy.
  • This development underscores a growing focus on fairness in AI, particularly in healthcare, where biases can have serious consequences. Frameworks like FADS, alongside efficiency-oriented work such as FastMMoE, which accelerates MLLM inference, reflect a broader trend toward making AI systems both fairer and more efficient, which is crucial for their acceptance and reliability in sensitive applications.
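
The summary above does not spell out the FADS procedure, so the following is only a minimal illustrative sketch, assuming the core idea is to pick in-context demonstrations that are balanced across demographic groups. The function name, the `pool`/`group_key` fields, and the round-robin criterion are all assumptions for illustration, not the paper's method:

```python
import random
from collections import defaultdict

def select_balanced_demonstrations(pool, k, group_key="group", seed=0):
    """Pick k in-context demonstrations spread evenly across demographic
    groups (e.g., gender, race) via round-robin sampling within groups.

    `pool` is a list of dicts, each carrying a `group_key` field naming the
    example's demographic group. Illustrative sketch only; the paper's
    actual selection criterion is not reproduced here.
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for ex in pool:
        by_group[ex[group_key]].append(ex)
    for examples in by_group.values():
        rng.shuffle(examples)

    selected, groups = [], list(by_group)
    i = 0
    # Round-robin over groups so no single group dominates the prompt.
    while len(selected) < k and any(by_group[g] for g in groups):
        g = groups[i % len(groups)]
        if by_group[g]:
            selected.append(by_group[g].pop())
        i += 1
    return selected

# Usage: build a demographically balanced few-shot prompt (dummy data).
pool = [
    {"image": "case_01.png", "label": "pneumonia", "group": "female"},
    {"image": "case_02.png", "label": "normal",    "group": "male"},
    {"image": "case_03.png", "label": "pneumonia", "group": "male"},
    {"image": "case_04.png", "label": "normal",    "group": "female"},
]
demos = select_balanced_demonstrations(pool, k=2)
```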
— via World Pulse Now AI Editorial System


Continue Reading
The Alignment Paradox of Medical Large Language Models in Infertility Care: Decoupling Algorithmic Improvement from Clinical Decision-making Quality
Neutral · Artificial Intelligence
A recent study evaluated the alignment of large language models (LLMs) in infertility care, assessing four strategies: Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), Group Relative Policy Optimization (GRPO), and In-Context Learning (ICL). The findings revealed that GRPO achieved the highest algorithmic accuracy, while clinicians preferred SFT for its clearer reasoning and therapeutic feasibility.
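
For context on one of the named strategies: GRPO's defining step is computing each sampled response's advantage relative to a group of responses drawn for the same prompt, rather than against a learned value baseline. A minimal sketch of that group-relative computation, independent of the cited study's setup (whose details are not given here):

```python
import torch

def grpo_advantages(rewards):
    """Group-relative advantages as used in GRPO: normalize each sampled
    response's reward against the mean/std of its own group of samples
    for the same prompt. rewards: (num_prompts, group_size)."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + 1e-8)

# Usage: rewards above the group average receive positive advantage.
rewards = torch.tensor([[1.0, 0.0, 1.0, 1.0],
                        [0.0, 0.0, 1.0, 0.0]])
adv = grpo_advantages(rewards)
```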
FastMMoE: Accelerating Multimodal Large Language Models through Dynamic Expert Activation and Routing-Aware Token Pruning
Positive · Artificial Intelligence
FastMMoE has been introduced as a training-free acceleration framework for multimodal large language models (MLLMs), addressing the long visual-token sequences and high inference latency caused by high-resolution visual inputs. The framework combines expert activation reduction with routing-aware token pruning to cut inference cost without compromising model performance (a minimal sketch of the pruning idea follows).
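
The summary does not detail FastMMoE's actual pruning criterion, so the following is only a minimal sketch of the general routing-aware idea, assuming the MoE router's gate probabilities can serve as token-importance scores. All tensor names, shapes, and the top-k rule are assumptions for illustration:

```python
import torch

def prune_tokens_by_router_score(tokens, router_logits, keep_ratio=0.5):
    """Keep the visual tokens the MoE router is most confident about.

    tokens:        (batch, num_tokens, dim) visual token embeddings
    router_logits: (batch, num_tokens, num_experts) raw routing scores
    Returns the kept tokens and their original indices. Illustrative
    only; FastMMoE's actual pruning criterion may differ.
    """
    # Use the top expert's routing probability as an importance score.
    probs = router_logits.softmax(dim=-1)          # (B, N, E)
    importance = probs.max(dim=-1).values          # (B, N)

    k = max(1, int(tokens.size(1) * keep_ratio))
    top = importance.topk(k, dim=1).indices        # (B, k)
    top, _ = top.sort(dim=1)                       # preserve token order

    idx = top.unsqueeze(-1).expand(-1, -1, tokens.size(-1))
    return tokens.gather(1, idx), top

# Usage with dummy shapes: keep 25% of 576 visual tokens.
B, N, D, E = 2, 576, 1024, 8
tokens = torch.randn(B, N, D)
router_logits = torch.randn(B, N, E)
kept, kept_idx = prune_tokens_by_router_score(tokens, router_logits, 0.25)
assert kept.shape == (B, 144, D)
```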