arXiv:2510.05862v2 Announce Type: replace 
Abstract: Long-context models (LCMs) have demonstrated great potential in processing long sequences, facilitating many real-world applications. The success of LCMs can be attributed to their ability to locate implicit critical information within the context for further prediction. However, recent research reveals that LCMs are often susceptible to contextual noise, i.e., irrelevant tokens that can mislead model attention. In this paper, we conduct a fine-grained analysis of contextual noise and propose an effective metric, the Integrated Gradient (IG) score, to detect and quantify noisy information within the context. Our findings reveal that even simple mitigation of the detected noise can substantially boost the model's attention on critical tokens and benefit subsequent predictions. Building on this insight, we propose Context Denoising Training (CDT), a straightforward yet effective training strategy that improves attention on critical tokens while reinforcing their influence on model predictions. Extensive experiments across four tasks, under both context window scaling and long-context alignment settings, demonstrate the superiority of CDT. Notably, when trained with CDT, an open-source 8B model achieves performance (50.92) comparable to GPT-4o (51.00).
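The abstract does not say exactly how the IG score is computed or how CDT uses it. As a rough illustration only, the sketch below applies the standard integrated-gradients attribution, IG_i = (x_i - x'_i) * ∫₀¹ ∂F/∂x_i(x' + α(x - x')) dα, to a toy scalar "model" over six context tokens, flags low-attribution tokens as noise, and then shows how down-weighting those tokens' attention logits raises the attention share on the remaining critical tokens. Everything here is an assumption for illustration: `toy_model`, `WEIGHTS`, the attention logits, the relative-threshold rule in `flag_noise`, and the fixed logit `penalty` in `attention_mass` are hypothetical stand-ins, not the paper's IG score or the CDT training objective. Finite differences replace autograd so the example runs with the standard library alone.

```python
import math
from typing import Callable, List, Sequence


def integrated_gradients(
    f: Callable[[List[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
    steps: int = 128,
    eps: float = 1e-5,
) -> List[float]:
    """Per-feature integrated-gradients attributions for a scalar function f.

    Approximates the path integral from `baseline` to `x` with a midpoint
    Riemann sum; central finite differences stand in for autograd.
    """
    n = len(x)
    grad_sums = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i in range(n):
            up, down = list(point), list(point)
            up[i] += eps
            down[i] -= eps
            grad_sums[i] += (f(up) - f(down)) / (2 * eps)
    # IG_i = (x_i - x'_i) * average gradient along the straight-line path
    return [(xi - b) * g / steps for xi, b, g in zip(x, baseline, grad_sums)]


def flag_noise(scores: Sequence[float], rel_threshold: float = 0.1) -> List[int]:
    """Hypothetical rule: a token is noise if |IG| < rel_threshold * max |IG|."""
    cutoff = rel_threshold * max(abs(s) for s in scores)
    return [i for i, s in enumerate(scores) if abs(s) < cutoff]


def attention_mass(logits: Sequence[float], targets: Sequence[int],
                   penalty: float = 0.0, penalized: Sequence[int] = ()) -> float:
    """Softmax attention mass on `targets` after subtracting `penalty`
    from the logits of the `penalized` positions (hypothetical mitigation)."""
    adjusted = [z - penalty if i in penalized else z for i, z in enumerate(logits)]
    m = max(adjusted)
    exps = [math.exp(a - m) for a in adjusted]  # shift by max for stability
    return sum(exps[i] for i in targets) / sum(exps)


# Toy setup (all values invented): 6 context tokens; x_i = 1 means "present",
# baseline x'_i = 0 means "removed". Only tokens 1 and 4 matter to the answer.
WEIGHTS = [0.02, 1.5, 0.01, -0.03, 1.2, 0.05]


def toy_model(x: List[float]) -> float:
    """Hypothetical stand-in for an LCM's answer logit."""
    return math.tanh(sum(w * xi for w, xi in zip(WEIGHTS, x)))


x, baseline = [1.0] * 6, [0.0] * 6
scores = integrated_gradients(toy_model, x, baseline)
noise = flag_noise(scores)
critical = [i for i in range(len(x)) if i not in noise]

# Invented attention logits in which the noise tokens draw most of the attention.
logits = [2.0, 1.0, 2.0, 1.5, 1.0, 2.5]
before = attention_mass(logits, critical)
after = attention_mass(logits, critical, penalty=2.0, penalized=noise)
print("noise tokens:", noise)
print(f"attention on critical tokens: {before:.3f} -> {after:.3f}")
```

A useful sanity check on any IG implementation is the completeness property: the attributions should sum (up to discretization error) to F(x) - F(x'). In a real LCM the attribution would be taken with respect to token embeddings via autograd rather than a scalar presence variable, and the paper's actual scoring and training procedure may differ from this sketch.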

This paper examines long-context models (LCMs) and their ability to locate the critical information in long sequences on which their predictions depend. It shows that contextual noise, i.e., irrelevant tokens, can mislead model attention and degrade performance, proposes the Integrated Gradient (IG) score to detect and quantify such noise, and introduces Context Denoising Training (CDT) to strengthen attention on critical tokens.

Revisiting Long-context Modeling from Context Denoising Perspective
