DoPE: Denoising Rotary Position Embedding
Positive · Artificial Intelligence
The introduction of Denoising Rotary Position Embedding (DoPE) marks a significant step toward fixing the limitations of Rotary Position Embedding (RoPE) in Transformer models, where poor length extrapolation has been a persistent barrier to performance. DoPE reinterprets the attention map as a noisy feature map and applies a training-free method based on truncated matrix entropy to identify outlier frequency bands. This approach mitigates the attention sink phenomenon and restores balanced attention patterns, improving retrieval accuracy and reasoning stability over extended contexts, as demonstrated in experiments on needle-in-a-haystack and many-shot in-context learning tasks. The results indicate that DoPE handles contexts of up to 64K tokens, offering a simple yet effective route to better length generalization in AI models.
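The core mechanism lends itself to a short sketch. The snippet below is a minimal, hypothetical illustration of the idea as summarized here: score each frequency band's attention map by a truncated matrix entropy over its singular values, then flag bands whose entropy is a statistical outlier as noise. The function names, the per-band decomposition, the truncation depth `k`, and the z-score threshold are all assumptions made for illustration, not the authors' implementation.

```python
import torch

def truncated_matrix_entropy(mat: torch.Tensor, k: int = 16) -> torch.Tensor:
    """Shannon entropy of the top-k normalized singular values of `mat`."""
    s = torch.linalg.svdvals(mat)[:k]              # truncate the spectrum to k values
    p = s / s.sum().clamp_min(1e-12)               # normalize into a distribution
    return -(p * p.clamp_min(1e-12).log()).sum()   # Shannon entropy of the spectrum

def flag_outlier_bands(attn_by_band: torch.Tensor, z_thresh: float = 2.0) -> torch.Tensor:
    """attn_by_band: (num_bands, seq, seq) attention maps, one per frequency band.
    Flags bands whose truncated entropy is a z-score outlier; these are the
    'noisy' bands a denoising step would mask or down-weight."""
    ent = torch.stack([truncated_matrix_entropy(a) for a in attn_by_band])
    z = (ent - ent.mean()) / ent.std().clamp_min(1e-12)
    return z.abs() > z_thresh

# Toy usage: eight random 128x128 attention maps standing in for band-wise maps.
torch.manual_seed(0)
bands = torch.softmax(torch.randn(8, 128, 128), dim=-1)
print(flag_outlier_bands(bands))  # boolean mask over the 8 bands
```

In this reading, the entropy of the truncated singular-value spectrum acts as a per-band "signal quality" score, and the training-free property follows from the fact that nothing here requires gradients or fine-tuning, only a forward pass.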
— via World Pulse Now AI Editorial System