arXiv:2511.10070v1 Announce Type: new 
Abstract: We present ADI-20, an extension of the previously published ADI-17 Arabic Dialect Identification (ADI) dataset. ADI-20 covers all Arabic-speaking countries' dialects. It comprises 3,556 hours from 19 Arabic dialects in addition to Modern Standard Arabic (MSA). We used this dataset to train and evaluate various state-of-the-art ADI systems. We explored fine-tuning pre-trained ECAPA-TDNN-based models, as well as Whisper encoder blocks coupled with an attention pooling layer and a classification dense layer. We investigated the effect of (i) training data size and (ii) the model's number of parameters on identification performance. Our results show a small decrease in F1 score while using only 30% of the original training data. We open-source our collected data and trained models to enable the reproduction of our work, as well as support further research in ADI.

ADI-20: Arabic Dialect Identification dataset and models
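
The abstract mentions encoder outputs fed to an attention pooling layer and a dense classification layer. As a rough illustration of that idea only (not the authors' architecture), the sketch below pools a sequence of frame vectors with softmax attention weights and applies one linear layer. All names, weights, and the toy 2-dimensional frames are assumptions made for the example.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(frames, score_weights):
    # Score each frame with a dot product, then take the
    # attention-weighted average of the frames.
    scores = [sum(w * x for w, x in zip(score_weights, f)) for f in frames]
    alphas = softmax(scores)
    dim = len(frames[0])
    return [sum(a * f[d] for a, f in zip(alphas, frames)) for d in range(dim)]

def classify(pooled, class_weights, class_bias):
    # One dense layer followed by softmax over the classes.
    logits = [
        sum(w * x for w, x in zip(row, pooled)) + b
        for row, b in zip(class_weights, class_bias)
    ]
    return softmax(logits)

# Hypothetical toy input: 3 frames of dimension 2, 3 output classes.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled = attention_pool(frames, score_weights=[0.0, 0.0])
probs = classify(
    pooled,
    class_weights=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    class_bias=[0.0, 0.0, 0.0],
)
predicted = max(range(len(probs)), key=probs.__getitem__)
```

With zero scoring weights the attention is uniform, so the pooled vector is simply the mean of the frames. A trained scoring vector would instead emphasise the frames most useful for the dialect decision.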

Although Black Friday is still two weeks away, you can find great Nintendo Switch and Switch 2 deals now. I've collected the best from Walmart, Best Buy, and more.

As Black Friday approaches in two weeks, early deals on Nintendo Switch and Switch 2 consoles are already available. Major retailers like Walmart and Best Buy are offering over 20 sales, providing consumers with an opportunity to save on popular gaming products ahead of the holiday shopping rush.

Best early Black Friday Nintendo Switch deals 2025: 20+ sales out early

ZDNET sat down with Andrew Ng at AI Dev 25 in New York to talk about developer futures, responsible AI, and why AGI is overhyped.

You should still learn to code, says top Google AI exec - here's why

Black Friday is just over one week away, but you can get a jump on your holiday shopping list with great deals on gaming desktop PCs, monitors, SSDs, and more.

Best early Black Friday gaming PC deals 2025: My favorite sales out early

Although Black Friday is still two weeks away, you can find great PlayStation deals now from across the internet. I've collected some of the best.

Although Black Friday is still two weeks away, there are already great PlayStation deals available online. The article highlights over 20 sales that can be found across various platforms, providing an early opportunity for shoppers to take advantage of discounts ahead of the official Black Friday date.

Best early Black Friday PlayStation deals 2025: 20+ sales out now

Black Friday is just over a week away, and you can already find great deals on TVs and home theater equipment from Samsung, Sony, and more.

Best early Black Friday TV deals 2025: Save on Samsung, TCL, and more

This new AI coding environment looks like a real winner. Here's why.

Google's Antigravity puts coding productivity before AI hype - and the result is astonishing

arXiv:2510.24021v2 Announce Type: replace 
Abstract: Knowledge distillation (KD) is a standard route to compress Large Language Models (LLMs) into compact students, yet most pipelines uniformly apply token-wise loss regardless of teacher confidence. This indiscriminate supervision amplifies noisy, high-entropy signals and is especially harmful under large teacher-student capacity gaps. We introduce SelecTKD, a plug-and-play Selective Token-Weighted distillation framework that shifts the focus from "how to measure divergence" to "where to apply learning". At each step, the student proposes tokens that are verified by the teacher through a robust propose-and-verify procedure with two variants: greedy Top-k and non-greedy Spec-k. Accepted tokens receive full loss, while rejected tokens are masked or down-weighted. This objective-agnostic design works with on- and off-policy data, induces an implicit curriculum quantified by Token Acceptance Rate (TAR), and stabilizes optimization. Across instruction following, mathematical reasoning, code generation, and a VLM setting, SelecTKD consistently improves strong baselines and achieves state-of-the-art results for small models without architectural changes or extra reference models.

SelecTKD: Selective Token-Weighted Knowledge Distillation for LLMs
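
To make the propose-and-verify idea concrete, here is a minimal sketch of one plausible reading of the greedy Top-k variant: a student token is accepted when it falls within the teacher's k most probable tokens, accepted positions keep full loss weight, and rejected positions are masked or down-weighted. The acceptance rule, the normalisation by total weight, and every name and number below are assumptions for illustration; the abstract does not specify these details.

```python
def top_k_tokens(dist, k):
    # Return the k most probable tokens under a {token: prob} distribution.
    return set(sorted(dist, key=dist.get, reverse=True)[:k])

def accept_flags(student_tokens, teacher_dists, k):
    # A position is accepted if the student's proposed token is in
    # the teacher's top-k set at that position (assumed rule).
    return [
        tok in top_k_tokens(dist, k)
        for tok, dist in zip(student_tokens, teacher_dists)
    ]

def selective_loss(token_losses, flags, reject_weight=0.0):
    # Accepted tokens get weight 1.0; rejected tokens get reject_weight
    # (0.0 masks them entirely). Normalising by total weight is an assumption.
    weights = [1.0 if ok else reject_weight for ok in flags]
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(l * w for l, w in zip(token_losses, weights)) / total

def token_acceptance_rate(flags):
    # Fraction of positions where the teacher accepted the student token.
    return sum(flags) / len(flags)

# Hypothetical three-token example.
student_tokens = ["the", "cat", "flew"]
teacher_dists = [
    {"the": 0.7, "a": 0.2, "one": 0.1},
    {"dog": 0.5, "cat": 0.4, "bird": 0.1},
    {"sat": 0.6, "ran": 0.3, "flew": 0.1},
]
token_losses = [0.2, 0.6, 1.5]  # any per-token divergence could be plugged in

flags = accept_flags(student_tokens, teacher_dists, k=2)
loss = selective_loss(token_losses, flags)
tar = token_acceptance_rate(flags)
```

Because the per-token losses are passed in rather than computed, the weighting step stays independent of the divergence used, which matches the objective-agnostic framing in the abstract.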

arXiv:2511.13368v1 Announce Type: new 
Abstract: Large language models (LLMs) perform strongly across tasks and languages, yet how improvements in one task or language affect other tasks and languages and their combinations remains poorly understood. We conduct a controlled PEFT/LoRA study across multiple open-weight LLM families and sizes, treating task and language as transfer axes while conditioning on model family and size; we fine-tune each model on a single task-language source and measure transfer as the percentage-point change versus its baseline score when evaluated on all other task-language target pairs. We decompose transfer into (i) Matched-Task (Cross-Language), (ii) Matched-Language (Cross-Task), and (iii) Cross-Task (Cross-Language) regimes. We uncover two consistent general patterns. First, a pronounced on-task vs. off-task asymmetry: Matched-Task (Cross-Language) transfer is reliably positive, whereas off-task transfer often incurs collateral degradation. Second, a stable donor-recipient structure across languages and tasks (hub donors vs. brittle recipients). We outline implications for risk-aware fine-tuning and model specialisation.

Donors and Recipients: On Asymmetric Transfer Across Tasks and Languages with Parameter-Efficient Fine-Tuning
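
The transfer measure in the abstract is a percentage-point change against the baseline, grouped into three regimes. The sketch below shows that bookkeeping under stated assumptions: task and language labels, scores, and function names are all hypothetical, and the real study conditions on model family and size, which is omitted here.

```python
def transfer_pp(baseline_score, finetuned_score):
    # Transfer as percentage-point change versus the baseline score.
    return finetuned_score - baseline_score

def regime(source, target):
    # source and target are (task, language) pairs.
    same_task = source[0] == target[0]
    same_lang = source[1] == target[1]
    if same_task and same_lang:
        return "Source pair"
    if same_task:
        return "Matched-Task (Cross-Language)"
    if same_lang:
        return "Matched-Language (Cross-Task)"
    return "Cross-Task (Cross-Language)"

# Hypothetical scores (in percent) before and after fine-tuning on ("qa", "en").
source = ("qa", "en")
baseline = {("qa", "de"): 60.0, ("nli", "en"): 48.0, ("nli", "de"): 52.0}
finetuned = {("qa", "de"): 63.5, ("nli", "en"): 45.5, ("nli", "de"): 51.0}

report = {
    target: (regime(source, target), transfer_pp(baseline[target], finetuned[target]))
    for target in baseline
}
```

In this made-up example the same-task target gains 3.5 points while the off-task targets lose ground, which mirrors the on-task versus off-task asymmetry the abstract describes without reproducing any of its actual numbers.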

arXiv:2511.11878v1 Announce Type: new 
Abstract: While large language models (LLMs) show transformative potential in healthcare, their development remains focused on high-resource languages, creating a critical barrier for others as simple translation fails to capture unique clinical and cultural nuances, such as endemic diseases. To address this, we introduce MedPT, the first large-scale, real-world corpus for Brazilian Portuguese, comprising 384,095 authentic question-answer pairs from patient-doctor interactions. The dataset underwent a meticulous multi-stage curation protocol, using a hybrid quantitative-qualitative analysis to filter noise and contextually enrich thousands of ambiguous queries. We further augmented the corpus via LLM-driven annotation, classifying questions into seven semantic types to capture user intent. Our analysis reveals its thematic breadth (3,200 topics) and unique linguistic properties, like the natural asymmetry in patient-doctor communication. To validate its utility, we benchmark a medical specialty routing task: fine-tuning a 1.7B parameter model achieves an outstanding 94% F1-score on a 20-class setup. Furthermore, our qualitative error analysis shows misclassifications are not random but reflect genuine clinical ambiguities (e.g., between comorbid conditions), proving the dataset's deep semantic richness. We publicly release MedPT to foster the development of more equitable, accurate, and culturally-aware medical technologies for the Portuguese-speaking world.
