arXiv:2511.03295v2 Announce Type: replace 
Abstract: Automatic evaluation of speech-to-text translation (ST) systems is typically performed by comparing translation hypotheses with one or more reference translations. While effective to some extent, this approach inherits the limitation of reference-based evaluation that ignores valuable information from the source input. In machine translation (MT), recent progress has shown that neural metrics incorporating the source text achieve stronger correlation with human judgments. Extending this idea to ST, however, is not trivial because the source is audio rather than text, and reliable transcripts or alignments between source and references are often unavailable. In this work, we conduct the first systematic study of source-aware metrics for ST, with a particular focus on real-world operating conditions where source transcripts are not available. We explore two complementary strategies for generating textual proxies of the input audio, automatic speech recognition (ASR) transcripts, and back-translations of the reference translation, and introduce a novel two-step cross-lingual re-segmentation algorithm to address the alignment mismatch between synthetic sources and reference translations. Our experiments, carried out on two ST benchmarks covering 79 language pairs and six ST systems with diverse architectures and performance levels, show that ASR transcripts constitute a more reliable synthetic source than back-translations when word error rate is below 20%, while back-translations always represent a computationally cheaper but still effective alternative. Furthermore, our cross-lingual re-segmentation algorithm enables robust use of source-aware MT metrics in ST evaluation, paving the way toward more accurate and principled evaluation methodologies for speech translation.

A systematic study of source-aware metrics for speech translation (ST) was published, focusing on evaluating ST systems when source transcripts are not available. The study uses textual proxies for the source audio and shows that automatic speech recognition (ASR) transcripts are a more reliable synthetic source than back-translations when word error rate is below 20%, while back-translations remain a computationally cheaper but still effective alternative. It covers two ST benchmarks spanning 79 language pairs and six ST systems with diverse architectures, and introduces a two-step cross-lingual re-segmentation algorithm to align synthetic sources with reference translations.

How to Evaluate Speech Translation with Source-Aware Neural MT Metrics

Google Gemini 3 has achieved a significant milestone by surpassing all existing AI benchmarks, including the most challenging ones. This accomplishment highlights the advancements made by Google's AI team and positions Gemini 3 as a leading contender in the artificial intelligence landscape. The success is celebrated within the tech community, reflecting the ongoing evolution of AI technologies and their capabilities.

Google Gemini 3 Just Killed Every AI Benchmark, Including the Hardest of All

Two weeks ago I read a line about tool use with Claude that stuck in my head. Paraphrased:

<blockquote>
Direct tool calls don’t really scale. 
Have the model write code that uses tools, and execute that code instead.
</blockquote>

At the same time, I was knee-deep in wiring a JavaScript execution environment into Contenox, my self-hosted runtime for deterministic, chat-native AI workflows.

So of course the thought was:

<blockquote>
What if I just let the model write the JavaScript and run it inside the runtime? 😅
</blockquote>

This post is about what happened when I tried exactly that.




<h2>What is Contenox?</h2>

Very short version:

<blockquote>
Contenox is a self-hostable runtime for sovereign GenAI applications. 
It models AI behavior as explicit state machines, not opaque prompt chains.
</blockquote>

Some key properties:

<ul>
<li>Runtime, not a library</li>
<li>Explicit state machines</li>
<li>Chat-native interface</li>
<li>Vendor-agnostic &amp; self-hosted</li>
<li>Written in Go with lots of passion and zero tolerance for shortcuts</li>
</ul>




<h2>The experiment: ask it to fetch and summarize a TODO</h2>

Once the JS execution hook was in place (a Goja VM with some globals exposed), I wired up a new state machine: 


<div class="highlight js-code-highlight">
<pre class="highlight plaintext"><code>mux_input → moderate → generate_js → run_js → eval_js_result → (repair_js?) → answer
</code></pre>

</div>
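
The chain above can also be read as a transition table. The following is a minimal sketch in plain JavaScript, for illustration only: it is not Contenox's workflow format, the outcome labels (`ok`, `safe`, `success`, `retry`) are assumptions, and the edges follow the diagram exactly as drawn.

```javascript
// Transition table for the chain as drawn above.
// Outcome labels ("ok", "safe", "success", "retry") are hypothetical.
const transitions = {
  mux_input:      { ok: "moderate" },
  moderate:       { safe: "generate_js" },
  generate_js:    { ok: "run_js" },
  run_js:         { ok: "eval_js_result" },
  eval_js_result: { success: "answer", retry: "repair_js" },
  repair_js:      { ok: "answer" },
  answer:         {}, // terminal state
};

// Look up the next task; unknown outcomes fail loudly instead of guessing.
function nextTask(current, outcome) {
  const edges = transitions[current];
  if (!edges) throw new Error("unknown task: " + current);
  if (!Object.prototype.hasOwnProperty.call(edges, outcome)) {
    throw new Error("no transition from " + current + " on " + outcome);
  }
  return edges[outcome];
}

// Walk a list of outcomes from the start and return the visited tasks.
function walk(outcomes) {
  const path = ["mux_input"];
  for (const outcome of outcomes) {
    path.push(nextTask(path[path.length - 1], outcome));
  }
  return path;
}

console.log(walk(["ok", "safe", "ok", "ok", "success"]).join(" → "));
```

Keeping the edges in one table means an unexpected outcome raises an error instead of silently falling through, which is the property the explicit state machine is meant to provide.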



Then I opened the Contenox chat UI and typed:

<blockquote>
Me: “Hey Contenox, fetch the TODO item from jsonplaceholder and summarize it.”
</blockquote>

The first attempt didn’t go as planned.

Instead of a neat summary, I got:

<blockquote>
Contenox: 
“I'm sorry, but it seems there was a problem fetching the TODO item from the provided URL due to an Internal Server Error (status code: 500). Unfortunately, I couldn't retrieve any data at this time. My apologies for the inconvenience!”
</blockquote>

I checked manually – Cloudflare was having a moment, JSONPlaceholder was behind it, and the request really was failing. Okay, fair. Long coffee break.

A bit later, I went back to the same chat and typed:

<blockquote>
Me: “okay let's try again now.”
</blockquote>

This time the reply was:

<blockquote>
Contenox: 
<code>Task titled "delectus aut autem" is incomplete. Nothing to summarize here.</code>
</blockquote>

Wait. It worked? Just like that?




<h2>Under the hood: LLM → JavaScript → tools → LLM</h2>

Looking at the execution trace made it clear what happened.

<h3>1. Input is normalized and moderated</h3>

The workflow starts with:

<ol>
<li><code>mux_input</code>
<ul>
<li>Parses the incoming chat history into a normalized <code>messages</code> array.</li>
</ul>
</li>
<li><code>moderate</code>
<ul>
<li>Uses a small model to classify the input as safe/unsafe.</li>
<li>Output: <code>0</code> → safe → continue.</li>
</ul>
</li>
</ol>

So far this is standard workflow stuff.

<h3>2. <code>generate_js</code>: the LLM writes JavaScript</h3>

Next, the <code>generate_js</code> task runs with a system instruction like:

<ul>
<li>“You are an expert JavaScript agent for the Contenox runtime…”</li>
<li>“You must output JSON: { "code": "" }”</li>
<li>“You can use httpFetch, executeTask, executeTaskChain, …”</li>
<li>“Script must be synchronous, no async/await, and must return a JSON-serializable object.”</li>
</ul>
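
The constraints in that system instruction can be checked mechanically before any code runs. Here is a minimal sketch, assuming the planner's raw reply arrives as a JSON string. `parsePlannerOutput` is a hypothetical helper, not part of Contenox, and the regex check for `async`/`await` is deliberately crude.

```javascript
// Parse the planner's raw text and reject output that breaks the stated rules.
// parsePlannerOutput is a hypothetical helper, not a Contenox function.
function parsePlannerOutput(raw) {
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch (e) {
    return { ok: false, error: "not valid JSON: " + e.message };
  }
  if (parsed === null || typeof parsed !== "object" || typeof parsed.code !== "string") {
    return { ok: false, error: 'expected an object with a string "code" field' };
  }
  if (parsed.code.trim() === "") {
    return { ok: false, error: '"code" is empty' };
  }
  // Crude textual check for the "no async/await" rule. It also matches these
  // words inside string literals; a real check would parse the script.
  if (/\b(async|await)\b/.test(parsed.code)) {
    return { ok: false, error: "script must be synchronous" };
  }
  return { ok: true, code: parsed.code };
}

console.log(parsePlannerOutput('{"code": "var result = {answer: \\"hi\\"}; result;"}'));
console.log(parsePlannerOutput('{"code": "await httpFetch(url)"}'));
```

Rejecting malformed output at this point keeps the failure in the planner step, where a repair prompt can address it, rather than surfacing as a confusing sandbox error.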

Given the chat history, the model produced something like this (simplified): 


<div class="highlight js-code-highlight">
<pre class="highlight json"><code>{
 "code": "var result;
var answer;
var meta = { steps: [] };

var todoUrl = \"https://jsonplaceholder.typicode.com/todos/1\";

// Step 1: Fetch the TODO item
meta.steps.push(\"Fetching TODO item from \" + todoUrl);
var fetchResponse = httpFetch(todoUrl);

if (!fetchResponse.ok) {
 answer = \"I couldn't access that item right now because there was some trouble with getting it. Please check the URL or try again later.\";
 meta.error = {
 message: \"HTTP fetch failed\",
 url: todoUrl,
 status: fetchResponse.status,
 statusText: fetchResponse.statusText,
 body: fetchResponse.body,
 error: fetchResponse.error
 };
 result = { answer: answer, meta: meta };
} else {
 var todoItem;
 try {
 todoItem = JSON.parse(fetchResponse.body);
 meta.raw = todoItem;
 } catch (e) {
 answer = \"I fetched the item, but I couldn't understand its format. It doesn't seem to be valid JSON.\";
 meta.error = {
 message: \"JSON parse error\",
 rawBody: fetchResponse.body,
 parseError: e.toString()
 };
 result = { answer: answer, meta: meta };
 }

 if (todoItem) {
 // Step 2: Summarize the TODO item using an LLM
 meta.steps.push(\"Summarizing TODO item with LLM\");
 var prompt = \"Summarize the following TODO item in a concise sentence: \" + JSON.stringify(todoItem);
 var summarizeTask = executeTask(prompt, \"gpt-4o-mini\", \"openai\");

 if (!summarizeTask.success) {
 answer = \"While I managed to fetch your TODO details, the summary tool had a hiccup and couldn't provide me with a concise sentence right now. We seem to be experiencing some temporary issues.\";
 meta.error = {
 message: \"LLM task execution failed\",
 task_id: summarizeTask.task_id,
 error: summarizeTask.error
 };
 result = { answer: answer, meta: meta };
 } else {
 answer = summarizeTask.response;
 meta.llmResponse = summarizeTask.response;
 result = { answer: answer, meta: meta };
 }
 }
}

result;"
}
</code></pre>

</div>



A few important details here:

<ul>
<li>
It encodes error handling:

<ul>
<li>If HTTP fails → user-friendly <code>answer</code>, raw error in <code>meta.error</code>.</li>
<li>If JSON parse fails → similar pattern.</li>
<li>If the inner LLM call fails → fallback message.</li>
</ul>


</li>

<li>It delegates the actual summarization to another model via <code>executeTask</code>.</li>

<li>It returns a structured <code>result</code> with both <code>answer</code> and <code>meta</code>.</li>

</ul>

This is not the model “calling tools” directly. It’s the model writing a program that calls tools.
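
One practical consequence is that the generated program can be exercised without the network or a model. The sketch below condenses the script's control flow into a function that takes `httpFetch` and `executeTask` as parameters so both can be stubbed. The response shapes (`ok`, `status`, `body`, `success`, `response`, `error`) are taken from the generated script above; the function name, the shortened messages, and the stubs are assumptions for illustration.

```javascript
// Condensed version of the generated script's control flow, with the two
// runtime globals passed in as parameters so they can be stubbed.
function summarizeTodo(httpFetch, executeTask, url) {
  var meta = { steps: ["Fetching TODO item from " + url] };
  var res = httpFetch(url);
  if (!res.ok) {
    meta.error = { message: "HTTP fetch failed", status: res.status };
    return { answer: "I couldn't fetch that item right now.", meta: meta };
  }
  var todo;
  try {
    todo = JSON.parse(res.body);
  } catch (e) {
    meta.error = { message: "JSON parse error", parseError: String(e) };
    return { answer: "The response was not valid JSON.", meta: meta };
  }
  meta.steps.push("Summarizing TODO item with LLM");
  var prompt = "Summarize the following TODO item in a concise sentence: " + JSON.stringify(todo);
  var task = executeTask(prompt, "gpt-4o-mini", "openai");
  if (!task.success) {
    meta.error = { message: "LLM task execution failed", error: task.error };
    return { answer: "Fetched the item, but the summary step failed.", meta: meta };
  }
  return { answer: task.response, meta: meta };
}

// Stubs standing in for the runtime: one failing fetch, one healthy fetch.
var failingFetch = function () { return { ok: false, status: 500 }; };
var okFetch = function () {
  return { ok: true, status: 200, body: '{"id":1,"title":"delectus aut autem","completed":false}' };
};
var fakeLLM = function () { return { success: true, response: "stub summary" }; };

var todoUrl = "https://jsonplaceholder.typicode.com/todos/1";
console.log(summarizeTodo(failingFetch, fakeLLM, todoUrl));
console.log(summarizeTodo(okFetch, fakeLLM, todoUrl));
```

Every branch returns the same `{ answer, meta }` shape, so whatever consumes the result never has to distinguish success from failure structurally.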

<h3>3. <code>run_js</code>: execute the code in a sandbox</h3>

The next task is <code>run_js</code>, which is just a Contenox <code>hook</code> that calls the JS sandbox: 


<div class="highlight js-code-highlight">
<pre class="highlight json"><code>{
 "name": "js_sandbox",
 "tool_name": "execute_js",
 "args": {
 "code": "{{.generate_js.code}}"
 }
}
</code></pre>

</div>



Inside the trace you can see:

<ul>
<li>An <code>httpFetch</code> log for the JSONPlaceholder URL.</li>
<li>A response with <code>status: 200 OK</code> when things finally worked.</li>
<li>
An <code>executeTask</code> log with the summarization prompt:

<ul>
<li><code>Summarize the following TODO item in a concise sentence: {"userId":1,"id":1,"title":"delectus aut autem","completed":false}</code></li>
</ul>


</li>

</ul>

The sandbox result looked roughly like: 


<div class="highlight js-code-highlight">
<pre class="highlight json"><code>{
 "ok": true,
 "result": {
 "answer": "Task titled \"delectus aut autem\" is incomplete.",
 "meta": {
 "llmResponse": "Task titled \"delectus aut autem\" is incomplete.",
 "raw": {
 "userId": 1,
 "id": 1,
 "title": "delectus aut autem",
 "completed": false
 },
 "steps": [
 "Fetching TODO item from https://jsonplaceholder.typicode.com/todos/1",
 "Summarizing TODO item with LLM"
 ]
 }
 },
 "logs": [ ... ],
 "code": "var result; ..."
}
</code></pre>

</div>
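
To get a feel for the data flow without the real runtime, the envelope above can be imitated with Node's built-in `vm` module. This is only a stand-in sketch: Contenox uses Goja, not Node, and `executeJs`, the `console.log` capture, and the one-second timeout are all assumptions. Node's documentation also states that `vm` is not a security mechanism, so this is not a real sandbox.

```javascript
const vm = require("node:vm");

// Run a script with only the given globals visible and wrap the outcome in an
// envelope shaped like the trace above ({ ok, result, logs, code }).
// Note: node:vm is NOT a security boundary; this only mimics the data flow.
function executeJs(code, globals) {
  const logs = [];
  const context = Object.assign({}, globals, {
    // Hypothetical log capture; the real runtime records its own logs.
    console: { log: (...args) => logs.push(args.join(" ")) },
  });
  try {
    // runInNewContext returns the value of the last statement, so a script
    // ending in a bare `result;` hands that object back to the caller.
    const result = vm.runInNewContext(code, context, { timeout: 1000 });
    return { ok: true, result, logs, code };
  } catch (e) {
    return { ok: false, error: String(e), logs, code };
  }
}

const script =
  'console.log("fetching");' +
  'var r = httpFetch("https://example.invalid/todos/1");' +
  "var result = { answer: r.body }; result;";

const out = executeJs(script, {
  httpFetch: () => ({ ok: true, status: 200, body: "stubbed body" }),
});
console.log(out.ok, out.result.answer, out.logs);
```

A script that throws, or that calls a global that was not exposed, comes back as `{ ok: false, error: ... }` instead of crashing the host, which is the signal the evaluator step needs.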



<h3>4. <code>eval_js_result</code>: success or retry?</h3>

Now comes the evaluator:

<ul>
<li>It receives a description of the JS sandbox output.</li>
<li>
The system prompt is very strict:

<ul>
<li>If <code>ok</code> is true and there is a non-empty <code>result.answer</code> → respond with <code>success</code>.</li>
<li>Otherwise → respond with <code>retry</code>.</li>
</ul>


</li>

</ul>
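
The rule is simple enough to write down as a pure function. In the workflow the decision is made by an LLM following the strict prompt; the sketch below is a hypothetical deterministic stand-in that encodes the same two branches. It treats a whitespace-only answer as empty, which is an assumption.

```javascript
// Deterministic version of the evaluator's rule: "success" only when the
// sandbox reported ok and produced a non-empty result.answer.
// evaluateSandboxOutput is a hypothetical name, not a Contenox function.
function evaluateSandboxOutput(output) {
  const answer = output && output.result ? output.result.answer : undefined;
  const hasAnswer = typeof answer === "string" && answer.trim() !== "";
  return output && output.ok === true && hasAnswer ? "success" : "retry";
}

console.log(evaluateSandboxOutput({ ok: true, result: { answer: "Task is incomplete." } }));
console.log(evaluateSandboxOutput({ ok: true, result: { answer: "" } }));
console.log(evaluateSandboxOutput({ ok: false, error: "ReferenceError" }));
```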

On the successful run, it answered: 


<div class="highlight js-code-highlight">
<pre class="highlight plaintext"><code>success
</code></pre>

</div>



So the workflow does not go into <code>repair_js</code> or <code>run_js_retry</code>. Happy path.

<h3>5. <code>answer</code>: extract the final user message</h3>

The final task, <code>answer</code>, is intentionally boring:

<ul>
<li>System prompt: “You are a purely extractive post-processor. Do NOT invent content. Just surface the best existing <code>answer</code> field.”
</li>
<li>
It gets:

<ul>
<li>First run (<code>run_js</code> result).</li>
<li>Second run (<code>run_js_retry</code>), if any.</li>
</ul>


</li>

<li>

Selection rule:

<ul>
<li>Take the last non-empty <code>answer</code> you see.</li>
<li>Output it verbatim.</li>
</ul>


</li>

</ul>
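
The selection rule amounts to a scan from the newest run backwards. Here is a minimal sketch, assuming each run is a sandbox envelope with an optional `result.answer`. `pickAnswer` is a hypothetical name, and returning `null` when nothing qualifies is an assumption, since that case is not covered here.

```javascript
// Return the last non-empty answer across runs, verbatim, or null if none.
function pickAnswer(runs) {
  for (let i = runs.length - 1; i >= 0; i--) {
    const run = runs[i];
    const answer = run && run.result ? run.result.answer : undefined;
    if (typeof answer === "string" && answer.trim() !== "") return answer;
  }
  return null;
}

const firstRun = { ok: true, result: { answer: 'Task titled "delectus aut autem" is incomplete.' } };
console.log(pickAnswer([firstRun]));        // only run_js ran
console.log(pickAnswer([firstRun, null]));  // no usable retry result
```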

In our case it found: 


<div class="highlight js-code-highlight">
<pre class="highlight plaintext"><code>Task titled "delectus aut autem" is incomplete.
</code></pre>

</div>



And that’s exactly what Contenox replied in chat.




<h2>Why this is interesting (to me, at least)</h2>

What I originally set out to build:

<blockquote>
A runtime for deterministic, observable GenAI workflows. 
Tasks, transitions, hooks – all explicit and replayable.
</blockquote>

What I accidentally stumbled into:

<blockquote>
A multi-model, self-orchestrating agent pattern, 
where LLMs write code that uses tools, and the runtime executes and evaluates that code.
</blockquote>

The pattern looks like this:

<ol>
<li>
Planner LLM (<code>generate_js</code>)
<ul>
<li>Reads user intent + history.</li>
<li>Emits JavaScript that calls <code>httpFetch</code>, <code>executeTask</code>, <code>executeTaskChain</code>, hooks, etc.</li>
</ul>
</li>
<li>
Execution environment (<code>run_js</code> in Goja)
<ul>
<li>Deterministic execution of that JS.</li>
<li>Full logs of every HTTP call, every inner LLM call, every step.</li>
</ul>
</li>
<li>
Controller LLM (<code>eval_js_result</code>)
<ul>
<li>Looks at the sandbox result.</li>
<li>Decides: is this good enough? Retry? Repair?</li>
</ul>
</li>
<li>
Repair LLM (<code>repair_js</code>, if needed)
<ul>
<li>Gets the previous code + error output.</li>
<li>Writes a fixed version of the JS.</li>
</ul>
</li>
<li>
Answer LLM (<code>answer</code>)
<ul>
<li>Doesn’t “reason” at all.</li>
<li>Just extracts the final <code>answer</code> text safely.</li>
</ul>
</li>
</ol>

All of that is expressed as an explicit state machine in Contenox.

No hidden loops, no undocumented retries, no magic glue code inside some SDK. It’s all visible in the workflow graph and trace.
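
The five roles can be sketched as a bounded loop with the roles injected as plain functions. This is an illustration of the pattern, not Contenox code: `runAgent`, the callback names, and `maxRepairs` are hypothetical, and in the real runtime the loop is expressed as workflow transitions instead of a `for` statement.

```javascript
// Drive the planner / executor / controller / repair roles with injected
// functions. All callbacks and maxRepairs are hypothetical names.
function runAgent({ plan, execute, evaluate, repair }, input, maxRepairs = 1) {
  const trace = ["generate_js"];
  let code = plan(input);
  let output = execute(code);
  trace.push("run_js");
  let repairs = 0;
  for (;;) {
    const verdict = evaluate(output);
    trace.push("eval_js_result:" + verdict);
    // Stop on success, or when the repair budget is spent.
    if (verdict === "success" || repairs >= maxRepairs) break;
    code = repair(code, output);
    trace.push("repair_js");
    output = execute(code);
    trace.push("run_js_retry");
    repairs++;
  }
  trace.push("answer");
  return { output, trace };
}

// Stub roles: the first script fails, the repaired one succeeds.
const roles = {
  plan: () => "broken",
  execute: (code) =>
    code === "fixed"
      ? { ok: true, result: { answer: "done" } }
      : { ok: false, error: "boom" },
  evaluate: (out) => (out.ok && out.result && out.result.answer ? "success" : "retry"),
  repair: () => "fixed",
};

const { output, trace } = runAgent(roles, "fetch the TODO");
console.log(trace.join(" → "));
console.log(output.result.answer);
```

The explicit `maxRepairs` bound is what keeps the retry visible and finite; every step the loop takes is recorded in `trace`.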




To me, that’s the exciting part:

<blockquote>
You don’t have to choose between “boring deterministic workflows” and “fancy agents”. 
You can build the agent on top of deterministic workflows. 
And everything stays self-hosted, inspectable, and auditable, if you want.
</blockquote>

The article describes an experiment in which an LLM writes JavaScript that is executed inside Contenox, a self-hosted runtime that models AI behavior as explicit state machines. It starts from the idea that direct tool calls do not scale well and that a model should instead write code that uses tools. In the test, the generated script fetched a TODO item from JSONPlaceholder and delegated the summary to a second model, and the workflow evaluated the sandbox result before answering.

I Let an LLM Write JavaScript Inside My AI Runtime. Here’s What Happened

Caterpillar Inc. was always an unlikely winner in the artificial intelligence craze. It makes the bulk of its money selling equipment, like its yellow earth movers, that has made it a stalwart of American industry.

Caterpillar Inc. has emerged as an unlikely player in the artificial intelligence sector, primarily known for its manufacturing of heavy machinery such as earth movers. The company has historically focused on traditional industrial equipment, making it less aligned with the AI-driven technology trends that have captivated many other sectors. Despite the growing interest in AI, Caterpillar's core business remains rooted in physical machinery, which may limit its appeal in the rapidly evolving tech landscape.

Caterpillar’s Lone Bear Says Machinery Maker Is No AI Darling

Windows president Pavan Davuluri recently described the future of Windows as an agentic operating system, where AI bots and large language models handle the user's commands on files and computing tasks. Critics mostly greeted the idea with scorn, cursing, and frustration over the "bug-ridden slop pile" the OS currently is.... <a href="https://www.techspot.com/news/110306-microsoft-explains-how-windows-11-become-agentic-os.html">Read Entire Article</a>

Windows president Pavan Davuluri has outlined plans for Windows 11 to evolve into what he describes as an 'agentic operating system.' This transformation will involve AI bots and large language models handling the user's commands on files and computing tasks. The announcement has been met with scorn and frustration from critics, who describe the operating system in its current state as bug-ridden.

Microsoft explains how Windows 11 will become an agentic OS whether you like it or not

Although Black Friday is still two weeks away, you can find great Nintendo Switch and Switch 2 deals now. I've collected the best from Walmart, Best Buy, and more.

With Black Friday still two weeks away, early deals on Nintendo Switch and Switch 2 consoles are already available. Major retailers like Walmart and Best Buy are offering more than 20 deals, giving shoppers a chance to save on popular gaming products ahead of the holiday rush.

Best early Black Friday Nintendo Switch deals 2025: 20+ sales out early

Alex Heath / <A HREF="https://sources.news/">Sources</A>: 
<A HREF="https://substack.com/redirect/2/eyJlIjoiaHR0cHM6Ly9zb3VyY2VzLm5ld3MvcC9kZW1pcy1oYXNzaWJhcy1vbi1nZW1pbmktMy13b3JsZD91dG1fY2FtcGFpZ249ZW1haWwtcG9zdCZyPTFyODVmJnRva2VuPWV5SjFjMlZ5WDJsa0lqb3lPVFE1T0RreExDSndiM04wWDJsa0lqb3hOemt5TlRnNE9UZ3NJbWxoZENJNk1UYzJNelE1TmpVME5pd2laWGh3SWpveE56WTJNRGc0TlRRMkxDSnBjM01pT2lKd2RXSXRNelV5TlRjNE1DSXNJbk4xWWlJNkluQnZjM1F0Y21WaFkzUnBiMjRpZlEucDdlcWFuMFM3WDNXTXQ1OUY4Y2RYZG1tb1VBRGJNMlBGZDM1c3ZZWUc2YyIsInAiOjE3OTI1ODg5OCwicyI6MzUyNTc4MCwiZiI6ZmFsc2UsInUiOjI5NDk4OTEsImlhdCI6MTc2MzQ5NjU0NiwiZXhwIjoyMDc5MDcyNTQ2LCJpc3MiOiJwdWItMCIsInN1YiI6ImxpbmstcmVkaXJlY3QifQ.MstA6dhKs3CLxJgLSEpyVK_D4Oz9SmcjeXKNnqEDzIg?">Q&amp;A with Demis Hassabis on Gemini 3, spending most of his research time on world models, fitting the entire Google Search index into Gemini, AI bubble, and more</A>&nbsp; &mdash;&nbsp; Demis Hassabis was noticeably relaxed when he joined our virtual call from London.&nbsp; &mdash;&nbsp; It was the day before the release of Gemini 3 &hellip;

Demis Hassabis, co-founder of DeepMind, discussed the advancements of Gemini 3, Google's latest AI model, emphasizing its capabilities in world modeling and fitting the entire Google Search index into the system. He addressed concerns about the AI bubble and highlighted the model's potential to enhance user interactions and provide more accurate information. Hassabis's insights reflect a commitment to pushing the boundaries of AI technology and its integration into everyday applications.

Q&A with Demis Hassabis on Gemini 3, spending most of his research time on world models, fitting the entire Google Search index into Gemini, AI bubble, and more (Alex Heath/Sources)

arXiv:2511.10670v1 Announce Type: new 
Abstract: Code-switching (CS) speech translation (ST) refers to translating speech that alternates between two or more languages into a target language text, which poses significant challenges due to the complexity of semantic modeling and the scarcity of CS data. Previous studies tend to rely on the model itself to implicitly learn semantic modeling during training, and resort to inefficient and costly manual annotations for these two challenges. To mitigate these limitations, we propose enhancing Large Language Models (LLMs) with a Mixture of Experts (MoE) speech projector, where each expert specializes in the semantic subspace of a specific language, enabling fine-grained modeling of speech features. Additionally, we introduce a multi-stage training paradigm that utilizes readily available monolingual automatic speech recognition (ASR) and monolingual ST data, facilitating speech-text alignment and improving translation capabilities. During training, we leverage a combination of language-specific loss and intra-group load balancing loss to guide the MoE speech projector in efficiently allocating tokens to the appropriate experts, across expert groups and within each group, respectively. To bridge the data gap across different training stages and improve adaptation to the CS scenario, we further employ a transition loss, enabling smooth transitions of data between stages, to effectively address the scarcity of high-quality CS speech translation data. Extensive experiments on widely used datasets demonstrate the effectiveness and generality of our approach.

The article discusses advancements in code-switching speech translation, which involves translating speech that alternates between multiple languages. The authors highlight the challenges posed by semantic modeling complexities and the lack of code-switching data. To address these issues, they propose enhancing Large Language Models with a Mixture of Experts speech projector and introducing a multi-stage training paradigm that utilizes existing monolingual automatic speech recognition and speech translation data.

Towards Fine-Grained Code-Switch Speech Translation with Semantic Space Alignment

arXiv:2510.24021v2 Announce Type: replace 
Abstract: Knowledge distillation (KD) is a standard route to compress Large Language Models (LLMs) into compact students, yet most pipelines uniformly apply token-wise loss regardless of teacher confidence. This indiscriminate supervision amplifies noisy, high-entropy signals and is especially harmful under large teacher-student capacity gaps. We introduce SelecTKD, a plug-and-play Selective Token-Weighted distillation framework that shifts the focus from "how to measure divergence" to "where to apply learning". At each step, the student proposes tokens that are verified by the teacher through a robust propose-and-verify procedure with two variants: greedy Top-k and non-greedy Spec-k. Accepted tokens receive full loss, while rejected tokens are masked or down-weighted. This objective-agnostic design works with on- and off-policy data, induces an implicit curriculum quantified by Token Acceptance Rate (TAR), and stabilizes optimization. Across instruction following, mathematical reasoning, code generation, and a VLM setting, SelecTKD consistently improves strong baselines and achieves state-of-the-art results for small models without architectural changes or extra reference models.

SelecTKD: Selective Token-Weighted Knowledge Distillation for LLMs
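
The selective weighting described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes that "greedy Top-k" verification means the student's most likely token must fall inside the teacher's top-k set, that the per-token loss is a forward KL divergence, and that rejected tokens are scaled by a fixed `reject_weight`. The function names and parameters are hypothetical, and the non-greedy Spec-k variant is not covered.

```python
import math


def top_k_indices(probs, k):
    """Return the indices of the k largest entries of a probability vector."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def selective_kd_loss(teacher_probs, student_probs, k=1, reject_weight=0.0):
    """Hypothetical helper: token-weighted distillation loss and acceptance rate.

    teacher_probs / student_probs: one probability vector per token position.
    Assumption: a position is "accepted" when the student's argmax token is
    among the teacher's top-k tokens. Accepted positions keep full loss;
    rejected positions are masked (reject_weight=0) or down-weighted.
    """
    total, accepted = 0.0, 0
    for t, s in zip(teacher_probs, student_probs):
        proposal = max(range(len(s)), key=lambda i: s[i])  # student proposes
        ok = proposal in top_k_indices(t, k)               # teacher verifies
        accepted += ok
        weight = 1.0 if ok else reject_weight
        total += weight * kl_divergence(t, s)
    n = len(teacher_probs)
    # Mean weighted loss and Token Acceptance Rate (TAR) over all positions.
    return total / n, accepted / n
```

Calling `selective_kd_loss(teacher, student, k=1)` on two lists of per-position probability vectors returns the weighted loss together with the Token Acceptance Rate. Raising `k` loosens verification, so more positions contribute full loss.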

arXiv:2511.13368v1 Announce Type: new 
Abstract: Large language models (LLMs) perform strongly across tasks and languages, yet how improvements in one task or language affect other tasks and languages and their combinations remains poorly understood. We conduct a controlled PEFT/LoRA study across multiple open-weight LLM families and sizes, treating task and language as transfer axes while conditioning on model family and size; we fine-tune each model on a single task-language source and measure transfer as the percentage-point change versus its baseline score when evaluated on all other task-language target pairs. We decompose transfer into (i) Matched-Task (Cross-Language), (ii) Matched-Language (Cross-Task), and (iii) Cross-Task (Cross-Language) regimes. We uncover two consistent general patterns. First, a pronounced on-task vs. off-task asymmetry: Matched-Task (Cross-Language) transfer is reliably positive, whereas off-task transfer often incurs collateral degradation. Second, a stable donor-recipient structure across languages and tasks (hub donors vs. brittle recipients). We outline implications for risk-aware fine-tuning and model specialisation.

Donors and Recipients: On Asymmetric Transfer Across Tasks and Languages with Parameter-Efficient Fine-Tuning
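
The transfer measure in the abstract (percentage-point change versus the baseline, grouped into three regimes) could be computed roughly as follows. This is a sketch under assumptions: scores are percentages keyed by hypothetical `(task, language)` tuples, and averaging within each regime is an illustrative choice that the abstract does not specify.

```python
def regime(source, target):
    """Classify a source/target pair into one of the three transfer regimes."""
    same_task = source[0] == target[0]
    same_lang = source[1] == target[1]
    if same_task and same_lang:
        return None  # the fine-tuning source itself is not a transfer target
    if same_task:
        return "Matched-Task (Cross-Language)"
    if same_lang:
        return "Matched-Language (Cross-Task)"
    return "Cross-Task (Cross-Language)"


def transfer_by_regime(source, baseline, finetuned):
    """Mean percentage-point change per regime after fine-tuning on `source`.

    baseline / finetuned: dicts mapping (task, language) to a score in percent.
    Averaging within a regime is an assumption made for this illustration.
    """
    deltas = {}
    for target, base_score in baseline.items():
        name = regime(source, target)
        if name is None:
            continue
        deltas.setdefault(name, []).append(finetuned[target] - base_score)
    return {name: sum(vals) / len(vals) for name, vals in deltas.items()}
```

A positive value for a regime indicates that fine-tuning on the source pair helped those targets on average, and a negative value indicates collateral degradation.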

arXiv:2511.11878v1 Announce Type: new 
Abstract: While large language models (LLMs) show transformative potential in healthcare, their development remains focused on high-resource languages, creating a critical barrier for others as simple translation fails to capture unique clinical and cultural nuances, such as endemic diseases. To address this, we introduce MedPT, the first large-scale, real-world corpus for Brazilian Portuguese, comprising 384,095 authentic question-answer pairs from patient-doctor interactions. The dataset underwent a meticulous multi-stage curation protocol, using a hybrid quantitative-qualitative analysis to filter noise and contextually enrich thousands of ambiguous queries. We further augmented the corpus via LLM-driven annotation, classifying questions into seven semantic types to capture user intent. Our analysis reveals its thematic breadth (3,200 topics) and unique linguistic properties, like the natural asymmetry in patient-doctor communication. To validate its utility, we benchmark a medical specialty routing task: fine-tuning a 1.7B parameter model achieves an outstanding 94% F1-score on a 20-class setup. Furthermore, our qualitative error analysis shows misclassifications are not random but reflect genuine clinical ambiguities (e.g., between comorbid conditions), proving the dataset's deep semantic richness. We publicly release MedPT to foster the development of more equitable, accurate, and culturally-aware medical technologies for the Portuguese-speaking world.

How to Evaluate Speech Translation with Source-Aware Neural MT Metrics
