Navigation with VLM framework: Towards Going to Any Language

arXiv — cs.CL · Wednesday, October 29, 2025 at 4:00:00 AM
Recent advances in Vision Language Models (VLMs) are paving the way for more efficient navigation in open scenes, addressing long-standing challenges in the field. Because these models can reason jointly over language and visual input, they are a promising tool for pursuing fully open language goals, that is, navigation targets expressed in unrestricted natural language. This development is significant because it could lead to more accessible and versatile navigation systems, improving user experiences across a range of applications.
— via World Pulse Now AI Editorial System
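
The summary above stays at a high level, so here is a minimal sketch of how a VLM-driven navigation loop of this kind is often structured. Everything in it is an assumed interface for illustration: `query_vlm`, the `env` object, and the discrete action vocabulary are hypothetical stand-ins, not the paper's actual framework.

```python
# Minimal sketch of a VLM-driven navigation loop (hypothetical interface,
# not the paper's implementation). The agent repeatedly shows the current
# camera frame plus the free-form language goal to a VLM and executes the
# discrete action the model selects.

ACTIONS = ["move_forward", "turn_left", "turn_right", "stop"]  # assumed vocabulary

def query_vlm(image_bytes: bytes, prompt: str) -> str:
    """Placeholder for a concrete VLM backend (API call or local model).
    Returns the model's raw text response."""
    raise NotImplementedError("Plug in a real VLM here.")

def choose_action(image_bytes: bytes, goal: str) -> str:
    prompt = (
        f"You are a navigation agent. Goal: {goal}\n"
        f"Pick exactly one action from {ACTIONS} based on the image."
    )
    response = query_vlm(image_bytes, prompt).strip().lower()
    # Fall back to a safe default if the model answers off-vocabulary.
    return response if response in ACTIONS else "stop"

def navigate(env, goal: str, max_steps: int = 100) -> None:
    """env is any object exposing observe() -> bytes and step(action)."""
    for _ in range(max_steps):
        action = choose_action(env.observe(), goal)
        if action == "stop":
            break
        env.step(action)
```

The key design point this loop illustrates is that the language goal is never parsed into a fixed category: the VLM grounds arbitrary phrasing directly against the visual observation at every step.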


Recommended Readings
Critical or Compliant? The Double-Edged Sword of Reasoning in Chain-of-Thought Explanations
Neutral · Artificial Intelligence
The article examines the double-edged role of Chain-of-Thought (CoT) explanations: they can enhance transparency, but they can also foster confirmation bias. It reports that users often equate trust with agreement on outcomes, even when the underlying reasoning is flawed, and that a confident delivery tone can suppress error detection. This underscores the complexity of CoT explanations in vision language models (VLMs) and their impact on user trust and error recognition.
FinCriticalED: A Visual Benchmark for Financial Fact-Level OCR Evaluation
Positive · Artificial Intelligence
FinCriticalED (Financial Critical Error Detection) is introduced as a visual benchmark for evaluating OCR and vision language models on financial documents at the fact level. It addresses the challenges posed by the visually dense layouts of financial documents, where even minor OCR errors can lead to significant misinterpretations. The benchmark provides 500 image-HTML pairs with expert-annotated financial facts, marking a shift from traditional metrics to a focus on factual correctness.
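
FinCriticalED's exact scoring protocol is not given in this summary, but fact-level evaluation generally reduces to comparing extracted facts against expert annotations rather than measuring raw text overlap. The sketch below assumes facts are simple (field, value) pairs and that a fact counts as correct only on an exact match; both assumptions are illustrative, not the benchmark's definition.

```python
# Hedged sketch of fact-level OCR evaluation in the spirit of FinCriticalED.
# Assumes facts are (field, value) string pairs and uses exact matching;
# the benchmark's actual protocol may differ.

def fact_level_scores(predicted: dict[str, str],
                      gold: dict[str, str]) -> dict[str, float]:
    """Precision/recall/F1 over extracted financial facts."""
    correct = sum(1 for field, value in gold.items()
                  if predicted.get(field) == value)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

# Example: a single-digit OCR slip on one figure counts as a missed fact,
# even though a character-level metric would score it as nearly perfect.
gold = {"revenue": "1,204.5", "net_income": "87.3"}
pred = {"revenue": "1,204.5", "net_income": "81.3"}
print(fact_level_scores(pred, gold))  # {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

The worked example makes the motivation concrete: "81.3" versus "87.3" is one character of OCR noise but a materially wrong financial fact, which is exactly the kind of error fact-level scoring is designed to surface.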