arXiv:2510.17793v2 Announce Type: replace-cross 
Abstract: Finetuning specialized generative evaluators has emerged as a popular paradigm to meet the increasing demand for scalable evaluation during both training and test-time. However, recent work has largely focused on applying new methodology, such as reinforcement learning (RL), to training evaluators, shying away from large-scale, data-driven development. In this work, we focus on data scaling, curating a set of 2.5M samples spanning five unique evaluation tasks (pairwise, step-level, reference-free and reference-based verification, and single rating) and multiple domains focused on reasoning evaluation. With our data, we train Foundational Automatic Reasoning Evaluators (FARE), a family of 8B and 20B (with 3.6B active) parameter evaluators, with a simple iterative rejection-sampling supervised finetuning (SFT) approach. FARE-8B challenges larger specialized RL-trained evaluators and FARE-20B sets the new standard for open-source evaluators, surpassing specialized 70B+ evaluators. Beyond static benchmarks, we evaluate FARE in real-world tasks: As inference-time rerankers, FARE-20B achieves near-oracle performance on MATH. As verifiers in RL training, FARE improves the downstream RL-trained model performance by up to 14.1% vs. string-matching verifiers. When initialized from FARE, a continually-finetuned FARE-Code outperforms gpt-oss-20B by 65% on evaluating test-case quality.

The paper discusses the development of Foundational Automatic Reasoning Evaluators (FARE), which are generative evaluators designed to enhance evaluation processes in reasoning-centric domains. By fine-tuning these evaluators with a dataset of 2.5 million samples across five evaluation tasks, the study aims to improve scalability and performance during training and testing. The FARE models, with 8B and 20B parameters, challenge existing evaluators and set new benchmarks for open-source evaluation.

Foundational Automatic Evaluators: Scaling Multi-Task Generative Evaluator Training for Reasoning-Centric Domains
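
The abstract describes the training recipe only as "iterative rejection-sampling SFT". As a rough illustration of a single rejection-sampling round (not the authors' pipeline), the sketch below draws several judgments per labeled sample and keeps the first one whose verdict matches the gold label. The names `rejection_sample` and `toy_evaluator`, the `(explanation, verdict)` output shape, and the `Verdict:` formatting are all hypothetical; a real setup would sample from the current evaluator checkpoint and finetune on the kept data before the next round.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a sample is (prompt, gold_verdict); an evaluator maps a
# prompt to an (explanation, verdict) pair, using the supplied RNG for sampling.
Sample = Tuple[str, str]
Evaluator = Callable[[str, random.Random], Tuple[str, str]]


def rejection_sample(
    samples: List[Sample],
    evaluator: Evaluator,
    num_draws: int = 4,
    seed: int = 0,
) -> List[Dict[str, str]]:
    """Keep, per sample, the first drawn judgment whose verdict equals the gold label."""
    rng = random.Random(seed)
    kept: List[Dict[str, str]] = []
    for prompt, gold in samples:
        for _ in range(num_draws):
            explanation, verdict = evaluator(prompt, rng)
            if verdict == gold:
                # Accepted judgments become supervised finetuning targets.
                kept.append(
                    {"prompt": prompt, "response": f"{explanation}\nVerdict: {verdict}"}
                )
                break  # one accepted judgment per sample is enough here
    return kept


def toy_evaluator(prompt: str, rng: random.Random) -> Tuple[str, str]:
    """Stand-in for a generative evaluator: picks a pairwise verdict at random."""
    verdict = rng.choice(["A", "B"])
    return f"Response {verdict} is better supported.", verdict


if __name__ == "__main__":
    data = [("Compare A and B for question 1", "A"), ("Compare A and B for question 2", "B")]
    for row in rejection_sample(data, toy_evaluator):
        print(row)
```

Samples for which no draw matches the label are simply dropped in this sketch; how the paper handles them is not stated in the abstract.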

Medtech founder and educator Rush Bartlett talks about medtech innovation on both sides of the pond.
Read more: Ireland is top for medtech innovation, says Stanford expert (https://www.siliconrepublic.com/innovation/stanford-biodesign-medtech-innovation-bioinnovate-ireland-galway-startups-healthcare)

Rush Bartlett, a medtech founder and educator, highlights Ireland's leading position in medtech innovation, emphasizing the advancements occurring in Galway. His insights reflect a growing recognition of Ireland as a hub for healthcare technology development.

Ireland is top for medtech innovation, says Stanford expert

It’s not the time to bet against the biggest US technology names, according to short-seller Carson Block, even as warnings rise about a potential bubble in artificial intelligence.

Carson Block, the founder of Muddy Waters, has advised against shorting major US technology companies, despite increasing concerns about a potential bubble in the artificial intelligence sector. He believes that the current market conditions do not favor betting against these tech giants.

Muddy Waters’ Carson Block Says It’s Not the Time to Short Big Tech

A new study by the University of Cambridge found many authors’ work has already been used, without their permission, to train large language models. More than half of published novelists in the UK believe artificial intelligence could eventually replace their work entirely, according to a new report from the University of Cambridge. The study (https://www.mctd.ac.uk/wp-content/uploads/2025/11/MCTD-AIAndTheNovel-PolicyBrief-Accessible.html), conducted for the university’s Minderoo Centre for Technology and Democracy, suggests widespread unease about the speed and scale of AI’s advance into the literary world. Continue reading: https://www.theguardian.com/books/2025/nov/20/more-than-half-of-uk-novelists-believe-ai-will-replace-their-work

A study by the University of Cambridge reveals that over half of published novelists in the UK fear that artificial intelligence could completely replace their work. The research, conducted for the Minderoo Centre for Technology and Democracy, highlights concerns regarding the unauthorized use of authors' work to train large language models and the rapid advancement of AI in the literary field.

More than half of UK novelists believe AI will replace their work

President Donald Trump is mulling an executive order that would block state laws requiring AI companies to publish transparency reports and disclose how they train models.
Read more: https://petapixel.com/2025/11/20/trumps-draft-executive-order-targets-states-enacting-ai-transparency-laws/

President Donald Trump is considering an executive order that would block state laws requiring AI companies to publish transparency reports and disclose how they train their models.

Trump’s Draft Executive Order Targets States Enacting AI Transparency Laws

xAI and HUMAIN have partnered to build large-scale Nvidia-powered AI compute infrastructure and deploy Grok across new data centres.
Source: Analytics India Magazine (https://analyticsindiamag.com/ai-news-updates/musks-xai-teams-with-humain-to-develop-saudi-arabias-ai-supercomputing-network/)

Elon Musk's xAI has partnered with the Saudi Arabian company HUMAIN to develop an AI supercomputing network in Saudi Arabia. This collaboration aims to enhance the country's technological capabilities and is part of a broader strategy to position Saudi Arabia as a leader in artificial intelligence. The initiative will leverage advanced computing resources, including Nvidia technology.

Musk’s xAI Teams with HUMAIN to Develop Saudi Arabia’s AI Supercomputing Network

NcodiN, a French deeptech startup, has secured a €16 million seed investment led by MIG Capital AG through its MIGFonds 17 and 18. The round also includes participation from Maverick Silicon, PhotonVent...

NcodiN, a French deeptech startup, has secured a €16 million seed investment led by MIG Capital AG through its MIGFonds 17 and 18. Other participants in the round include Maverick Silicon and PhotonVent.

NcodiN secures €16M seed investment led by MIG Capital

arXiv:2511.12596v1 Announce Type: cross 
Abstract: Large Language Models (LLMs) often suffer from mode collapse, repeatedly generating the same few completions even when many valid answers exist, limiting their diversity across a wide range of tasks. We introduce Group-Aware Policy Optimization (GAPO), a simple extension of the recent and popular Group Relative Policy Optimization (GRPO) that computes rewards over the group as a whole. GAPO enables learning from the group-level properties such as diversity and coverage. We demonstrate GAPO using a frequency-aware reward function that encourages uniform sampling over valid LLM completions, and show that GAPO-trained models produce valid and more diverse model responses. Beyond this setup, GAPO generalizes to open-ended prompts and improves response diversity without compromising accuracy on standard LLM benchmarks (GSM8K, MATH, HumanEval, MMLU-Pro). Our code will be made publicly available.

Large Language Models (LLMs) often experience mode collapse, generating limited responses despite the availability of diverse answers. To address this issue, researchers have introduced Group-Aware Policy Optimization (GAPO), an extension of Group Relative Policy Optimization (GRPO). GAPO focuses on group-level properties such as diversity and coverage, utilizing a frequency-aware reward function to promote uniform sampling of valid completions. The results indicate that models trained with GAPO yield more varied and valid responses while maintaining accuracy across standard benchmarks like GS…

Group-Aware Reinforcement Learning for Output Diversity in Large Language Models
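
To make the idea of a reward "computed over the group as a whole" concrete, here is a minimal sketch, assuming a toy frequency-aware reward rather than the paper's actual formula. Each valid completion is rewarded by how rare it is among the valid completions in its group, invalid completions receive a fixed penalty, and the rewards are then standardized within the group in the GRPO style. The function names, the `-1.0` penalty, and the `1 - share` form are illustrative assumptions.

```python
from collections import Counter
from statistics import mean, pstdev
from typing import List, Set


def frequency_aware_rewards(completions: List[str], valid: Set[str]) -> List[float]:
    """Reward each completion using counts taken over the whole sampled group."""
    counts = Counter(c for c in completions if c in valid)
    n_valid = sum(counts.values())
    rewards = []
    for c in completions:
        if c not in valid:
            rewards.append(-1.0)  # assumed penalty for invalid outputs
        else:
            # Share of the valid completions taken by c: rarer answers score higher,
            # and a perfectly uniform group gives every valid answer the same reward.
            rewards.append(1.0 - counts[c] / n_valid)
    return rewards


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Standardize rewards within the group (mean 0, roughly unit scale)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    group = ["red", "red", "red", "blue"]
    rewards = frequency_aware_rewards(group, valid={"red", "blue", "green"})
    print(rewards)
    print(group_relative_advantages(rewards))
```

In this toy group the over-sampled answer ends up with a negative advantage and the rare valid answer with a positive one, which is the direction of pressure a diversity-seeking reward needs; the reward of a single completion cannot be computed without looking at the rest of the group.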

arXiv:2509.22855v3 Announce Type: replace 
Abstract: Online learning to rank (OLTR) plays a critical role in information retrieval and machine learning systems, with a wide range of applications in search engines and content recommenders. However, despite their extensive adoption, the susceptibility of OLTR algorithms to coordinated adversarial attacks remains poorly understood. In this work, we present a novel framework for attacking some of the widely used OLTR algorithms. Our framework is designed to promote a set of target items so that they appear in the list of top-K recommendations for T - o(T) rounds, while simultaneously inducing linear regret in the learning algorithm. We propose two novel attack strategies: CascadeOFA for CascadeUCB1 and PBMOFA for PBM-UCB. We provide theoretical guarantees showing that both strategies require only O(log T) manipulations to succeed. Additionally, we supplement our theoretical analysis with empirical results on real-world data.

Observation-Free Attacks on Online Learning to Rank

arXiv:2511.15406v1 Announce Type: cross 
Abstract: Reliable semantic segmentation is essential for clinical decision making, yet deep models rarely provide explicit statistical guarantees on their errors. We introduce a simple post-hoc framework that constructs confidence masks with distribution-free, image-level control of false-positive predictions. Given any pretrained segmentation model, we define a nested family of shrunken masks obtained either by increasing the score threshold or by applying morphological erosion. A labeled calibration set is used to select a single shrink parameter via conformal prediction, ensuring that, for new images that are exchangeable with the calibration data, the proportion of false positives retained in the confidence mask stays below a user-specified tolerance with high probability. The method is model-agnostic, requires no retraining, and provides finite-sample guarantees regardless of the underlying predictor. Experiments on a polyp-segmentation benchmark demonstrate target-level empirical validity. Our framework enables practical, risk-aware segmentation in settings where over-segmentation can have clinical consequences. Code at https://github.com/deel-ai-papers/conseco.

A new post-hoc framework controls false positives in image segmentation, with the aim of making semantic segmentation more reliable for clinical decision-making. The model-agnostic approach uses conformal prediction to build confidence masks that keep the proportion of false positives below a user-defined tolerance, without retraining. The guarantee holds with high probability for new images that are exchangeable with the calibration data.

Controlling False Positives in Image Segmentation via Conformal Prediction
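
The calibration step can be sketched as split conformal prediction over a score threshold. This is a simplified reading of the abstract, not the code from the linked repository: for each calibration image we find the smallest threshold on a grid at and above which the false-positive proportion stays within the tolerance, then take a conformal quantile of those per-image thresholds. The function names, the threshold grid, and the flat-list image representation are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Image = Tuple[Sequence[float], Sequence[int]]  # (pixel scores, 0/1 ground truth)


def fp_proportion(scores: Sequence[float], truth: Sequence[int], thr: float) -> float:
    """Fraction of pixels kept at threshold thr that are not in the ground truth."""
    kept = [t for s, t in zip(scores, truth) if s >= thr]
    if not kept:
        return 0.0  # an empty mask contains no false positives
    return sum(1 for t in kept if t == 0) / len(kept)


def minimal_safe_threshold(
    scores: Sequence[float], truth: Sequence[int], grid: Sequence[float], tol: float
) -> float:
    """Smallest grid threshold such that it and every larger one respect tol."""
    safe = math.inf  # above the grid we fall back to the empty mask
    for thr in sorted(grid, reverse=True):
        if fp_proportion(scores, truth, thr) > tol:
            break
        safe = thr
    return safe


def calibrate_threshold(
    calibration: List[Image], grid: Sequence[float], tol: float, alpha: float
) -> float:
    """Conformal quantile of the per-image safe thresholds at miscoverage alpha."""
    n = len(calibration)
    per_image = sorted(minimal_safe_threshold(s, t, grid, tol) for s, t in calibration)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration images for this alpha
    return per_image[rank - 1]


if __name__ == "__main__":
    grid = [0.1, 0.3, 0.5, 0.7, 0.9]
    calibration = [
        ([0.9, 0.8, 0.2], [1, 1, 0]),
        ([0.6, 0.4], [0, 1]),
        ([0.95], [1]),
    ]
    print(calibrate_threshold(calibration, grid, tol=0.0, alpha=0.25))
```

Under exchangeability, a new image's own safe threshold falls at or below the calibrated one with probability at least 1 - alpha, so thresholding its score map at the calibrated value keeps the false-positive proportion within the tolerance with that probability. The abstract also mentions morphological erosion as an alternative shrink parameter, which is not covered here.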

arXiv:2409.13566v3 Announce Type: replace 
Abstract: The application of TensorFlow pre-trained models in deep learning is explored, with an emphasis on practical guidance for tasks such as image classification and object detection. The study covers modern architectures, including ResNet, MobileNet, and EfficientNet, and demonstrates the effectiveness of transfer learning through real-world examples and experiments. A comparison of linear probing and model fine-tuning is presented, supplemented by visualizations using techniques like PCA, t-SNE, and UMAP, allowing for an intuitive understanding of the impact of these approaches. The work provides complete example code and step-by-step instructions, offering valuable insights for both beginners and advanced users. By integrating theoretical concepts with hands-on practice, the paper equips readers with the tools necessary to address deep learning challenges efficiently.

