Rectify Evaluation Preference: Improving LLMs' Critique on Math Reasoning via Perplexity-aware Reinforcement Learning
- The paper identifies that LLMs exhibit an imbalanced evaluation preference when critiquing math reasoning and conducts a statistical preference analysis; motivated by this analysis, a perplexity-aware reinforcement learning method is proposed.
- The preference is that "LLMs tend to judge solutions with lower perplexity as correct", dubbed the imbalanced evaluation preference. To rectify it, perplexity serves as the baton in Group Relative Policy Optimization (GRPO), steering the LLM to explore trajectories that judge lower-perplexity solutions as wrong and higher-perplexity solutions as correct (see the sketch below). Extensive results on the authors' OPS benchmark and existing critic benchmarks demonstrate the validity of the method.
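The summary does not spell out how perplexity steers the GRPO reward, so the following is only a minimal sketch of one plausible reading: a scalar bonus is added to the correctness reward whenever a sampled critique goes against the perplexity preference (judging a low-perplexity solution as wrong, or a high-perplexity solution as correct) and turns out to be right, before the usual GRPO group normalization. All names here (`Rollout`, `perplexity_aware_rewards`, `ppl_threshold`, `bonus`) are hypothetical and not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import List

# Hypothetical record of one sampled critique of a candidate solution.
@dataclass
class Rollout:
    solution_ppl: float      # perplexity of the solution being judged
    predicted_correct: bool  # critique's verdict on the solution
    verdict_is_right: bool   # whether the verdict matches the gold label

def perplexity_aware_rewards(group: List[Rollout],
                             ppl_threshold: float = 1.5,
                             bonus: float = 0.5) -> List[float]:
    """Shape the base correctness reward so that verdicts which go against
    the perplexity preference are rewarded more when they are right."""
    rewards = []
    for r in group:
        reward = 1.0 if r.verdict_is_right else 0.0
        counter_preference = (
            (r.solution_ppl < ppl_threshold and not r.predicted_correct) or
            (r.solution_ppl >= ppl_threshold and r.predicted_correct)
        )
        if counter_preference and r.verdict_is_right:
            reward += bonus  # encourage exploring trajectories that defy the bias
        rewards.append(reward)
    return rewards

def group_relative_advantages(rewards: List[float]) -> List[float]:
    """GRPO-style advantage: normalize rewards within the sampled group."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((x - mean) ** 2 for x in rewards) / len(rewards)) + 1e-8
    return [(x - mean) / std for x in rewards]

if __name__ == "__main__":
    group = [
        Rollout(solution_ppl=1.2, predicted_correct=False, verdict_is_right=True),
        Rollout(solution_ppl=1.2, predicted_correct=True,  verdict_is_right=False),
        Rollout(solution_ppl=2.3, predicted_correct=True,  verdict_is_right=True),
        Rollout(solution_ppl=2.3, predicted_correct=False, verdict_is_right=False),
    ]
    advantages = group_relative_advantages(perplexity_aware_rewards(group))
    print([round(a, 3) for a in advantages])
```

In this reading, the counter-preference bonus raises the group-relative advantage of correct verdicts that the model's bias would otherwise avoid; the threshold and bonus values are placeholders, not values reported by the paper.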