arXiv:2509.15695v2 Announce Type: replace-cross 
Abstract: Large Vision-Language Models (LVLMs) excel at captioning, visual question answering, and robotics by combining vision and language, yet they often miss obvious objects or hallucinate nonexistent ones in atypical scenes. We examine these failures through the lens of uncertainty, focusing on contextual incongruity, where objects appear unexpectedly or fail to appear in expected contexts, and show that such cases increase recognition difficulty for state-of-the-art LVLMs. To study this regime, we introduce the Object Recognition in Incongruous Context (ORIC) framework, which constructs incongruous object-context pairs through two complementary strategies: (1) LLM-guided sampling to identify hard-to-recognize objects present in the image and (2) CLIP-guided sampling to mine plausible but absent ones. Applied to MSCOCO, ORIC produces ORIC-Bench and ORIC-style training data. Evaluating 18 LVLMs and 2 open-vocabulary detectors reveals substantial performance drops and bias patterns under incongruous contexts. Fine-tuning Qwen3-VL-8B-Instruct with Visual Reinforcement Fine-Tuning on 600 ORIC-style samples improves results on ORIC-Bench, AMBER, and HallusionBench. Overall, we show that contextual incongruity is a key source of uncertainty and provide tools for more reliable LVLMs. The code is available at https://github.com/ZhaoyangLi-1/ORIC.

ORIC: Benchmarking Object Recognition under Contextual Incongruity in Large Vision-Language Models
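
The CLIP-guided step in the ORIC abstract above (mining objects that are plausible for a scene but absent from it) can be pictured with a small sketch. This is only an illustration under stated assumptions: the function name `mine_absent_objects`, the labels, and the similarity scores are hypothetical, the scores stand in for image-text similarities that a CLIP-style model would produce, and the paper's actual sampling procedure may differ.

```python
def mine_absent_objects(similarity, present, k=2):
    """Return the k labels that score highest against the image but are not in it.

    similarity: dict mapping object label -> image-text similarity score
    present:    set of labels annotated as actually appearing in the image
    """
    # Keep only labels that are NOT in the image's ground-truth annotations.
    absent = {label: score for label, score in similarity.items()
              if label not in present}
    # The highest-scoring absent labels are the most contextually plausible ones.
    return sorted(absent, key=absent.get, reverse=True)[:k]


# Hypothetical similarity scores for a kitchen photo (not taken from the paper).
scores = {
    "oven": 0.31,
    "microwave": 0.30,
    "refrigerator": 0.29,
    "sink": 0.27,
    "surfboard": 0.05,
}
present = {"oven", "sink"}

candidates = mine_absent_objects(scores, present)
```

Asking a model "Is there a microwave in this image?" for such candidates probes whether it answers from the scene context rather than from what is visible.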

Allie Garfinkle / Fortune: Doppel, which makes an AI social engineering detection service, raised a $70M Series C led by Bessemer at a $600M+ valuation, up from $205M in May (https://fortune.com/2025/11/19/exclusive-doppel-raises-70-million-series-c-at-more-than-600-million-valuation-to-fight-ai-powered-social-engineering-attacks/). I tried to fool my brother, sort of. Next to him and his Pekingese on the couch …

Doppel, a company specializing in AI social engineering detection services, has successfully raised $70 million in a Series C funding round led by Bessemer. This funding round has increased the company's valuation to over $600 million, a significant rise from $205 million in May.

Doppel, which makes an AI social engineering detection service, raised a $70M Series C led by Bessemer at a $600M+ valuation, up from $205M in May (Allie Garfinkle/Fortune)

The service is customized for teachers' needs and includes added security and privacy, a collaborative workspace, and more.

OpenAI expands free educational offerings - here's what ChatGPT for Teachers can do

GPT-5.1-Codex-Max is ready to take on your next massive coding job. Here's what's new.

OpenAI's Codex Max solves one of my biggest AI coding annoyances - and adds dramatically faster performance

The agent offers one-click buying for all your holiday needs and will be free for all US-based users.

Perplexity's AI shopping tool is free for all now, just in time for Black Friday - how to use it

<a href="https://fetch.ai/">Fetch AI</a>, a startup founded and led by DeepMind founding investor Humayun Sheikh, <a href="https://www.businesswire.com/news/home/20251119088395/en/Fetch-Combines-Personalized-AI-with-Multi-Agent-Collaboration-to-Handle-Complex-Consumer-Tasks-Launches-Claim-Your-Agent-to-Fight-Brand-Knock-Offs">today announced the release</a> of three interconnected products designed to provide the trust, coordination, and interoperability needed for large-scale AI agent ecosystems. The launch includes <a href="https://asi1.ai/">ASI:One</a>, a personal-AI orchestration platform; <a href="https://business.fetch.ai/">Fetch Business</a>, a verification and discovery portal for brand agents; and <a href="https://agentverse.ai/?sort=relevancy&amp;page=1&amp;recommended=true">Agentverse</a>, an open directory hosting more than two million agents. Together, the products position Fetch as an infrastructure provider for what it calls the “Agentic Web,” a layer where consumer AIs and brand AIs collaborate to complete tasks instead of merely suggesting them.

The company says the tools address a central limitation in current consumer AI: models can provide recommendations but cannot reliably execute multi-step actions that require coordination across businesses. Fetch’s approach centers on enabling agents from different organizations to interoperate securely, using verified identities and shared context to complete end-to-end workflows.

“We’re creating the same foundation for agents that Google created for websites,” said Humayun Sheikh, Founder and CEO of Fetch AI, and an early investor in DeepMind, in a press release provided to VentureBeat.
“Instead of just finding information, your personal AI coordinates with verified brand agents to get things done.”

<h2>Background: Fetch’s Founding and DeepMind Connection</h2>

Fetch AI was founded in 2017 by Humayun Sheikh, an entrepreneur whose early investment in DeepMind helped support the company’s commercial development before its acquisition by Google. “I was one of the first five people at DeepMind and its first investor. My check was the first one in,” Sheikh said, reflecting on the period when advanced machine learning research was still largely inaccessible outside major technology companies.

His early experience helped shape Fetch’s direction. “Even in 2013, it was clear to me that agentic systems were going to be the ones that worked. That’s where I focused: on the agentic web,” Sheikh noted. Fetch built on this thesis by developing infrastructure for autonomous software agents, focusing on verifiable identity, secure data exchange, and multi-agent coordination. Over the past several years, the company has expanded to a 70-person team across Cambridge and Menlo Park, raised approximately $60 million, and accumulated more than one million users interacting with its model, data that informed the design of the newly launched products.

Sheikh added that he initially bootstrapped the company with proceeds from the DeepMind exit, noting in the interview that while the sale to Google was “a good exit,” he believed the team could have held out for a higher valuation. The early self-funding period allowed Fetch to begin work in 2015, well before transformer architectures went mainstream, on the hypothesis that agentic infrastructure would become foundational to applied AI.

<h2>ASI:One - A Platform for Multi-Agent Orchestration</h2>

At the core of the launch is ASI:One, a language model interface designed specifically for coordinating multiple agents rather than addressing isolated queries.
Fetch describes it as an “intelligence layer” that handles context sharing, task routing, and preference modeling.

The system stores user-level signals such as favored airlines, dietary constraints, budget ranges, loyalty program identifiers, and calendar availability. When a user requests a complex task, such as planning a trip with flights, hotels, and restaurant reservations, ASI:One retrieves those preferences and delegates work to the appropriate verified agents. The agents then return actionable outputs, including inventory and booking options, rather than generic recommendations.

In practice, ASI:One functions as a workflow generator across organizational boundaries. Unlike conventional LLM applications, which often rely on APIs or RAG techniques to surface information, ASI:One is built to coordinate autonomous agents that can complete transactions. Fetch notes that personalization improves over time as the model accumulates structured preference data.

Sheikh emphasized the distinction between orchestrated execution and traditional AI output. “This isn’t searching for options separately and hoping they work together,” he said. “It’s orchestration.” He added that Fetch’s architecture is intentionally modular: “Our architecture is a mix of agentic and expert models. One large model isn’t enough; you need specialists. That’s why we built ASI1, tuned specifically for agentic systems.”

The interview also revealed new details about ASI:One’s personalization systems: the platform uses multiple user-owned knowledge graphs to store preferences, travel history, social connections, and contextual constraints. These knowledge graphs are siloed per user and not co-mingled with any Fetch-operated data. Sheikh described this as a “deterministic backbone” that gives the personal AI a stable memory layer beyond the probabilistic output of a single large model.

ASI:One launches in Beta today, with a broader release planned for early 2026.
Fetch also offers ASI:One Mobile, released earlier this year, giving users access to the same agent-orchestration capabilities on iOS and Android. The mobile app connects directly to Agentverse and the user’s knowledge graphs, enabling on-the-go task execution and real-time interaction with registered agents.

<h2>Fetch Business - Verified Identity and Brand Control</h2>

To enable reliable coordination between consumers and companies, Fetch is introducing a verification and discovery portal called Fetch Business. The platform allows organizations to verify their identity and claim an official Brand Agent handle (for example, @Hilton or @Nike) regardless of which tools they use to build the underlying agent.

Fetch positions the product as an analogue to ICANN domain registration and SSL certificate systems for websites. Verified status is intended to protect consumers from interacting with counterfeit or untrusted agents, a problem the company describes as a major barrier to widespread agent adoption.

The system includes low-code tools for small businesses to create agents in a few steps and connect real-time APIs such as inventory, booking systems, or CRM platforms. “With Fetch, you can create an agent in one minute. It gets a handle, like a Twitter username, and you can personalize it completely, even give it your social media permissions to post on your behalf,” Sheikh said. Once a brand claims its namespace, its agent becomes discoverable to consumer AIs and other agents inside Agentverse.

The company has pre-reserved thousands of brand namespaces in anticipation of demand.
Verification status persists across any platform that integrates with Agentverse, creating a portable identity layer for business agents.

The interview highlighted that Fetch Business inherits web-trust primitives directly: domain owners verify their identity by inserting a short code snippet into their existing website backend, allowing the system to pass a cryptographic challenge and grant the agent an authenticity badge similar to a “blue check” for agent identities. Sheikh framed this as “reusing the trust layer the web already spent decades building.”

Companies can begin claiming agents now at <a href="https://business.fetch.ai/">business.fetch.ai</a>.

<h2>Agentverse - An Open Directory of More Than Two Million Agents</h2>

The final component of the release is <a href="https://agentverse.ai/">Agentverse</a>, an open directory and cloud platform that hosts agents and enables cross-ecosystem discoverability. Fetch states that millions of agents have already registered, spanning travel, retail, entertainment, food service, and enterprise categories.

Agentverse provides metadata, capability descriptions, and routing logic that ASI:One uses to identify appropriate agents for specific tasks. It also supports secure communication and data exchange between agents. The company notes that the directory is platform-agnostic: agents built with any framework can join and interoperate.

According to Sheikh, the lack of a discovery layer is one reason most AI agents see little or no usage. “Ninety percent of AI agents never get used because there’s no discovery layer,” he said. He framed the role of Agentverse in more technical terms: “Right now, if you build an agent, there’s no universal way for others to discover it. That’s what AgentVerse solves: it’s like DNS for agents.” He also described the system as an essential component of the emerging agent economy: “Fetch is building the Google of agents.
Just like websites needed search, agents need discovery, trust, and interaction. Fetch provides all of that.”

The interview further underscored that Agentverse is cloud-agnostic by design. Sheikh contrasted this with competing agent ecosystems tied to specific cloud providers, arguing that a universal registry is only viable if independent of proprietary cloud environments. He said the open architecture enables an LLM to query any agent “within one minute of deployment,” turning agent publication into a near-instantaneous process similar to registering a domain.

Agentverse also integrates payment pathways, enabling agents to execute purchases using partners such as Visa, Skyfire, and supported stablecoins. Consumers can configure spending limits or require explicit approval for transactions.

<h2>Industry Context and Implications</h2>

Fetch’s launch comes at a time when consumer AI platforms are exploring the shift from static chat interfaces toward autonomous agents capable of completing actions. However, most agent systems remain constrained by siloed architectures, limited interoperability, and weak verification standards.

Fetch positions its infrastructure as a response to these limitations by providing a cross-platform coordination layer, identity system, and directory service. The company argues that an agent ecosystem requires consistent verification mechanisms to ensure that consumers interact with authentic brand representatives rather than imitations. By establishing namespace control and portable trust indicators, Fetch Business aims to fill a gap similar to early web domain verification.

At the same time, ASI:One attempts to centralize user preference data in a way that enables more efficient personalization and multi-agent coordination.
This approach differs from generalist LLM applications, which often lack persistent preference architectures or direct access to brand-controlled agents.

The interview also made clear that micropayments and digital transaction infrastructure are central to Fetch’s long-term vision. Sheikh referenced integrations with protocols such as Coinbase’s x402 and AP2, positioning these capabilities as essential for autonomous agents to complete end-to-end tasks that include financial execution.

<h2>Takeaway</h2>

Fetch’s combined release of ASI:One, Fetch Business, and Agentverse introduces an interconnected stack designed to support large-scale deployment and usage of AI agents. The company frames the system as foundational infrastructure for an agentic ecosystem, where consumer AIs can coordinate with verified brand agents to complete tasks reliably and securely. The additions to its identity, discovery, and orchestration layers reflect Fetch’s long-standing thesis, rooted partly in lessons from DeepMind’s early development, that intelligence becomes meaningful only when paired with the capacity to act.
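
The article describes an orchestrator delegating parts of a task to verified agents found through a directory, with consumers able to set spending limits. A minimal sketch of that idea follows. Everything in it is an assumption made for illustration: the `Agent` class, the `route` and `requires_approval` functions, and the example handles are hypothetical and do not represent Fetch's actual API or data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    """Hypothetical directory entry: a handle, advertised capabilities, a verified flag."""
    handle: str
    capabilities: frozenset
    verified: bool = False


def route(task_needs, registry):
    """Map each needed capability to the first verified agent that advertises it.

    Unverified agents are skipped, so a capability with no verified provider
    maps to None instead of to a possibly counterfeit agent.
    """
    plan = {}
    for need in task_needs:
        match = next(
            (a for a in registry if a.verified and need in a.capabilities),
            None,
        )
        plan[need] = match.handle if match else None
    return plan


def requires_approval(amount, spending_limit):
    """A purchase above the user's configured limit needs explicit approval."""
    return amount > spending_limit


# Hypothetical registry; these handles are made up for the example.
registry = [
    Agent("@example_air", frozenset({"flights"}), verified=True),
    Agent("@example_stay", frozenset({"hotels"}), verified=True),
    Agent("@knockoff_air", frozenset({"flights", "restaurants"}), verified=False),
]

plan = route(["flights", "hotels", "restaurants"], registry)
```

In this sketch the restaurant step stays unassigned because the only agent offering it is unverified, which mirrors the article's point that verification gates which agents a personal AI will work with.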

Fetch AI, a startup led by Humayun Sheikh, has launched three interconnected products aimed at enhancing the ecosystem of AI agents. The new offerings include ASI:One, a personal-AI orchestration platform, Fetch Business, a verification and discovery portal for brand agents, and Agentverse, an open directory with over two million agents. This initiative seeks to establish a robust infrastructure for what Fetch describes as the 'Agentic Web.'

The Google Search of AI agents? Fetch launches ASI:One and Business tier for new era of non-human web

If aesthetics and efficiency top your list of needs, there are several Linux distributions that are right up your alley. Both Ubuntu Budgie and Pop!_OS should top that list.

Ubuntu Budgie vs. Pop!_OS: I've used both Linux distros - here's how to choose

arXiv:2511.14268v1 Announce Type: cross 
Abstract: Heterogeneous porous materials play a crucial role in various engineering systems. Microstructure characterization and reconstruction provide an effective means of modeling these materials, which is critical for conducting physical property simulations, studying structure-property linkages, and improving material performance across applications. To achieve superior controllability and applicability with small sample sizes, we propose a statistically controllable microstructure reconstruction framework that integrates neural networks with the sliced-Wasserstein metric. Specifically, our approach leverages local pattern distribution for microstructure characterization and employs a controlled sampling strategy to generate target distributions that satisfy given conditional parameters. A neural network-based model establishes the mapping from the input distribution to the target local pattern distribution, enabling microstructure reconstruction. Combining the sliced-Wasserstein metric with gradient-based optimization minimizes the distance between these distributions, leading to a stable and reliable model. Our method can perform stochastic and controllable reconstruction tasks even with small sample sizes. Additionally, it can generate large-size (e.g., 512 and 1024) 3D microstructures using a chunking strategy. By introducing spatial location masks, our method excels at generating spatially heterogeneous and complex microstructures. We conducted experiments on stochastic reconstruction, controllable reconstruction, heterogeneous reconstruction, and large-size microstructure reconstruction across various materials. Comparative analysis through visualization, statistical measures, and physical property simulations demonstrates the effectiveness of the method, providing new insights and possibilities for research on structure-property linkage and material inverse design.

A new framework for reconstructing the microstructure of heterogeneous porous materials has been proposed, integrating neural networks with the sliced-Wasserstein metric. This approach enhances microstructure characterization and reconstruction, which are essential for modeling materials in engineering applications. By utilizing local pattern distribution and a controlled sampling strategy, the framework aims to improve the controllability and applicability of microstructure reconstruction, even with small sample sizes.

Statistically controllable microstructure reconstruction framework for heterogeneous materials using sliced-Wasserstein metric and neural networks
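
The abstract above relies on the sliced-Wasserstein metric to compare distributions. As a rough illustration of how that metric is commonly estimated, the sketch below projects two equal-size point sets onto random directions and compares the sorted projections. It is a generic Monte Carlo estimator written in plain Python, not the paper's implementation; the function name, the point sets, and the number of projections are assumptions, and in the paper the compared samples would be local patterns rather than 2D points.

```python
import math
import random


def sliced_wasserstein(xs, ys, n_projections=64, seed=0):
    """Monte Carlo estimate of the sliced 2-Wasserstein distance.

    xs, ys: equal-length lists of points (tuples of floats, same dimension).
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal-size samples")
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_projections):
        # Draw a random unit direction by normalizing a Gaussian vector.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in direction))
        direction = [c / norm for c in direction]
        # Project both samples onto the direction and sort the projections.
        px = sorted(sum(a * b for a, b in zip(p, direction)) for p in xs)
        py = sorted(sum(a * b for a, b in zip(p, direction)) for p in ys)
        # In 1D, matching sorted values gives the optimal transport plan.
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    return math.sqrt(total / n_projections)


# Hypothetical data: a random 2D point cloud and a copy shifted by 3 along x.
data_rng = random.Random(1)
cloud = [(data_rng.random(), data_rng.random()) for _ in range(200)]
shifted = [(x + 3.0, y) for x, y in cloud]

same = sliced_wasserstein(cloud, cloud)
moved = sliced_wasserstein(cloud, shifted)
```

Because each projection only requires sorting, the estimate is cheap to compute, and a differentiable version of the same quantity can serve as a training loss, which is consistent with the gradient-based optimization the abstract mentions.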

arXiv:2408.00540v4 Announce Type: replace-cross 
Abstract: Artificial Intelligence (AI) is being incorporated into several optimization, scheduling, and orchestration functions, as well as into native communication network functions. This paradigm shift increases energy consumption. However, quantifying the end-to-end energy consumption of adding intelligence to communication systems remains an open challenge, since conventional energy consumption metrics focus on communication, computation infrastructure, or model development alone. To address this, we propose a new metric, the Energy Cost of AI Lifecycle (eCAL) of an AI model in a system. eCAL captures the energy consumption throughout the development, deployment and utilization of an AI model providing intelligence in a communication network by (i) analyzing the complexity of data collection and manipulation in individual components and (ii) deriving overall and per-bit energy consumption. We show that as a trained AI model is used more frequently for inference, its energy cost per inference decreases, since the fixed training energy is amortized over a growing number of inferences. In a simple case study, we show that eCAL for 100 inferences is 2.73 times higher than for 1000 inferences. Additionally, we have developed a modular and extendable open-source simulation tool to enable researchers, practitioners, and engineers to calculate the end-to-end energy cost with various configurations and across various systems, ensuring adaptability to diverse use cases.

The article discusses the integration of Artificial Intelligence (AI) into communication networks, highlighting the increased energy consumption associated with this shift. It presents a new metric called the Energy Cost of AI Lifecycle (eCAL), which quantifies the energy used during the development, deployment, and utilization of AI models in communication systems. The study emphasizes the need for a comprehensive understanding of energy consumption metrics, which traditionally focus on communication, computation infrastructure, or model development.

The Energy Cost of Artificial Intelligence Lifecycle in Communication Networks
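
The amortization argument in the eCAL abstract above can be made concrete with a toy calculation. The sketch below assumes a simplified two-term model (a one-off fixed energy cost plus a constant cost per inference). That model, the function name, and the numbers are illustrative assumptions, not the eCAL definition, which the abstract says is derived per component and per bit.

```python
def energy_per_inference(fixed_joules, per_inference_joules, n_inferences):
    """Average energy per inference when a fixed cost is spread over n inferences."""
    if n_inferences <= 0:
        raise ValueError("n_inferences must be positive")
    return fixed_joules / n_inferences + per_inference_joules


# Hypothetical values, chosen only to make the arithmetic easy to follow.
FIXED_J = 1000.0        # one-off cost, e.g. data collection and training
PER_INFERENCE_J = 1.0   # recurring cost of a single inference

cost_100 = energy_per_inference(FIXED_J, PER_INFERENCE_J, 100)     # 1000/100 + 1
cost_1000 = energy_per_inference(FIXED_J, PER_INFERENCE_J, 1000)   # 1000/1000 + 1
ratio = cost_100 / cost_1000
```

With these made-up inputs the per-inference cost falls from 11 J to 2 J, a ratio of 5.5. The 2.73 figure in the abstract comes from the paper's own case study and its own parameters. The sketch only shows why the ratio exceeds 1 and why it shrinks as the recurring cost dominates.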

arXiv:2511.14465v1 Announce Type: new 
Abstract: Mechanistic interpretability research requires reliable tools for analyzing transformer internals across diverse architectures. Current approaches face a fundamental tradeoff: custom implementations like TransformerLens ensure consistent interfaces but require coding a manual adaptation for each architecture, introducing numerical mismatch with the original models, while direct HuggingFace access through NNsight preserves exact behavior but lacks standardization across models. To bridge this gap, we develop nnterp, a lightweight wrapper around NNsight that provides a unified interface for transformer analysis while preserving original HuggingFace implementations. Through automatic module renaming and comprehensive validation testing, nnterp enables researchers to write intervention code once and deploy it across 50+ model variants spanning 16 architecture families. The library includes built-in implementations of common interpretability methods (logit lens, patchscope, activation steering) and provides direct access to attention probabilities for models that support it. By packaging validation tests with the library, researchers can verify compatibility with custom models locally. nnterp bridges the gap between correctness and usability in mechanistic interpretability tooling.

The article discusses nnterp, a new tool designed to enhance mechanistic interpretability research for transformer models. Current methods face challenges in standardization and numerical accuracy when analyzing different architectures. nnterp serves as a lightweight wrapper around NNsight, providing a unified interface for transformer analysis while maintaining the original HuggingFace implementations. It allows researchers to write intervention code once and apply it across over 50 model variants from 16 architecture families, facilitating comprehensive interpretability testing.
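
The "write intervention code once" idea in the nnterp abstract rests on mapping each architecture's module names to one shared set of names. The sketch below illustrates that general idea only. It does not use nnterp or NNsight, and `LAYER_PATHS`, `resolve`, and `UnifiedView` are hypothetical names. The two attribute paths are the ones HuggingFace's GPT-2 and Llama causal-LM classes use for their block lists, and the stand-in objects replace real models so the example runs without downloads.

```python
from types import SimpleNamespace

# Where the list of transformer blocks lives in two HuggingFace model families.
LAYER_PATHS = {
    "gpt2": "transformer.h",
    "llama": "model.layers",
}


def resolve(model, dotted_path):
    """Follow a dotted attribute path such as 'transformer.h' on a model object."""
    obj = model
    for name in dotted_path.split("."):
        obj = getattr(obj, name)
    return obj


class UnifiedView:
    """Expose architecture-specific modules under one shared attribute name."""

    def __init__(self, model, family):
        self._model = model
        self._family = family

    @property
    def layers(self):
        return resolve(self._model, LAYER_PATHS[self._family])


def count_layers(view):
    """Analysis code written once against the unified name `layers`."""
    return len(view.layers)


# Stand-in objects that mimic only the attribute layout of the real models.
gpt2_like = SimpleNamespace(transformer=SimpleNamespace(h=["block0", "block1"]))
llama_like = SimpleNamespace(model=SimpleNamespace(layers=["block0", "block1", "block2"]))

n_gpt2 = count_layers(UnifiedView(gpt2_like, "gpt2"))
n_llama = count_layers(UnifiedView(llama_like, "llama"))
```

Because the view only redirects attribute access, the underlying model object is left untouched, which is the property the abstract highlights when it says the original HuggingFace implementations are preserved.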

