FIBER: A Multilingual Evaluation Resource for Factual Inference Bias

arXiv — cs.CL · Monday, December 15, 2025 at 5:00:00 AM
  • FIBER is a new multilingual benchmark for evaluating factual knowledge and inference bias in large language models across English, Italian, and Turkish. The dataset covers sentence-completion and question-answering tasks and measures how the prompt language affects entity selection and model performance in single- and multi-entity contexts (a minimal probing sketch follows below).
  • FIBER matters because it addresses growing concerns about the factual reliability and biases of large language models, offering a systematic way to evaluate both in a multilingual setting.
  • The work reflects a broader trend in AI research toward evaluating language models across diverse languages and contexts. Addressing bias and improving factual accuracy are prerequisites for deploying these systems in real-world applications.
— via World Pulse Now AI Editorial System
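
The announcement includes no code, but the probing setup it describes can be approximated. Below is a minimal sketch, assuming a HuggingFace causal LM: it scores candidate entity completions under parallel prompts in English, Italian, and Turkish and reports which entity each prompt language favors. The model name, prompts, and candidate entities are illustrative placeholders, not drawn from the FIBER dataset.

```python
# Hypothetical FIBER-style probe: compare the model's log-likelihood for
# candidate entity completions under parallel prompts in three languages.
# Model, prompts, and candidates are illustrative, not from the dataset.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2.5-7B-Instruct"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype="auto")
model.eval()

# Parallel sentence-completion prompts (same fact, three prompt languages).
prompts = {
    "en": "The capital of Turkey is",
    "it": "La capitale della Turchia è",
    "tr": "Türkiye'nin başkenti",
}
candidates = ["Ankara", "Istanbul"]  # correct entity vs. distractor

def completion_logprob(prompt: str, completion: str) -> float:
    """Sum of token log-probs assigned to `completion` given `prompt`."""
    full = tokenizer(prompt + " " + completion, return_tensors="pt")
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    with torch.no_grad():
        logits = model(**full).logits
    # Shift: the logits at position i predict token i+1.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full.input_ids[0, 1:]
    # Score only the completion tokens.
    span = range(prompt_len - 1, targets.shape[0])
    return sum(log_probs[i, targets[i]].item() for i in span)

for lang, prompt in prompts.items():
    scores = {c: completion_logprob(prompt, c) for c in candidates}
    chosen = max(scores, key=scores.get)
    print(f"[{lang}] {prompt!r} -> {chosen} ({scores})")
```

If the chosen entity flips between prompt languages for the same underlying fact, that is exactly the prompt-language effect on entity selection the benchmark is designed to surface.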


Continue Reading
MedAI: Evaluating TxAgent's Therapeutic Agentic Reasoning in the NeurIPS CURE-Bench Competition
Positive · Artificial Intelligence
The NeurIPS CURE-Bench Competition has highlighted the capabilities of TxAgent, an AI system designed for therapeutic decision-making in clinical medicine. Utilizing a fine-tuned Llama-3.1-8B model, TxAgent integrates various biomedical resources, including the FDA Drug API and OpenTargets, to enhance drug recommendations and treatment planning through iterative retrieval-augmented generation.
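
As a rough illustration of the iterative retrieval-augmented loop described above (not TxAgent's actual implementation), the sketch below alternates model "actions" with tool lookups until an answer is produced. The tool stubs, action format, and stop criterion are all hypothetical.

```python
# Illustrative only: a generic iterative retrieval-augmented tool loop.
# Tool stubs and the toy policy are hypothetical placeholders.
from typing import Callable

def fda_drug_lookup(query: str) -> str:
    """Hypothetical stub standing in for an FDA drug-label lookup."""
    return f"[FDA label snippet for {query!r}]"

def opentargets_lookup(query: str) -> str:
    """Hypothetical stub standing in for an OpenTargets evidence query."""
    return f"[OpenTargets evidence for {query!r}]"

TOOLS: dict[str, Callable[[str], str]] = {
    "fda": fda_drug_lookup,
    "opentargets": opentargets_lookup,
}

def agent_step(context: str) -> str:
    """Placeholder for one LLM call; a real system would query the model."""
    # Toy policy: retrieve once from each tool, then answer.
    if "[FDA" not in context:
        return "CALL fda: metformin"
    if "[OpenTargets" not in context:
        return "CALL opentargets: type 2 diabetes"
    return "ANSWER: recommend metformin, citing retrieved evidence."

def run_agent(question: str, max_turns: int = 5) -> str:
    context = question
    for _ in range(max_turns):
        action = agent_step(context)
        if action.startswith("ANSWER:"):
            return action
        # Parse "CALL <tool>: <query>" and append the retrieved evidence.
        tool_name, _, query = action.removeprefix("CALL ").partition(": ")
        context += "\n" + TOOLS[tool_name](query)
    return "ANSWER: max turns reached."

print(run_agent("First-line therapy for type 2 diabetes?"))
```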
Beyond Early-Token Bias: Model-Specific and Language-Specific Position Effects in Multilingual LLMs
Neutral · Artificial Intelligence
A recent study of Large Language Models (LLMs) shows that position bias, the tendency to weight information by where it appears in the context, varies significantly across languages and model architectures. The research analyzed five languages (English, Russian, German, Hindi, and Vietnamese) using models such as Qwen2.5-7B-Instruct and Mistral 7B, and found that some models favor late positions, contrary to the commonly assumed early-token preference.
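
One way to reproduce this kind of analysis is to slide a single key fact through a list of distractor sentences and score the model's answer at each slot. The harness below is a hypothetical sketch with made-up sentences; the scoring function is left abstract so any model-based log-prob scorer can be plugged in.

```python
# Hypothetical position-bias harness: insert one key fact at each slot among
# distractors and score the model's answer per position. Sentences are made up.
from typing import Callable

KEY_FACT = "The package was delivered to Hamburg."
DISTRACTORS = [f"Filler sentence number {i}." for i in range(9)]
QUESTION = "Where was the package delivered?"
ANSWER = "Hamburg"

def build_context(slot: int) -> str:
    """Insert the key fact at position `slot` among the distractors."""
    sents = DISTRACTORS[:slot] + [KEY_FACT] + DISTRACTORS[slot:]
    return " ".join(sents)

def position_bias_curve(score: Callable[[str, str, str], float]) -> list[float]:
    """score(context, question, answer) -> higher is better (e.g. log-prob)."""
    return [score(build_context(s), QUESTION, ANSWER)
            for s in range(len(DISTRACTORS) + 1)]

# Example with a dummy scorer; swap in a real model-based log-prob scorer.
dummy = lambda ctx, q, a: float(ctx.index(KEY_FACT))  # placeholder signal
print(position_bias_curve(dummy))
```

A flat curve would indicate no position effect; the study's finding of late-position preference in some models would show up here as scores rising toward later slots.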
