Human researchers are superior to large language models in writing a medical systematic review in a comparative multitask assessment

Nature — Machine Learning · Monday, December 1, 2025
  • A recent study published in Nature — Machine Learning found that human researchers outperformed large language models in writing a medical systematic review during a comparative multitask assessment. This research highlights the limitations of current AI capabilities in complex academic writing tasks, particularly in the medical field.
  • The findings underscore the importance of human expertise in producing high-quality systematic reviews, which are critical for evidence-based medicine. This study may influence how medical research is conducted and evaluated, particularly in the integration of AI tools.
  • The results reflect ongoing discussions about the role of AI in academia and healthcare, emphasizing the need for improved evaluation methods for large language models. As AI continues to evolve, the balance between human insight and machine efficiency remains a pivotal topic, particularly in fields requiring nuanced understanding and critical analysis.
— via World Pulse Now AI Editorial System


Continue Reading
LLMs choose friends and colleagues like people, researchers find
Positive · Artificial Intelligence
Researchers have found that large language models (LLMs) make decisions about networking and friendship in ways that closely resemble human behavior, both in synthetic simulations and real-world contexts. This suggests that LLMs can replicate social decision-making processes similar to those of people.
AI’s Wrong Answers Are Bad. Its Wrong Reasoning Is Worse
Negative · Artificial Intelligence
Recent studies reveal that while AI, particularly generative AI, has improved in accuracy, its flawed reasoning processes pose significant risks in critical sectors such as healthcare, law, and education. These findings highlight the need for a deeper understanding of AI's decision-making mechanisms.
Agentic Policy Optimization via Instruction-Policy Co-Evolution
Positive · Artificial Intelligence
A novel framework named INSPO has been introduced to enhance reinforcement learning through dynamic instruction optimization, addressing the limitations of static instructions in Reinforcement Learning with Verifiable Rewards (RLVR). This approach allows for a more adaptive learning process, where instruction candidates evolve alongside the agent's policy, improving multi-turn reasoning capabilities in large language models (LLMs).
Capturing Context-Aware Route Choice Semantics for Trajectory Representation Learning
Positive · Artificial Intelligence
A new framework named CORE has been proposed for trajectory representation learning (TRL), which aims to enhance the encoding of raw trajectory data into low-dimensional embeddings by integrating context-aware route choice semantics. This approach addresses the limitations of existing TRL methods that treat trajectories as static sequences, thereby enriching the semantic representation of urban mobility patterns.
Influence Functions for Efficient Data Selection in Reasoning
Neutral · Artificial Intelligence
A recent study has introduced influence functions as a method for efficient data selection in reasoning tasks, particularly for fine-tuning large language models (LLMs) on chain-of-thought (CoT) data. This approach aims to define data quality more effectively, moving beyond traditional heuristics such as problem difficulty and trace length. Influence-based pruning has been shown to outperform existing selection methods on math reasoning tasks.
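To make the idea concrete, here is a minimal sketch of influence-style data scoring using the common first-order approximation (gradient dot products between each training example and a held-out target set). This is an illustration of the general technique on a toy linear model, not the paper's exact method; it also ignores the Hessian factor that full influence functions include.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: linear regression, so per-example gradients are analytic.
X = rng.normal(size=(100, 5))             # training inputs
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=100)

X_val = rng.normal(size=(20, 5))          # held-out "target" set
y_val = X_val @ w_true

# Fit parameters by least squares.
w = np.linalg.lstsq(X, y, rcond=None)[0]

# Per-example training gradients of squared loss: g_i = 2*(x_i.w - y_i)*x_i
g_train = 2 * (X @ w - y)[:, None] * X
# Average gradient over the validation set.
g_val = (2 * (X_val @ w - y_val)[:, None] * X_val).mean(axis=0)

# First-order influence score: alignment between a training example's
# gradient and the validation gradient (Hessian omitted for simplicity).
scores = g_train @ g_val

# Prune: keep the half of the training set with the most helpful scores.
keep = np.argsort(scores)[: len(scores) // 2]
print(keep.shape)  # (50,)
```

The same scoring pattern scales up conceptually: for an LLM, the per-example gradients come from the fine-tuning loss on each CoT trace, and the pruned subset is the one whose gradients best align with performance on a target reasoning set.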
An Interdisciplinary and Cross-Task Review on Missing Data Imputation
Neutral · Artificial Intelligence
A comprehensive review on missing data imputation highlights the challenges posed by incomplete datasets across various fields, including healthcare and e-commerce. The study synthesizes decades of research, categorizing imputation methods from classical techniques to modern machine learning approaches, emphasizing the need for a unified framework to address missingness mechanisms and imputation goals.
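As a concrete baseline from the classical end of the spectrum such reviews cover, here is a minimal sketch of column-mean imputation on data that is missing completely at random (MCAR). The dataset and missingness rate are invented for illustration; modern methods replace the column mean with a learned model of the missing values.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy dataset with ~20% of entries missing completely at random.
data = rng.normal(loc=5.0, scale=2.0, size=(200, 4))
mask = rng.random(data.shape) < 0.2
data_missing = data.copy()
data_missing[mask] = np.nan

# Classical mean imputation: replace each NaN with its column's mean,
# computed over the observed entries only.
col_means = np.nanmean(data_missing, axis=0)
imputed = np.where(np.isnan(data_missing), col_means, data_missing)

assert not np.isnan(imputed).any()
```

Mean imputation preserves column means but shrinks variance and ignores correlations between features, which is exactly the kind of limitation that motivates the machine-learning imputers the review surveys.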
Escaping Collapse: The Strength of Weak Data for Large Language Model Training
Positive · Artificial Intelligence
Recent research has formalized the role of synthetically-generated data in training large language models (LLMs), highlighting that without proper curation, model performance can plateau or collapse. The study introduces a theoretical framework to determine the necessary curation levels to ensure continuous improvement in LLM performance, drawing inspiration from the boosting technique in machine learning.
Discovering the complete enhancer map of human herpesviruses using a natural language processing model
Neutral · Artificial Intelligence
A recent study published in Nature — Machine Learning has unveiled a comprehensive enhancer map of human herpesviruses using a natural language processing model. This breakthrough aims to enhance the understanding of the regulatory elements that influence the behavior of these viruses, which are significant in human health.