Injecting Falsehoods: Adversarial Man-in-the-Middle Attacks Undermining Factual Recall in LLMs

arXiv (cs.CL), Friday, November 21, 2025 at 5:00:00 AM

Continue Reading
LLMs choose friends and colleagues like people, researchers find
Positive | Artificial Intelligence
Researchers have found that large language models (LLMs) make decisions about networking and friendship in ways that closely resemble human behavior, both in synthetic simulations and real-world contexts. This suggests that LLMs can replicate social decision-making processes similar to those of people.
MoH: Multi-Head Attention as Mixture-of-Head Attention
Positive | Artificial Intelligence
A new architecture called Mixture-of-Head attention (MoH) has been proposed to enhance the efficiency of the multi-head attention mechanism, a key component of the Transformer model. This innovation allows tokens to selectively utilize attention heads, improving inference efficiency while maintaining or exceeding previous accuracy levels. MoH replaces the standard summation in multi-head attention with a weighted summation, introducing flexibility and unlocking additional performance potential.
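As a rough illustration of the difference between a plain summation and a weighted summation over heads, the sketch below combines per-head output vectors for a single token. It is a minimal sketch, not the MoH implementation: the function names `combine_heads` and `top_k_weights`, and the top-k softmax router used to produce the weights, are assumptions made for illustration only.

```python
import math


def combine_heads(head_outputs, weights=None):
    """Combine per-head output vectors for one token.

    With weights=None every head contributes equally (plain summation).
    With explicit weights, each head's output is scaled first
    (weighted summation), so a weight of 0.0 switches that head off.
    """
    if weights is None:
        weights = [1.0] * len(head_outputs)
    dim = len(head_outputs[0])
    return [
        sum(w * head[d] for w, head in zip(weights, head_outputs))
        for d in range(dim)
    ]


def top_k_weights(scores, k):
    """Hypothetical router: softmax over the k highest scores, 0.0 elsewhere."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    exps = {i: math.exp(scores[i]) for i in top}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(scores))]


heads = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]   # three heads, output dim 2
plain = combine_heads(heads)                    # every head counts equally
routed = combine_heads(heads, top_k_weights([0.0, 0.0, -5.0], k=2))
```

Here the third head receives weight 0.0, so only the first two heads contribute to `routed`; in a real model the skipped head would not need to be computed for that token.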
InnoGym: Benchmarking the Innovation Potential of AI Agents
Positive | Artificial Intelligence
InnoGym has been introduced as the first benchmark and framework aimed at systematically evaluating the innovation potential of AI agents. This initiative focuses on two key metrics: performance gain and novelty, assessing not just the correctness of solutions but also the originality of approaches across 18 tasks from real-world engineering and scientific domains.
SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations
Neutral | Artificial Intelligence
A recent study introduced Semantically Equivalent and Coherent Attacks (SECA) aimed at eliciting hallucinations in Large Language Models (LLMs) through realistic prompt modifications that maintain semantic coherence. This approach addresses the limitations of previous adversarial attacks that often resulted in unrealistic prompts, thereby enhancing the understanding of how LLMs can produce hallucinations in practical applications.
Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling
Positive | Artificial Intelligence
A new quantization method called Four Over Six (4/6) has been introduced to enhance the NVFP4 quantization algorithm, which is crucial for large language models (LLMs). This method evaluates two potential scale factors for each block of values, addressing issues of performance degradation during inference and divergence during training that arise from quantization errors in floating-point formats.
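The idea of trying two scale factors per block and keeping the better one can be sketched as follows. This is a simplified illustration under stated assumptions, not the published algorithm: it assumes the two candidates map the block maximum to 6 or to 4 (suggested by the method's name), uses the FP4 E2M1 magnitude grid, selects by squared error, and ignores how NVFP4 stores the scale itself. The names `quantize_block` and `best_of_two` are hypothetical.

```python
import math

# Magnitudes representable in FP4 (E2M1).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_block(block, target_max):
    """Scale the block so its largest magnitude maps to target_max,
    round every value to the nearest FP4 magnitude, then scale back.
    Returns (dequantized values, squared error)."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return list(block), 0.0
    scale = amax / target_max
    out = []
    for x in block:
        mag = min(FP4_GRID, key=lambda g: abs(abs(x) / scale - g))
        out.append(math.copysign(mag * scale, x))
    err = sum((a - b) ** 2 for a, b in zip(block, out))
    return out, err


def best_of_two(block):
    """Try both candidate targets (assumed here to be 6 and 4)
    and keep the one with the lower squared error."""
    candidates = {t: quantize_block(block, t) for t in (6.0, 4.0)}
    best = min(candidates, key=lambda t: candidates[t][1])
    return best, candidates[best][0]


target_a, values_a = best_of_two([6.0, 3.0, 1.0])
target_b, values_b = best_of_two([4.0, 3.0])
```

In the second block, mapping the maximum to 4 lets the value 3.0 land exactly on a grid point, whereas mapping it to 6 would place 3.0 in the wide gap between 4 and 6 on the scaled grid, which is the kind of error a per-block choice can avoid.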
Optimizing Stroke Risk Prediction: A Machine Learning Pipeline Combining ROS-Balanced Ensembles and XAI
Positive | Artificial Intelligence
A new machine learning framework for stroke risk prediction combines ensemble modeling with explainable AI techniques. It reports 99.09% accuracy on the Stroke Prediction Dataset, using Random Over-Sampling (ROS) to address class imbalance and LIME-based methods to identify key clinical variables.
Efficient Training of Diffusion Mixture-of-Experts Models: A Practical Recipe
Positive | Artificial Intelligence
Recent advancements in Diffusion Mixture-of-Experts (MoE) models have highlighted the importance of architectural configurations over routing mechanisms. A systematic study has identified key factors such as expert modules and attention encodings that significantly enhance the performance of these models, suggesting that tuning these configurations can yield better results than routing innovations alone.
Teaching Language Models to Critique via Reinforcement Learning
Positive | Artificial Intelligence
A new framework called CTRL has been developed to teach large language models (LLMs) to critique and refine their outputs through reinforcement learning. This approach allows critic models to generate feedback that enhances the performance of generator models without human intervention, leading to improved pass rates and reduced errors in code generation tasks.