SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations

arXiv — cs.LG | Tuesday, December 2, 2025 at 5:00:00 AM
  • A recent study introduced Semantically Equivalent and Coherent Attacks (SECA), which elicit hallucinations in Large Language Models (LLMs) through realistic prompt modifications that preserve the prompt's meaning and remain fluent. This addresses a limitation of previous adversarial attacks, which often produced unrealistic prompts, and gives a clearer picture of how LLMs hallucinate in practical use (a sketch of such an attack loop follows this summary).
  • SECA is significant because it offers a more realistic probe of LLM reliability, which matters as these models are increasingly used in high-stakes environments. By focusing on realistic modifications, the work could improve the safety and trustworthiness of LLM applications, essential for deployment in critical areas such as healthcare and legal systems.
  • This research aligns with ongoing efforts to optimize LLMs through various methodologies, including constrained learning and active slice discovery. The focus on realistic adversarial prompts reflects a broader trend in AI research to enhance model robustness and reliability, addressing concerns about biases and inaccuracies that can arise in LLM evaluations and applications.
— via World Pulse Now AI Editorial System
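The summary above describes the method only at a high level. Below is a minimal sketch of what a semantically equivalent attack loop could look like; the callables `paraphrase`, `is_equivalent`, and `hallucinates` are hypothetical stand-ins, not components named by the paper.

```python
def seca_attack(prompt, target_llm, paraphrase, is_equivalent, hallucinates,
                max_tries=50):
    """Search for a meaning-preserving rewrite that makes the model hallucinate."""
    for _ in range(max_tries):
        candidate = paraphrase(prompt)            # realistic, coherent rewrite
        if not is_equivalent(prompt, candidate):  # keep only meaning-preserving edits
            continue
        answer = target_llm(candidate)
        if hallucinates(candidate, answer):       # e.g. checked against ground truth
            return candidate, answer              # successful adversarial prompt
    return None, None                             # no attack found within budget
```

The only constraints grounded in the summary are that candidates must stay semantically equivalent and coherent; everything else about the search is an assumption here.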


Continue Reading
LLMs choose friends and colleagues like people, researchers find
Positive | Artificial Intelligence
Researchers have found that large language models (LLMs) make decisions about networking and friendship in ways that closely resemble human behavior, in both synthetic simulations and real-world contexts, suggesting that LLMs can replicate human social decision-making.
Measuring What LLMs Think They Do: SHAP Faithfulness and Deployability on Financial Tabular Classification
Neutral | Artificial Intelligence
A recent study evaluated the performance of Large Language Models (LLMs) in financial tabular classification tasks, revealing discrepancies between LLMs' self-explanations of feature impact and their SHAP values. The research indicates that while LLMs offer a flexible alternative to traditional models like LightGBM, their reliability in high-stakes financial applications remains uncertain.
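A minimal sketch of one way such a faithfulness check could be run: compare the feature ranking an LLM reports in its self-explanation against a SHAP-derived ranking via rank correlation. The features, values, and metric below are illustrative assumptions, not the study's actual protocol.

```python
import numpy as np
from scipy.stats import spearmanr

features = ["income", "debt_ratio", "age", "n_accounts"]      # hypothetical
shap_importance = np.array([0.42, 0.31, 0.08, 0.19])          # mean |SHAP| per feature
llm_rank = {"income": 1, "debt_ratio": 2, "n_accounts": 3, "age": 4}  # LLM's stated order

shap_rank = (-shap_importance).argsort().argsort() + 1        # 1 = most important
llm_order = np.array([llm_rank[f] for f in features])
rho, p = spearmanr(shap_rank, llm_order)                      # rank agreement
print(f"Spearman rho between SHAP and self-explanation: {rho:.2f}")
```

A rho near 1 would indicate the self-explanation tracks the SHAP attribution; the discrepancies the study reports would show up as low or unstable correlations.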
MoH: Multi-Head Attention as Mixture-of-Head Attention
Positive | Artificial Intelligence
A new architecture called Mixture-of-Head attention (MoH) has been proposed to make the multi-head attention mechanism, a key component of the Transformer, more efficient. MoH lets each token selectively use a subset of attention heads, improving inference efficiency while matching or exceeding the accuracy of standard multi-head attention. It replaces the standard summation over heads with a weighted summation, introducing flexibility and unlocking additional performance potential (a sketch follows below).
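A minimal sketch of the mechanism as the summary describes it, assuming a per-token router that picks top-k heads and gates their contributions; the dimensions and routing details are illustrative, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

def moh_attention(x, wq, wk, wv, wo, router_w, n_heads, top_k=4):
    """x: (B, T, D); wq/wk/wv: (D, D); wo: (H, D//H, D); router_w: (D, H)."""
    B, T, D = x.shape
    hd = D // n_heads
    q = (x @ wq).view(B, T, n_heads, hd).transpose(1, 2)     # (B, H, T, hd)
    k = (x @ wk).view(B, T, n_heads, hd).transpose(1, 2)
    v = (x @ wv).view(B, T, n_heads, hd).transpose(1, 2)
    heads = F.scaled_dot_product_attention(q, k, v)          # per-head outputs
    per_head = torch.einsum('bhtd,hde->bhte', heads, wo)     # each head mapped to D
    logits = x @ router_w                                    # (B, T, H) router scores
    topv, topi = logits.topk(top_k, dim=-1)                  # each token picks top-k heads
    gates = torch.zeros_like(logits).scatter(-1, topi, topv.softmax(dim=-1))
    # weighted summation over heads replaces the uniform sum of standard MHA
    return torch.einsum('bhte,bth->bte', per_head, gates)    # (B, T, D)

B, T, D, H = 2, 8, 64, 8
x = torch.randn(B, T, D)
out = moh_attention(x, torch.randn(D, D), torch.randn(D, D), torch.randn(D, D),
                    torch.randn(H, D // H, D), torch.randn(D, H), n_heads=H)
```

With all gates set to 1, this reduces to standard multi-head attention, since concatenating heads and applying one output projection equals summing per-head projections.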
InnoGym: Benchmarking the Innovation Potential of AI Agents
Positive | Artificial Intelligence
InnoGym has been introduced as the first benchmark and framework aimed at systematically evaluating the innovation potential of AI agents. This initiative focuses on two key metrics: performance gain and novelty, assessing not just the correctness of solutions but also the originality of approaches across 18 tasks from real-world engineering and scientific domains.
Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling
Positive | Artificial Intelligence
A new quantization method called Four Over Six (4/6) has been introduced to improve NVFP4 quantization, a 4-bit floating-point format important for large language models (LLMs). The method evaluates two candidate scale factors for each block of values, mitigating the inference-time performance degradation and training divergence caused by quantization error in low-precision floating-point formats.
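A minimal sketch of adaptive per-block scale selection consistent with that description. One loud assumption, suggested by the method's name but not confirmed by the summary: the two candidate scales map a block's maximum magnitude onto FP4's two largest representable values, 6 and 4, and the candidate with lower reconstruction error wins.

```python
import numpy as np

FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # E2M1 magnitudes

def quantize_fp4(block, scale):
    mags = np.abs(block) / scale
    idx = np.abs(mags[:, None] - FP4_GRID[None, :]).argmin(axis=1)  # nearest code
    return np.sign(block) * FP4_GRID[idx] * scale

def quantize_block_4over6(block):
    amax = np.abs(block).max()
    if amax == 0.0:
        return block.copy(), 1.0
    best = None
    for target in (6.0, 4.0):                      # the two candidate mappings
        scale = amax / target
        deq = quantize_fp4(block, scale)
        err = float(np.square(block - deq).sum())  # reconstruction error
        if best is None or err < best[0]:
            best = (err, deq, scale)
    return best[1], best[2]

block = np.random.randn(16).astype(np.float32)
deq, scale = quantize_block_4over6(block)
print("chosen scale:", scale, "MSE:", float(np.mean((block - deq) ** 2)))
```

Which candidate wins depends on the block's value distribution; the sketch simply picks whichever yields the lower error.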
Efficient Training of Diffusion Mixture-of-Experts Models: A Practical Recipe
Positive | Artificial Intelligence
Recent advancements in Diffusion Mixture-of-Experts (MoE) models have highlighted the importance of architectural configurations over routing mechanisms. A systematic study has identified key factors such as expert modules and attention encodings that significantly enhance the performance of these models, suggesting that tuning these configurations can yield better results than routing innovations alone.
AlignSAE: Concept-Aligned Sparse Autoencoders
Positive | Artificial Intelligence
The introduction of AlignSAE, a method designed to align Sparse Autoencoder features with a defined ontology, marks a significant advancement in the interpretability of Large Language Models (LLMs). This approach employs a two-phase training process, combining unsupervised pre-training with supervised post-training to enhance the alignment of features with human-defined concepts.
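A minimal sketch of what a two-phase objective of this kind could look like: phase one is standard sparse-autoencoder training (reconstruction plus a sparsity penalty), and phase two adds a supervised term tying designated latent slots to concept labels. The loss shapes and coefficients are assumptions, not AlignSAE's actual formulation.

```python
import torch
import torch.nn.functional as F

def alignsae_loss(x, encoder, decoder, l1_coef=1e-3,
                  concept_labels=None, concept_slots=None, align_coef=1.0):
    pre = encoder(x)                     # pre-activation latent logits
    z = F.relu(pre)                      # sparse latent code
    recon = decoder(z)
    # phase 1: unsupervised reconstruction + sparsity penalty
    loss = F.mse_loss(recon, x) + l1_coef * z.abs().mean()
    if concept_labels is not None:       # phase 2: supervised alignment
        # push designated latent slots to fire iff their concept is present
        loss = loss + align_coef * F.binary_cross_entropy_with_logits(
            pre[:, concept_slots], concept_labels)
    return loss
```

Here `concept_slots` indexes the latents reserved for ontology concepts and `concept_labels` marks which concepts are present in each input; both are hypothetical names.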
Teaching Language Models to Critique via Reinforcement Learning
Positive | Artificial Intelligence
A new framework called CTRL has been developed to teach large language models (LLMs) to critique and refine their outputs through reinforcement learning. This approach allows critic models to generate feedback that enhances the performance of generator models without human intervention, leading to improved pass rates and reduced errors in code generation tasks.
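A minimal sketch of the critique-and-refine loop the summary describes. `generator`, `critic`, and `passes_tests` are hypothetical stand-ins; CTRL's reinforcement-learning training of the critic is not shown here, only the inference-time loop it enables.

```python
def critique_refine(task, generator, critic, passes_tests, max_rounds=3):
    """Iteratively refine a generator's output using a critic's feedback."""
    solution = generator(task, feedback=None)
    for _ in range(max_rounds):
        if passes_tests(solution):            # e.g. unit tests for generated code
            return solution
        feedback = critic(task, solution)     # critic points out concrete flaws
        solution = generator(task, feedback=feedback)
    return solution                           # best effort after the budget
```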