When Words Change the Model: Sensitivity of LLMs for Constraint Programming Modelling

arXiv — cs.LG · Wednesday, November 19, 2025, 5:00:00 AM
  • The study explores the effectiveness of large language models (LLMs) in constraint programming, revealing that while they can generate models from natural language descriptions, their success may be influenced by data contamination. By modifying known problems, researchers assessed LLMs' reasoning abilities, finding that they often produce plausible outputs but may lack genuine reasoning skills.
  • This development is significant as it challenges the perceived reliability of LLMs in generating accurate models for optimization tasks, prompting a reevaluation of their application in real-world settings.
  • The findings contribute to ongoing discussions about the limitations of LLMs, particularly in their ability to align outputs with desired probability distributions, and the need for more comprehensive evaluation frameworks that prioritize real-world reasoning.
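The perturbation methodology described above can be sketched in miniature. Below is a hedged, self-contained illustration (not the paper's actual benchmark): a brute-force solver for a toy constraint satisfaction problem, with an "original" formulation an LLM might have memorized and a "modified" variant that flips one constraint. Comparing the two solution sets is the kind of check that distinguishes genuine modelling from pattern recall; the specific problem and constraints here are invented for illustration.

```python
from itertools import product

def solve(constraints, domains):
    """Brute-force CSP solver: return all assignments satisfying every constraint."""
    names = list(domains)
    return [
        dict(zip(names, values))
        for values in product(*(domains[n] for n in names))
        if all(c(dict(zip(names, values))) for c in constraints)
    ]

domains = {"x": range(0, 6), "y": range(0, 6)}

# "Known" problem an LLM may have seen during training: x + y == 6 with x < y
original = [lambda a: a["x"] + a["y"] == 6, lambda a: a["x"] < a["y"]]

# Perturbed variant probing genuine reasoning: the inequality is flipped
modified = [lambda a: a["x"] + a["y"] == 6, lambda a: a["x"] > a["y"]]

print(solve(original, domains))  # [{'x': 1, 'y': 5}, {'x': 2, 'y': 4}]
print(solve(modified, domains))  # [{'x': 4, 'y': 2}, {'x': 5, 'y': 1}]
```

A model that merely retrieves the memorized formulation would return the original solution set for both prompts; a model that reasons over the modified description should produce the flipped one.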
— via World Pulse Now AI Editorial System


Recommended Readings
Large Language Models and 3D Vision for Intelligent Robotic Perception and Autonomy
Positive · Artificial Intelligence
The integration of Large Language Models (LLMs) with 3D vision is revolutionizing robotic perception and autonomy. This approach enhances robotic sensing technologies, allowing machines to understand and interact with complex environments using natural language and spatial awareness. The review discusses the foundational principles of LLMs and 3D data, examines critical 3D sensing technologies, and highlights advancements in scene understanding, text-to-3D generation, and embodied agents, while addressing the challenges faced in this evolving field.
Accuracy is Not Enough: Poisoning Interpretability in Federated Learning via Color Skew
Negative · Artificial Intelligence
Recent research highlights a new class of attacks in federated learning that compromise model interpretability without impacting accuracy. The study reveals that adversarial clients can apply small color perturbations, shifting a model's saliency maps from meaningful regions while maintaining predictions. This method, termed the Chromatic Perturbation Module, systematically creates adversarial examples by altering color contrasts, leading to persistent poisoning of the model's internal feature attributions, challenging assumptions about model reliability.
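The attack's core idea can be illustrated with a deliberately simplified sketch. Everything below is a hypothetical stand-in, not the paper's Chromatic Perturbation Module: the "classifier" decides on total luminance only, the "saliency" is just per-channel totals, and the perturbation trades red against blue so the prediction is preserved while the attribution shifts.

```python
def predict(img):
    # Toy classifier stand-in: decides on total luminance only
    return sum(r + g + b for r, g, b in img) > 300

def saliency(img):
    # Toy attribution stand-in: total contribution per color channel
    return [sum(px[c] for px in img) for c in range(3)]

def chromatic_perturb(img, delta=20):
    # Shift red up and blue down by the same amount: total luminance
    # (hence the toy prediction) is preserved, but the per-channel
    # attribution moves from blue toward red.
    return [(r + delta, g, b - delta) for r, g, b in img]

img = [(50, 40, 60), (30, 80, 70), (20, 10, 90)]
adv = chromatic_perturb(img)

assert predict(img) == predict(adv)   # accuracy unchanged
print(saliency(img), saliency(adv))   # [100, 130, 220] vs [160, 130, 160]
```

The real attack operates on saliency maps of a trained network, but the invariant is the same: predictions agree while the explanation of *why* they agree is poisoned.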
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation
Positive · Artificial Intelligence
MMaDA-Parallel is a new multimodal diffusion framework aimed at enhancing thinking-aware generation in AI models. It addresses performance degradation caused by error propagation in existing autoregressive approaches. The framework introduces ParaBench, a benchmark for evaluating text and image outputs, revealing that misalignment between reasoning and generated images contributes to performance issues. MMaDA-Parallel employs supervised finetuning and Parallel Reinforcement Learning to improve interaction between text and images throughout the denoising process.
Efficient Reinforcement Learning for Zero-Shot Coordination in Evolving Games
Positive · Artificial Intelligence
The paper addresses zero-shot coordination (ZSC), a significant challenge in multi-agent game theory, particularly in evolving games where agents must coordinate with previously unseen partners without fine-tuning. The study introduces Scalable Population Training (ScaPT), an efficient reinforcement learning framework that enhances zero-shot coordination by using a meta-agent to manage a diverse pool of agents, addressing the limitations of existing methods, which focus on small populations and face computational constraints.
How does My Model Fail? Automatic Identification and Interpretation of Physical Plausibility Failure Modes with Matryoshka Transcoders
Positive · Artificial Intelligence
The article discusses the limitations of current generative models, which, despite their ability to produce realistic outputs, often exhibit physical plausibility failures that go undetected by existing evaluation methods. To address this issue, the authors introduce Matryoshka Transcoders, a framework designed for the automatic identification and interpretation of these physical plausibility failure modes. This approach enhances the understanding of generative models and aims to facilitate targeted improvements.
SERL: Self-Examining Reinforcement Learning on Open-Domain
Positive · Artificial Intelligence
Self-Examining Reinforcement Learning (SERL) is a proposed framework that addresses challenges in applying Reinforcement Learning (RL) to open-domain tasks. Traditional methods face issues with subjectivity and reliance on external rewards. SERL innovatively positions large language models (LLMs) as both Actor and Judge, utilizing internal reward mechanisms. It employs Copeland-style pairwise comparisons to enhance the Actor's capabilities and introduces a self-consistency reward to improve the Judge's reliability, aiming to advance RL applications in open domains.
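Copeland-style scoring, which the summary names as SERL's mechanism for ranking the Actor's candidate outputs, is a standard voting rule: each candidate earns +1 for every pairwise comparison it wins and -1 for every one it loses. The sketch below shows the rule itself; the `judge` here (preferring the longer string) is a deliberately trivial stand-in for the LLM Judge, introduced only for illustration.

```python
from itertools import combinations

def copeland_scores(candidates, beats):
    """Copeland scoring: +1 per pairwise win, -1 per loss, 0 for a tie.

    `beats(a, b)` returns +1 if a wins the comparison, -1 if b wins,
    and 0 for a tie (in SERL this verdict would come from the Judge).
    """
    scores = {c: 0 for c in candidates}
    for a, b in combinations(candidates, 2):
        outcome = beats(a, b)
        scores[a] += outcome
        scores[b] -= outcome
    return scores

# Hypothetical judge: prefers the longer of two candidate answers
judge = lambda a, b: (len(a) > len(b)) - (len(a) < len(b))
answers = ["short", "a bit longer", "the longest answer"]
ranking = copeland_scores(answers, judge)
best = max(ranking, key=ranking.get)  # "the longest answer"
```

Because every pair is compared exactly once, the ranking is robust to a single noisy verdict, which is one plausible reason to prefer pairwise comparison over asking the Judge for absolute scores.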
Rethinking Progression of Memory State in Robotic Manipulation: An Object-Centric Perspective
Neutral · Artificial Intelligence
As embodied agents navigate complex environments, the ability to perceive and track individual objects over time is crucial, particularly for tasks involving similar objects. In non-Markovian contexts, decision-making relies on object-specific histories rather than the immediate scene. Without a persistent memory of past interactions, robotic policies may falter or repeat actions unnecessarily. To address this, LIBERO-Mem is introduced as a task suite designed to test robotic manipulation under conditions of partial observability at the object level.
Exploring Variance Reduction in Importance Sampling for Efficient DNN Training
Positive · Artificial Intelligence
Importance sampling is a technique utilized to enhance the efficiency of deep neural network (DNN) training by minimizing the variance of gradient estimators. This paper introduces a method for estimating variance reduction during DNN training using only minibatches sampled through importance sampling. Additionally, it suggests an optimal minibatch size for automatic learning rate adjustment and presents a metric to quantify the efficiency of importance sampling, supported by theoretical analysis and experiments demonstrating improved training efficiency and model accuracy.
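The variance-reduction mechanism behind this line of work can be shown on a scalar toy problem. The sketch below is not the paper's method, just the textbook estimator: draw examples with non-uniform probabilities and reweight each draw by 1/(N·p_i) so the estimate of the mean stays unbiased. The stand-in "per-example gradient norms" are invented numbers; the key point is that when sampling probabilities are proportional to those magnitudes, the estimator's variance collapses to zero.

```python
import random

def importance_sample_mean(values, probs, n_draws, rng):
    """Unbiased estimate of mean(values): draw index i with probability
    probs[i], then reweight each draw by 1 / (N * probs[i])."""
    n = len(values)
    idx = rng.choices(range(n), weights=probs, k=n_draws)
    return sum(values[i] / (n * probs[i]) for i in idx) / n_draws

values = [0.1, 0.2, 5.0, 0.3, 4.4]   # stand-in per-example gradient norms
total = sum(values)
probs = [v / total for v in values]  # sample large-gradient examples more often

rng = random.Random(0)
est = importance_sample_mean(values, probs, 10_000, rng)
exact = total / len(values)          # est matches exact almost perfectly
```

With probabilities exactly proportional to the values, every reweighted draw equals the true mean, so the estimator has zero variance; uniform sampling over the same skewed values would fluctuate heavily. Real training only approximates this ideal, which is why estimating the achieved variance reduction online, as the paper proposes, is useful.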