Think in Parallel, Answer as One: Logit Averaging for Open-Ended Reasoning

arXiv — cs.CL · Wednesday, December 3, 2025 at 5:00:00 AM
  • A new decoding strategy called ThinkMerge averages the logits of K parallel reasoning traces at each decoding step, producing a single coherent output for open-ended reasoning tasks such as code generation and web-based research. The method is designed to overcome the limitations of majority voting in these settings, where candidate answers rarely match exactly, and it requires no additional training.
  • ThinkMerge demonstrates improved performance on open-ended coding tasks, achieving notable gains in pass rates on benchmarks such as LiveCodeBench. This positions it as a competitive alternative to existing test-time scaling methods for AI-driven reasoning.
  • The development of ThinkMerge reflects a broader trend in AI research toward improving the efficiency and effectiveness of language models, particularly in addressing challenges such as training-inference mismatch and long-horizon information seeking.
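The core idea described above can be sketched in a few lines: at each decoding step, collect the next-token logits from each of the K traces, average them, and emit one shared token. This is an illustrative NumPy sketch under that reading of the summary, not the authors' implementation (their handling of divergent trace states is not described here).

```python
import numpy as np

def merge_step(logits_k: np.ndarray) -> int:
    """Average next-token logits across K parallel traces,
    then pick the argmax token for the single shared output."""
    # logits_k: shape (K, vocab_size) — one row per reasoning trace
    avg = logits_k.mean(axis=0)
    return int(np.argmax(avg))

# Toy example: 3 traces over a 4-token vocabulary.
# Traces 1 and 3 favor token 0; trace 2 favors token 1.
logits = np.array([
    [2.0, 0.5, 0.1, 0.0],
    [0.3, 1.8, 0.2, 0.1],
    [1.9, 0.4, 0.3, 0.2],
])
print(merge_step(logits))  # → 0 (token 0 wins after averaging)
```

Unlike majority voting over finished answers, this merges traces in logit space, so it still yields a single coherent output even when no two traces produce identical text.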
— via World Pulse Now AI Editorial System


Continue Reading
TableCache: Primary Foreign Key Guided KV Cache Precomputation for Low Latency Text-to-SQL
Positive · Artificial Intelligence
A new approach called TableCache has been proposed to reduce latency in Text-to-SQL tasks by precomputing key-value (KV) caches offline while preserving primary-foreign key relationships between tables. This method addresses inefficiencies in existing inference engines such as SGLang and vLLM, which generate redundant cache copies when processing queries that reference tables in varying orders.
MemoBrain: Executive Memory as an Agentic Brain for Reasoning
Neutral · Artificial Intelligence
The introduction of MemoBrain, an executive memory model for tool-augmented agents, addresses the challenges of long-horizon reasoning in AI frameworks. This model captures salient intermediate states and their logical relations, enhancing the coherence and goal-directedness of reasoning processes.
ExpSeek: Self-Triggered Experience Seeking for Web Agents
Positive · Artificial Intelligence
A new paradigm called ExpSeek has been introduced, enhancing web agents' interaction capabilities by enabling proactive experience seeking rather than passive experience injection. The approach uses step-level entropy thresholds to time interventions and tailored experience content, demonstrating significant performance improvements with Qwen3-8B and Qwen3-32B models across various benchmarks.
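A step-level entropy trigger of the kind mentioned above can be sketched simply: compute the entropy of the next-token distribution and intervene only when it exceeds a threshold (i.e., when the model is uncertain). The threshold value and trigger logic below are illustrative assumptions, not ExpSeek's actual parameters.

```python
import numpy as np

def should_seek(logits: np.ndarray, threshold: float = 1.0) -> bool:
    """Trigger experience seeking when next-token entropy exceeds a threshold."""
    z = logits - logits.max()           # stabilize before softmax
    p = np.exp(z) / np.exp(z).sum()
    entropy = -np.sum(p * np.log(p + 1e-12))
    return bool(entropy > threshold)

confident = np.array([10.0, 0.0, 0.0, 0.0])  # peaked → near-zero entropy
uncertain = np.array([1.0, 1.0, 1.0, 1.0])   # uniform → entropy ln(4) ≈ 1.39
print(should_seek(confident), should_seek(uncertain))  # → False True
```

Gating interventions this way spends the cost of retrieving experience only at the steps where the agent is genuinely unsure.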
ToolRM: Towards Agentic Tool-Use Reward Modeling
Positive · Artificial Intelligence
ToolRM has been introduced as a new family of lightweight reward models specifically designed for tool-use scenarios, addressing the limitations of existing reward models in aligning large language models (LLMs) with human preferences. This development includes a novel pipeline for generating high-quality preference data and a benchmark for evaluating these models on tool-calling tasks.
KVzap: Fast, Adaptive, and Faithful KV Cache Pruning
Positive · Artificial Intelligence
KVzap has been introduced as a fast and adaptive method for key-value (KV) cache pruning in transformer-based language models, addressing the critical inference bottleneck caused by growing context lengths. This method achieves 2-4 times KV cache compression with minimal accuracy loss, demonstrating state-of-the-art performance on the KVpress leaderboard.
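The general shape of KV cache pruning is easy to illustrate: score each cached entry and keep only the top fraction. The scoring function and keep ratio below are placeholder assumptions for a generic score-based pruner, not KVzap's actual criterion; a keep ratio of 0.25–0.5 corresponds to the 2–4× compression range cited above.

```python
import numpy as np

def prune_kv(keys: np.ndarray, values: np.ndarray,
             scores: np.ndarray, keep_ratio: float = 0.5):
    """Keep only the highest-scoring KV cache entries (generic sketch)."""
    k = max(1, int(len(scores) * keep_ratio))
    idx = np.sort(np.argsort(scores)[-k:])  # top-k entries, original order
    return keys[idx], values[idx]

# 8 cached positions with 4-dim keys/values; pretend later tokens score higher.
keys = np.random.randn(8, 4)
values = np.random.randn(8, 4)
scores = np.arange(8.0)
pk, pv = prune_kv(keys, values, scores, keep_ratio=0.25)
print(pk.shape)  # → (2, 4): 4× compression of the cache
```

The practical challenge such methods address is choosing scores that are cheap to compute yet faithful, so accuracy survives aggressive compression.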
