DESIGNER: Design-Logic-Guided Multidisciplinary Data Synthesis for LLM Reasoning

arXiv — cs.CL · Tuesday, December 2, 2025 at 5:00:00 AM
arXiv:2508.12726v4 · Announce Type: replace

Abstract: Large language models (LLMs) have achieved remarkable success in many natural language tasks but still struggle with complex, multi-step reasoning, particularly across diverse disciplines. Existing reasoning datasets often lack disciplinary breadth, reasoning depth, and diversity, as well as guiding principles for question synthesis. We propose DESIGNER: a DESIGN-logic-guidEd Reasoning data synthesis pipeline that leverages naturally available, extensive raw documents (e.g., book corpus and web corpus) to generate multidisciplinary challenging questions. We introduce the concept of "design logic" and instruct LLMs to mimic human educators' question-creation process, enabling the automated synthesis of large-scale, high-difficulty questions. We use LLMs to reverse-engineer and abstract over 120,000 design logics from existing questions across various disciplines. By matching these design logics with source documents, we are able to generate reasoning questions with controllable question types and difficulty levels. Using this pipeline, we synthesized two large-scale reasoning datasets that span 75 disciplines: DLR-Book (3.04 million questions from the book corpus) and DLR-Web (1.66 million questions from the web corpus). Data analysis indicates that the questions synthesized by our method exhibit greater difficulty and diversity compared to those in the baseline datasets. We validate our synthesized data through supervised fine-tuning (SFT) on the Qwen3 and Llama3 model families. Our data substantially enhances their multidisciplinary reasoning capabilities, outperforming existing datasets. Notably, by applying SFT on the base versions of these models using only our data, we even surpass their official final models that have undergone the full post-training process.
— via World Pulse Now AI Editorial System
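The abstract describes a three-step pipeline: reverse-engineer design logics from existing questions, match them to raw documents, and prompt an LLM to synthesize a question under the chosen logic. The following is a minimal, hypothetical sketch of that flow; the `DesignLogic` fields, the keyword-style matching, and the prompt format are all assumptions for illustration (the paper's actual representation and matching method are not specified in the abstract), and a stub stands in for the LLM call.

```python
from dataclasses import dataclass

@dataclass
class DesignLogic:
    discipline: str       # e.g. "physics" (illustrative field set)
    question_type: str    # e.g. "multiple-choice", "open-ended"
    difficulty: int       # assumed 1 (easy) .. 5 (hard) scale
    template: str         # abstracted question-creation principle

def match_logics(logics: list[DesignLogic],
                 discipline: str, min_difficulty: int) -> list[DesignLogic]:
    """Select design logics compatible with a source document.

    The paper matches reverse-engineered design logics to documents;
    this stand-in simply filters by discipline and a difficulty floor,
    which is how the abstract's "controllable question types and
    difficulty levels" could be exposed.
    """
    return [dl for dl in logics
            if dl.discipline == discipline and dl.difficulty >= min_difficulty]

def synthesize_question(doc: str, logic: DesignLogic, llm) -> str:
    """Instruct an LLM to mimic an educator's question-creation process."""
    prompt = (
        f"Design logic: {logic.template}\n"
        f"Question type: {logic.question_type}, difficulty: {logic.difficulty}/5\n"
        f"Source document:\n{doc}\n"
        "Write one challenging reasoning question grounded in the document."
    )
    return llm(prompt)

# Demo with a stub LLM (a real pipeline would call an actual model).
logics = [
    DesignLogic("physics", "open-ended", 4,
                "Combine two stated laws to predict an outcome"),
    DesignLogic("history", "multiple-choice", 2,
                "Ask which event directly caused another"),
]
doc = "A chapter on Newtonian mechanics covering inertia and F = ma."
chosen = match_logics(logics, discipline="physics", min_difficulty=3)
stub_llm = lambda prompt: f"[question generated under: {chosen[0].template}]"
question = synthesize_question(doc, chosen[0], stub_llm)
print(question)  # [question generated under: Combine two stated laws to predict an outcome]
```

Running the matcher before generation is what makes question type and difficulty controllable: the generator only ever sees logics that already satisfy the requested constraints.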


Continue Reading
OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic
Positive · Artificial Intelligence
OpenREAD is a newly proposed framework that enhances end-to-end autonomous driving by integrating a vision-language model with reinforced open-ended reasoning, addressing limitations in traditional supervised fine-tuning and reinforcement fine-tuning methods. This innovation aims to improve decision-making and planning in complex driving scenarios.
FineGRAIN: Evaluating Failure Modes of Text-to-Image Models with Vision Language Model Judges
Neutral · Artificial Intelligence
FineGRAIN has introduced a structured methodology to evaluate failure modes in text-to-image (T2I) models using vision language models (VLMs) as judges. This approach aims to identify specific errors in image generation, such as inaccuracies in object count and color, by testing 27 failure modes across five T2I models, including Flux and various versions of SD3.
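The FineGRAIN blurb describes querying a VLM judge about specific failure modes such as object count and color. A minimal sketch of that judging loop, assuming a simple yes/no question per failure mode; the mode names, question phrasing, and judge interface are illustrative, not FineGRAIN's actual protocol, and a stub replaces the real VLM.

```python
# Two illustrative failure modes; FineGRAIN tests 27.
FAILURE_MODES = {
    "object_count": "Does the image contain exactly the number of objects requested?",
    "object_color": "Do the objects have the colors requested in the prompt?",
}

def judge_image(image_path: str, prompt: str, vlm) -> dict[str, bool]:
    """Ask a VLM judge one yes/no question per failure mode.

    Returns a verdict per mode: True means the image passed that check.
    """
    verdicts = {}
    for mode, check in FAILURE_MODES.items():
        answer = vlm(image_path, f"Prompt: {prompt}\n{check} Answer yes or no.")
        verdicts[mode] = answer.strip().lower().startswith("yes")
    return verdicts

# Stub VLM that flags only a color error, for demonstration.
stub_vlm = lambda img, q: "no" if "colors" in q else "yes"
report = judge_image("sample.png", "three red apples", stub_vlm)
failed = [mode for mode, ok in report.items() if not ok]
print(failed)  # ['object_color']
```

Structuring the evaluation as one binary question per mode is what lets the framework attribute an error to a specific failure mode rather than a generic quality score.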
InEx: Hallucination Mitigation via Introspection and Cross-Modal Multi-Agent Collaboration
Positive · Artificial Intelligence
The introduction of InEx presents a novel approach to mitigating hallucinations in large language models (LLMs) by employing a training-free, multi-agent framework that incorporates introspective reasoning and cross-modal collaboration. This method aims to enhance the reliability of multimodal LLMs (MLLMs) by autonomously refining responses through iterative verification processes.
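The InEx blurb describes training-free, iterative refinement: draft a response, introspect on it, and revise until verification passes. A generic sketch of such a draft-verify-revise loop, under the assumption that generation and verification are callable agents; InEx's actual agents and cross-modal checks are not specified in the blurb.

```python
def refine_with_introspection(question, generate, verify, max_rounds=3):
    """Training-free iterative refinement: draft, introspect, revise.

    `generate` and `verify` stand in for (multimodal) LLM agents.
    `verify` returns (passed, critique); the critique is fed back
    into the next generation round.
    """
    answer = generate(question, feedback=None)
    for _ in range(max_rounds):
        passed, critique = verify(question, answer)
        if passed:
            break
        answer = generate(question, feedback=critique)
    return answer

# Demo with stub agents: the verifier rejects answers lacking evidence.
def stub_generate(question, feedback):
    return "claim with evidence" if feedback else "unsupported claim"

def stub_verify(question, answer):
    return ("evidence" in answer, "cite supporting evidence")

final = refine_with_introspection("What is shown in the image?",
                                  stub_generate, stub_verify)
print(final)  # claim with evidence
```

Bounding the loop with `max_rounds` matters in practice: a verifier that never passes would otherwise make the refinement loop run forever.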
Think Before You Prune: Self-Reflective Structured Pruning for Reasoning Language Models
Positive · Artificial Intelligence
Recent research highlights the challenges of pruning reasoning language models (RLMs) like OpenAI's o1 and DeepSeek-R1, which are crucial for multi-step reasoning tasks. The study reveals that traditional pruning methods can severely impair the accuracy and coherence of these models, even at moderate levels of sparsity.
Deep Research: A Systematic Survey
Positive · Artificial Intelligence
A systematic survey on Deep Research (DR) has been published, highlighting the evolution of large language models (LLMs) from mere text generators to sophisticated problem solvers. This survey outlines a three-stage roadmap for integrating LLMs with external tools, enabling them to tackle complex tasks that require critical thinking and multi-source verification.
promptolution: A Unified, Modular Framework for Prompt Optimization
Positive · Artificial Intelligence
A new framework named promptolution has been introduced to optimize prompts for large language models (LLMs), addressing the challenges of existing isolated implementations. This unified, modular open-source system integrates various prompt optimizers, facilitating easier adoption for both researchers and practitioners.
Do Large Language Models Think Like the Brain? Sentence-Level Evidences from Layer-Wise Embeddings and fMRI
Positive · Artificial Intelligence
A recent study investigates the alignment between large language models (LLMs) and human brain processes, focusing on how layer-wise representations in LLMs correspond to neural responses during sentence comprehension. By analyzing data from 14 LLMs and fMRI scans of participants listening to a narrative, researchers identified significant correlations between model layers and brain activity.
Between Help and Harm: An Evaluation of Mental Health Crisis Handling by LLMs
Neutral · Artificial Intelligence
Large language model-powered chatbots have significantly changed how individuals access information, particularly in critical areas like mental health. However, their effectiveness in safely managing crises such as suicidal thoughts and self-harm remains uncertain, owing to the absence of standardized crisis classifications and clinical evaluation methods. This study introduces a taxonomy of crisis categories, a dataset of mental health inputs, and a clinical response assessment protocol to improve crisis handling by LLMs.