DESIGNER: Design-Logic-Guided Multidisciplinary Data Synthesis for LLM Reasoning

arXiv — cs.CL · Tuesday, December 2, 2025 at 5:00:00 AM
arXiv:2508.12726v4 · Announce Type: replace

Abstract: Large language models (LLMs) have achieved remarkable success in many natural language tasks but still struggle with complex, multi-step reasoning, particularly across diverse disciplines. Existing reasoning datasets often lack disciplinary breadth, reasoning depth, and diversity, as well as guiding principles for question synthesis. We propose DESIGNER: a DESIGN-logic-guidEd Reasoning data synthesis pipeline that leverages naturally available, extensive raw documents (e.g., book corpus and web corpus) to generate multidisciplinary challenging questions. We introduce the concept of "design logic" and instruct LLMs to mimic human educators' question-creation process, enabling the automated synthesis of large-scale, high-difficulty questions. We use LLMs to reverse-engineer and abstract over 120,000 design logics from existing questions across various disciplines. By matching these design logics with source documents, we are able to generate reasoning questions with controllable question types and difficulty levels. Using this pipeline, we synthesized two large-scale reasoning datasets that span 75 disciplines: DLR-Book (3.04 million questions from the book corpus) and DLR-Web (1.66 million questions from the web corpus). Data analysis indicates that the questions synthesized by our method exhibit greater difficulty and diversity compared to those in the baseline datasets. We validate our synthesized data through supervised fine-tuning (SFT) on the Qwen3 and Llama3 model families. Our data substantially enhances their multidisciplinary reasoning capabilities, outperforming existing datasets. Notably, by applying SFT on the base versions of these models using only our data, we even surpass their official final models that have undergone the full post-training process.
— via World Pulse Now AI Editorial System
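The abstract describes a three-step pipeline: reverse-engineer design logics from existing questions, match them to raw documents, and prompt an LLM to synthesize a question under the chosen logic. The following is a minimal, hypothetical sketch of that flow; the `DesignLogic` fields, the keyword-style matching, and the prompt format are all assumptions for illustration (the paper's actual representation and matching method are not specified in the abstract), and a stub stands in for the LLM call.

```python
from dataclasses import dataclass

@dataclass
class DesignLogic:
    discipline: str       # e.g. "physics" (illustrative field set)
    question_type: str    # e.g. "multiple-choice", "open-ended"
    difficulty: int       # assumed 1 (easy) .. 5 (hard) scale
    template: str         # abstracted question-creation principle

def match_logics(logics: list[DesignLogic],
                 discipline: str, min_difficulty: int) -> list[DesignLogic]:
    """Select design logics compatible with a source document.

    The paper matches reverse-engineered design logics to documents;
    this stand-in simply filters by discipline and a difficulty floor,
    which is how the abstract's "controllable question types and
    difficulty levels" could be exposed.
    """
    return [dl for dl in logics
            if dl.discipline == discipline and dl.difficulty >= min_difficulty]

def synthesize_question(doc: str, logic: DesignLogic, llm) -> str:
    """Instruct an LLM to mimic an educator's question-creation process."""
    prompt = (
        f"Design logic: {logic.template}\n"
        f"Question type: {logic.question_type}, difficulty: {logic.difficulty}/5\n"
        f"Source document:\n{doc}\n"
        "Write one challenging reasoning question grounded in the document."
    )
    return llm(prompt)

# Demo with a stub LLM (a real pipeline would call an actual model).
logics = [
    DesignLogic("physics", "open-ended", 4,
                "Combine two stated laws to predict an outcome"),
    DesignLogic("history", "multiple-choice", 2,
                "Ask which event directly caused another"),
]
doc = "A chapter on Newtonian mechanics covering inertia and F = ma."
chosen = match_logics(logics, discipline="physics", min_difficulty=3)
stub_llm = lambda prompt: f"[question generated under: {chosen[0].template}]"
question = synthesize_question(doc, chosen[0], stub_llm)
print(question)  # [question generated under: Combine two stated laws to predict an outcome]
```

Running the matcher before generation is what makes question type and difficulty controllable: the generator only ever sees logics that already satisfy the requested constraints.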


Continue Reading
OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic
Positive · Artificial Intelligence
OpenREAD is a newly proposed framework that enhances end-to-end autonomous driving by integrating a vision-language model with reinforced open-ended reasoning, addressing limitations in traditional supervised fine-tuning and reinforcement fine-tuning methods. This innovation aims to improve decision-making and planning in complex driving scenarios.
FineGRAIN: Evaluating Failure Modes of Text-to-Image Models with Vision Language Model Judges
Neutral · Artificial Intelligence
FineGRAIN has introduced a structured methodology to evaluate failure modes in text-to-image (T2I) models using vision language models (VLMs) as judges. This approach aims to identify specific errors in image generation, such as inaccuracies in object count and color, by testing 27 failure modes across five T2I models, including Flux and various versions of SD3.
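The FineGRAIN blurb describes querying a VLM judge about specific failure modes such as object count and color. A minimal sketch of that judging loop, assuming a simple yes/no question per failure mode; the mode names, question phrasing, and judge interface are illustrative, not FineGRAIN's actual protocol, and a stub replaces the real VLM.

```python
# Two illustrative failure modes; FineGRAIN tests 27.
FAILURE_MODES = {
    "object_count": "Does the image contain exactly the number of objects requested?",
    "object_color": "Do the objects have the colors requested in the prompt?",
}

def judge_image(image_path: str, prompt: str, vlm) -> dict[str, bool]:
    """Ask a VLM judge one yes/no question per failure mode.

    Returns a verdict per mode: True means the image passed that check.
    """
    verdicts = {}
    for mode, check in FAILURE_MODES.items():
        answer = vlm(image_path, f"Prompt: {prompt}\n{check} Answer yes or no.")
        verdicts[mode] = answer.strip().lower().startswith("yes")
    return verdicts

# Stub VLM that flags only a color error, for demonstration.
stub_vlm = lambda img, q: "no" if "colors" in q else "yes"
report = judge_image("sample.png", "three red apples", stub_vlm)
failed = [mode for mode, ok in report.items() if not ok]
print(failed)  # ['object_color']
```

Structuring the evaluation as one binary question per mode is what lets the framework attribute an error to a specific failure mode rather than a generic quality score.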
InEx: Hallucination Mitigation via Introspection and Cross-Modal Multi-Agent Collaboration
Positive · Artificial Intelligence
The introduction of InEx presents a novel approach to mitigating hallucinations in large language models (LLMs) by employing a training-free, multi-agent framework that incorporates introspective reasoning and cross-modal collaboration. This method aims to enhance the reliability of multimodal LLMs (MLLMs) by autonomously refining responses through iterative verification processes.
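The InEx blurb describes training-free, iterative refinement: draft a response, introspect on it, and revise until verification passes. A generic sketch of such a draft-verify-revise loop, under the assumption that generation and verification are callable agents; InEx's actual agents and cross-modal checks are not specified in the blurb.

```python
def refine_with_introspection(question, generate, verify, max_rounds=3):
    """Training-free iterative refinement: draft, introspect, revise.

    `generate` and `verify` stand in for (multimodal) LLM agents.
    `verify` returns (passed, critique); the critique is fed back
    into the next generation round.
    """
    answer = generate(question, feedback=None)
    for _ in range(max_rounds):
        passed, critique = verify(question, answer)
        if passed:
            break
        answer = generate(question, feedback=critique)
    return answer

# Demo with stub agents: the verifier rejects answers lacking evidence.
def stub_generate(question, feedback):
    return "claim with evidence" if feedback else "unsupported claim"

def stub_verify(question, answer):
    return ("evidence" in answer, "cite supporting evidence")

final = refine_with_introspection("What is shown in the image?",
                                  stub_generate, stub_verify)
print(final)  # claim with evidence
```

Bounding the loop with `max_rounds` matters in practice: a verifier that never passes would otherwise make the refinement loop run forever.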
Think Before You Prune: Self-Reflective Structured Pruning for Reasoning Language Models
Positive · Artificial Intelligence
Recent research highlights the challenges of pruning reasoning language models (RLMs) like OpenAI's o1 and DeepSeek-R1, which are crucial for multi-step reasoning tasks. The study reveals that traditional pruning methods can severely impair the accuracy and coherence of these models, even at moderate levels of sparsity.
Deep Research: A Systematic Survey
Positive · Artificial Intelligence
A systematic survey on Deep Research (DR) has been published, highlighting the evolution of large language models (LLMs) from mere text generators to sophisticated problem solvers. This survey outlines a three-stage roadmap for integrating LLMs with external tools, enabling them to tackle complex tasks that require critical thinking and multi-source verification.
promptolution: A Unified, Modular Framework for Prompt Optimization
Positive · Artificial Intelligence
A new framework named promptolution has been introduced to optimize prompts for large language models (LLMs), addressing the challenges of existing isolated implementations. This unified, modular open-source system integrates various prompt optimizers, facilitating easier adoption for both researchers and practitioners.
Do Large Language Models Think Like the Brain? Sentence-Level Evidences from Layer-Wise Embeddings and fMRI
Positive · Artificial Intelligence
A recent study investigates the alignment between large language models (LLMs) and human brain processes, focusing on how layer-wise representations in LLMs correspond to neural responses during sentence comprehension. By analyzing data from 14 LLMs and fMRI scans of participants listening to a narrative, researchers identified significant correlations between model layers and brain activity.
Between Help and Harm: An Evaluation of Mental Health Crisis Handling by LLMs
Neutral · Artificial Intelligence
Large language model-powered chatbots have significantly changed how individuals access information, particularly in critical areas like mental health. However, their effectiveness in safely managing crises such as suicidal thoughts and self-harm remains uncertain, owing to the absence of standardized crisis classifications and clinical evaluation methods. This study introduces a taxonomy of crisis categories, a dataset of mental health inputs, and a clinical response assessment protocol to improve crisis handling by LLMs.