Learning to Reason: Training LLMs with GPT-OSS or DeepSeek R1 Reasoning Traces

arXiv — cs.CL · Tuesday, November 25, 2025 at 5:00:00 AM
  • Recent advances in large language models (LLMs) have introduced test-time scaling techniques that enhance reasoning, as demonstrated by models like DeepSeek-R1 and OpenAI's gpt-oss. These models generate intermediate reasoning traces that improve accuracy on complex problems, and those traces can be used to post-train smaller models effectively without extensive human annotation (a minimal distillation sketch follows the summary).
  • The ability to generate high-quality reasoning traces is significant for companies like OpenAI and DeepSeek, as it enables them to refine their models more efficiently and cost-effectively. This development enhances the competitive edge of these organizations in the rapidly evolving AI landscape.
  • The ongoing evolution of reasoning in LLMs highlights broader challenges in AI, such as the need for reliable fact-checking and the management of hallucinations in generated content. As these models become more integrated into various applications, addressing these issues will be crucial for their reliability and acceptance in critical fields like politics and healthcare.
— via World Pulse Now AI Editorial System
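
As a concrete illustration of the trace-distillation recipe described in the summary, here is a minimal sketch of fine-tuning a small open model on teacher-generated reasoning traces. The dataset file name and field names, the student model, and the `<think>` formatting are assumptions for illustration, not details from the paper.

```python
# Minimal sketch: supervised fine-tuning of a small "student" model on reasoning
# traces distilled from a stronger teacher (e.g. DeepSeek-R1 or gpt-oss).
# The dataset file and its (question, trace, answer) fields are hypothetical.
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B"  # any small causal LM works as the student
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def format_example(ex):
    # Train the student to reproduce the teacher's trace before the answer.
    return (f"Question: {ex['question']}\n"
            f"<think>{ex['trace']}</think>\n"
            f"Answer: {ex['answer']}{tokenizer.eos_token}")

with open("trace_dataset.jsonl") as f:  # hypothetical file of teacher traces
    texts = [format_example(json.loads(line)) for line in f]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for text in texts:  # toy loop: one example at a time, no batching or epochs
    batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The key design choice is that the trace itself is part of the training target, so the student learns to emit intermediate reasoning at inference time rather than just final answers.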


Continue Reading
Want to ditch ChatGPT? Gemini 3 shows early signs of winning the AI race
Positive · Artificial Intelligence
Google has launched its new AI model, Gemini 3, which shows early signs of outperforming competitors like ChatGPT in benchmark tests, a notable advance in AI technology. The rollout is expected to improve user interactions, with the model understanding requests better and providing more relevant responses.
OpenAI Locks Down Office After Violent Threat
Negative · Artificial Intelligence
OpenAI has temporarily locked down its San Francisco offices following a violent threat made by an activist, who allegedly expressed intentions to harm employees. This decision was communicated internally through OpenAI's Slack platform, highlighting the seriousness of the threat.
OpenAI Ordered to Drop 'Cameo' From Sora App Following Trademark Dispute
Negative · Artificial Intelligence
OpenAI has been ordered to cease using the term 'Cameo' in its Sora app following a temporary restraining order issued by a Northern California judge due to a trademark dispute with the video app Cameo. This ruling could significantly impact the functionality of Sora, which is designed for creating AI-generated celebrity videos.
What to know about Claude Opus 4.5
Positive · Artificial Intelligence
Anthropic has launched Claude Opus 4.5, an advanced AI model that emphasizes coding efficiency, cost-effectiveness, and user-controlled reasoning, marking a significant step in AI development. This model is positioned as a direct competitor to offerings from OpenAI and Google, showcasing enhanced capabilities in various tasks.
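
The "user-controlled reasoning" mentioned above maps naturally onto Anthropic's extended-thinking interface, where the caller caps how many tokens the model may spend reasoning before it answers. A minimal sketch, assuming the Messages API's `thinking` parameter applies to Opus 4.5 as it does to earlier Claude models; the model id is taken from the article:

```python
# Sketch: user-controlled reasoning via Anthropic's extended-thinking budget.
# Assumes the Messages API "thinking" parameter works for Opus 4.5 as it does
# for earlier Claude models.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-opus-4-5",  # assumed model id
    max_tokens=2048,
    thinking={"type": "enabled", "budget_tokens": 1024},  # cap reasoning spend
    messages=[{"role": "user", "content": "Refactor this loop into a comprehension: ..."}],
)
# The reasoning and the final answer arrive as separate content blocks.
for block in response.content:
    print(block.type)
```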
Be My Eyes: Extending Large Language Models to New Modalities Through Multi-Agent Collaboration
Positive · Artificial Intelligence
A new framework called BeMyEyes has been proposed to enhance the capabilities of Large Language Models (LLMs) by integrating them with Vision-Language Models (VLMs) through a multi-agent collaboration approach. This modular system aims to improve multimodal reasoning by allowing efficient VLMs to act as perceivers while powerful LLMs serve as reasoners, facilitating better interaction and understanding of complex data.
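
A schematic sketch of the perceiver/reasoner division of labor described above; this is not the BeMyEyes implementation, and `call_vlm` / `call_llm` are hypothetical stand-ins for any vision-language and text-only model endpoints:

```python
# Illustrative perceiver/reasoner loop: a VLM turns pixels into text, a
# text-only LLM reasons over that text and may request another look.
def call_vlm(image_path: str, instruction: str) -> str:
    """Hypothetical VLM call: returns a textual description of the image."""
    raise NotImplementedError("plug in your VLM client here")

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call: returns the reasoner's reply."""
    raise NotImplementedError("plug in your LLM client here")

def answer_visual_question(image_path: str, question: str, rounds: int = 2) -> str:
    # Perceiver: an efficient VLM converts the image into text the LLM can use.
    perception = call_vlm(image_path, f"Describe everything relevant to: {question}")
    for _ in range(rounds):
        # Reasoner: a strong text-only LLM answers or asks for more detail.
        reply = call_llm(
            f"Observation: {perception}\nQuestion: {question}\n"
            "Answer if you can; otherwise reply FOLLOW-UP: <what to look for>."
        )
        if not reply.startswith("FOLLOW-UP:"):
            return reply
        # Route the follow-up request back to the perceiver for another look.
        perception += "\n" + call_vlm(image_path, reply[len("FOLLOW-UP:"):].strip())
    return call_llm(f"Observation: {perception}\nQuestion: {question}\nGive your best answer.")
```

The modularity is the point: the expensive reasoner never ingests pixels, so any stronger LLM can be swapped in without retraining the perceiver.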
Context-Aware Whisper for Arabic ASR Under Linguistic Varieties
Positive · Artificial Intelligence
A new approach to Arabic Automatic Speech Recognition (ASR) has been introduced, leveraging context-aware prompting strategies to adapt OpenAI's Whisper model. This method addresses the challenges posed by Arabic's dialectal variations and limited labeled data, achieving significant reductions in word error rates for both Modern Standard Arabic and dialectal speech.
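
The standard hook for this kind of adaptation in the open-source openai-whisper package is the `initial_prompt` argument, which seeds the decoder with context text before transcription begins. A minimal sketch; the dialect hint and file name are illustrative, and the paper's exact prompting strategy may differ:

```python
# Sketch of context-aware prompting with openai-whisper's `initial_prompt`.
# The hint text and audio file name are invented for illustration.
import whisper

model = whisper.load_model("small")

# Bias decoding toward the expected variety by seeding the decoder context
# with in-dialect text (here: a colloquial Egyptian Arabic sample).
dialect_hint = "ازيك عامل ايه النهارده"

result = model.transcribe(
    "clip.wav",                   # hypothetical audio file
    language="ar",                # force Arabic rather than auto-detection
    initial_prompt=dialect_hint,  # context fed to the decoder before the audio
)
print(result["text"])
```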
Large Language Models Require Curated Context for Reliable Political Fact-Checking -- Even with Reasoning and Web Search
Positive · Artificial Intelligence
Recent evaluations of large language models (LLMs) from major tech companies, including OpenAI and Google, reveal that while these models have advanced reasoning capabilities and web search tools, they still struggle with reliable political fact-checking. A study assessed 15 LLMs against over 6,000 claims fact-checked by PolitiFact, finding that curated context significantly enhances their performance.
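
A minimal sketch of what "curated context" can look like in practice: vetted evidence is prepended to the prompt and the model is told to ground its verdict in it rather than in parametric memory or raw web results. The evidence format and verdict labels here are illustrative, not the study's protocol:

```python
# Sketch: building a fact-checking prompt from curated evidence snippets
# (e.g. PolitiFact-style summaries) instead of relying on model memory.
from dataclasses import dataclass

@dataclass
class Evidence:
    source: str
    snippet: str

def build_fact_check_prompt(claim: str, evidence: list[Evidence]) -> str:
    # Number each snippet so the model can cite what it relied on.
    context = "\n".join(f"[{i+1}] ({e.source}) {e.snippet}"
                        for i, e in enumerate(evidence))
    return (
        "Using ONLY the evidence below, rate the claim as TRUE, FALSE, or "
        "MIXED, and cite the evidence numbers you relied on.\n\n"
        f"Evidence:\n{context}\n\nClaim: {claim}\nVerdict:"
    )

prompt = build_fact_check_prompt(
    "The policy cut unemployment in half.",
    [Evidence("statistics office", "Unemployment fell from 8.1% to 6.9% over the period.")],
)
print(prompt)  # send to any LLM chat endpoint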
Beyond the Rubric: Cultural Misalignment in LLM Benchmarks for Sexual and Reproductive Health
Neutral · Artificial Intelligence
A recent benchmarking exercise evaluated a chatbot designed for sexual and reproductive health (SRH) in an underserved community in India, revealing significant cultural misalignment in how Large Language Models (LLMs) are assessed. The evaluation used HealthBench, an OpenAI benchmark, whose rubric rated responses low even though qualitative analysis by experts found many of them culturally appropriate and medically accurate.
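
For readers unfamiliar with rubric-style benchmarks, a toy sketch of HealthBench-like scoring follows. The criteria, weights, and keyword judge are invented for illustration; the real benchmark uses physician-written rubrics and a model grader, and a fixed rubric's blindness to locale-appropriate phrasing is precisely the gap the study reports:

```python
# Toy sketch of rubric-based grading: a response earns the weight of each
# criterion it satisfies, and its score is the fraction of total weight met.
from dataclasses import dataclass

@dataclass
class Criterion:
    description: str
    weight: float

def rubric_score(response: str, rubric: list[Criterion], meets) -> float:
    """`meets(response, criterion)` is normally an LLM judge; stubbed here."""
    earned = sum(c.weight for c in rubric if meets(response, c))
    total = sum(c.weight for c in rubric)
    return earned / total if total else 0.0

rubric = [  # invented criteria and weights
    Criterion("Recommends consulting a clinician for persistent symptoms", 3.0),
    Criterion("Uses terminology appropriate to the user's locale", 2.0),
]
# Naive keyword check standing in for an LLM grader.
score = rubric_score("Please see a clinician if this persists.",
                     rubric, lambda r, c: "clinician" in r.lower())
print(f"{score:.2f}")
```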