Ai2's new Olmo 3.1 extends reinforcement learning training for stronger reasoning benchmarks

VentureBeat — AI•Friday, December 12, 2025 at 5:00:00 AM

Positive • Artificial Intelligence

The Allen Institute for AI (Ai2) has released Olmo 3.1, an update to its Olmo model family that extends reinforcement learning training to improve performance on reasoning benchmarks. The update includes two optimized versions, Olmo 3.1 Think 32B for advanced research and Olmo 3.1 Instruct 32B for instruction-following tasks, alongside a programming-focused model, Olmo 3-Base.
The release reflects Ai2's emphasis on efficiency, transparency, and control, qualities that matter for enterprise applications. The extended reinforcement learning schedule is intended to strengthen the models' performance on complex reasoning tasks.
Olmo 3.1 fits a growing trend toward models that prioritize customization and transparency, as seen in competing models such as Qwen and Llama. It also reflects a broader industry push to improve reasoning and coding capabilities in order to meet the demands of a widening range of applications.

— via World Pulse Now AI Editorial System

Continue Reading

Intelligently Weighting Multiple Reference Models for Direct Preference Optimization of LLMs

arXiv — stat.ML • 2 days ago

Neutral • Artificial Intelligence

A recent study introduces new weighting strategies for Multiple-Reference Preference Optimization (MRPO) in fine-tuning large language models (LLMs). These strategies aim to improve the alignment of LLMs with human preferences by leveraging a mixture of reference models, addressing the limitations of current ad-hoc methods that lead to unreliable performance.

via arXiv — stat.ML
