<p>The Allen Institute for AI (Ai2) recently released what it calls its most powerful <a href="https://venturebeat.com/ai/ai2s-olmo-3-family-challenges-qwen-and-llama-with-efficient-open-reasoning"><u>family of models yet, Olmo 3</u></a>. Ai2 has since kept iterating on the models, extending its reinforcement learning (RL) runs to create Olmo 3.1.</p><p>The new Olmo 3.1 models focus on efficiency, transparency, and control for enterprises.</p><p>Ai2 updated two of the three versions of Olmo 3: Olmo 3.1 Think 32B, the flagship model optimized for advanced research, and Olmo 3.1 Instruct 32B, designed for instruction-following, multi-turn dialogue, and tool use.</p><p>Olmo 3 has a third version, Olmo 3-Base, for programming, comprehension, and math. It also works well for continued fine-tuning.</p><p>Ai2 said that to upgrade Olmo 3 Think 32B to Olmo 3.1, its researchers extended its best RL run with a longer training schedule.</p><p>“After the original Olmo 3 launch, we resumed our RL training run for Olmo 3 32B Think, training for an additional 21 days on 224 GPUs with extra epochs over our Dolci-Think-RL dataset,” Ai2 said in a <a href="https://allenai.org/blog/olmo3"><u>blog post</u></a>. “This yielded Olmo 3.1 32B Think, which brings substantial gains across math, reasoning, and instruction-following benchmarks: improvements of 5+ points on AIME, 4+ points on ZebraLogic, 4+ points on IFEval, and 20+ points on IFBench, alongside stronger performance on coding and complex multi-step tasks.”</p><div></div><p>To get to Olmo 3.1 Instruct, Ai2 said its researchers applied the recipe behind the smaller Instruct size, 7B, to the larger model.</p><p>Olmo 3.1 Instruct 32B is “optimized for chat, tool use, &amp; multi-turn dialogue—making it a much more performant sibling of Olmo 3 Instruct 7B and ready for real-world applications,” Ai2 said in a <a href="https://x.com/allen_ai/status/1999528338365247539"><u>post on X</u></a>.
</p><p>For now, the new checkpoints are available on the Ai2 Playground and on Hugging Face, with API access coming soon.</p><h2>Better performance on benchmarks</h2><p>The Olmo 3.1 models performed well on benchmark tests and, as expected, beat the Olmo 3 models.</p><p>Olmo 3.1 Think outperformed Qwen 3 32B models on the AIME 2025 benchmark and performed close to Gemma 27B.</p><p>Olmo 3.1 Instruct performed strongly against its open-source peers, even beating models like Gemma 3 on the Math benchmark.</p><p>“As for Olmo 3.1 32B Instruct, it’s a larger-scale instruction-tuned model built for chat, tool use, and multi-turn dialogue. Olmo 3.1 32B Instruct is our most capable fully open chat model to date and — in our evaluations — the strongest fully open 32B-scale instruct model,” Ai2 said.</p><div></div><p>Ai2 also upgraded its RL-Zero 7B models for math and coding. Ai2 said on X that both models benefited from longer and more stable training runs.</p><h2>Commitment to transparency and open source</h2><p>Ai2 previously told VentureBeat that it designed the Olmo 3 family of models to give enterprises and research labs more control over, and a better understanding of, the data and training that went into the model.</p><p>Organizations can add to the model’s data mix and retrain it so that it also learns from the added data.</p><p>This has long been a commitment for Ai2, which also offers a <a href="https://venturebeat.com/ai/whats-inside-the-llm-ai2-olmotrace-will-trace-the-source"><u>tool called OlmoTrace</u></a> that tracks how an LLM’s outputs match its training data.</p><div></div><p>“Together, Olmo 3.1 Think 32B and Olmo 3.1 Instruct 32B show that openness and performance can advance together. By extending the same model flow, we continue to improve capabilities while retaining end-to-end transparency over data, code, and training decisions,” Ai2 said.</p>

The Allen Institute for AI (Ai2) has launched Olmo 3.1, an advanced iteration of its Olmo model family, which enhances reinforcement learning training to improve reasoning benchmarks. This update includes two optimized versions, Olmo 3.1 Think 32B for advanced research and Olmo 3.1 Instruct 32B for instruction-following tasks, alongside a programming-focused model, Olmo 3-Base.

Ai2's new Olmo 3.1 extends reinforcement learning training for stronger reasoning benchmarks
