New training method boosts AI multimodal reasoning with smaller, smarter datasets

VentureBeat — AI · Tuesday, December 2, 2025 at 12:30:00 PM
  • Researchers at MiroMind AI and several Chinese universities have introduced OpenMMReasoner, a training framework designed to strengthen the multimodal reasoning capabilities of language models. The framework refines a base model in two stages, supervised fine-tuning followed by reinforcement learning, to improve reasoning on tasks that combine text and visual inputs (a minimal sketch of the recipe appears after this entry).
  • OpenMMReasoner is notable because models trained with the framework outperform leading visual reasoning models while using smaller, higher-quality datasets. Beyond the capability gains, the release provides an open-source foundation for building applications that require traceability.
— via World Pulse Now AI Editorial System
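
The two-stage recipe described above lends itself to a compact illustration. The Python sketch below mirrors the reported pipeline, supervised fine-tuning on a small curated dataset followed by reinforcement learning with a verifiable reward; every function and name in it is a hypothetical placeholder, not the authors' actual code.

```python
# Hypothetical sketch of a two-stage SFT -> RL training pipeline.
# All functions here are placeholder stubs, not OpenMMReasoner's code.

from dataclasses import dataclass


@dataclass
class Checkpoint:
    """Stand-in for a model checkpoint."""
    name: str


def load_base_model(name: str) -> Checkpoint:
    # Placeholder: load a pretrained vision-language model.
    return Checkpoint(name)


def supervised_fine_tune(model: Checkpoint, dataset: list) -> Checkpoint:
    # Stage 1: fit the model to curated (image, question, reasoning
    # trace, answer) examples with a standard cross-entropy objective.
    return Checkpoint(model.name + "+sft")


def reinforce(model: Checkpoint, prompts: list, reward_fn) -> Checkpoint:
    # Stage 2: sample answers from the SFT checkpoint, score them with a
    # verifiable reward, and update the policy toward higher-reward outputs.
    return Checkpoint(model.name + "+rl")


def exact_match_reward(prediction: str, reference: str) -> float:
    # A simple verifiable reward: 1 if the final answer matches exactly.
    return 1.0 if prediction.strip() == reference.strip() else 0.0


base = load_base_model("base-vlm")
sft_model = supervised_fine_tune(base, dataset=[])   # small, high-quality set
final_model = reinforce(sft_model, prompts=[], reward_fn=exact_match_reward)
print(final_model.name)  # -> base-vlm+sft+rl
```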

Continue Reading
OpenAI report reveals a 6x productivity gap between AI power users and everyone else
Neutral · Artificial Intelligence
A recent report from OpenAI highlights a significant productivity gap, revealing that AI power users send six times more messages to ChatGPT than the median employee in their companies. This disparity is even more pronounced in specific roles, such as coding and data analysis, where top users engage 17 times more than their peers.
Chinese AI startup Z.ai releases the GLM-4.6V open-weight vision models, with support for native function calling, available with 106B and 9B parameters (Carl Franzen/VentureBeat)
Positive · Artificial Intelligence
Chinese AI startup Z.ai has launched the GLM-4.6V series, featuring open-weight vision models with 106 billion and 9 billion parameters, designed to support native function calling. This release aims to enhance multimodal reasoning capabilities in AI applications, marking a significant step in the company's product offerings.
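
Native function calling generally means the model emits structured tool invocations rather than free text. The sketch below shows a tool declaration in the widely used OpenAI-compatible format; the model identifier, tool name, and schema are illustrative assumptions, not Z.ai's documented GLM-4.6V API.

```python
# Build an OpenAI-compatible chat request that declares one callable tool.
# The tool name, schema, and model identifier are hypothetical examples.

import json

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_image_caption",  # hypothetical tool name
            "description": "Return a one-sentence caption for an image URL.",
            "parameters": {
                "type": "object",
                "properties": {
                    "image_url": {"type": "string"},
                },
                "required": ["image_url"],
            },
        },
    }
]

request_body = {
    "model": "glm-4.6v",  # assumed identifier; check Z.ai's docs
    "messages": [{"role": "user", "content": "Describe this photo."}],
    "tools": tools,
}
print(json.dumps(request_body, indent=2))
```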
CoT4Det: A Chain-of-Thought Framework for Perception-Oriented Vision-Language Tasks
Positive · Artificial Intelligence
The introduction of CoT4Det, a Chain-of-Thought framework, aims to enhance the performance of Large Vision-Language Models (LVLMs) on perception-oriented tasks such as object detection and semantic segmentation, which have previously lagged behind task-specific models. This framework reformulates these tasks into three interpretable steps: classification, counting, and grounding.
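
The three-step decomposition can be pictured as a staged prompting loop. In the hypothetical sketch below, ask_vlm stands in for a real LVLM call; the prompt wording and helper are illustrative, not the CoT4Det implementation.

```python
# Hypothetical staged-prompting loop: classification -> counting -> grounding.
# ask_vlm is a placeholder for a real vision-language model query.

def ask_vlm(image: str, prompt: str) -> str:
    # Placeholder: a real version would send the image and prompt to an LVLM.
    return ""


def detect_with_cot(image: str) -> dict:
    # Step 1: classification -- which object categories are present?
    categories = ask_vlm(image, "List the object categories in this image.")
    # Step 2: counting -- how many instances of each category?
    counts = ask_vlm(
        image, f"For each category in [{categories}], count the instances."
    )
    # Step 3: grounding -- localize each counted instance with a box.
    boxes = ask_vlm(
        image,
        f"Given the counts [{counts}], output one bounding box per instance "
        "as [x1, y1, x2, y2].",
    )
    return {"categories": categories, "counts": counts, "boxes": boxes}


result = detect_with_cot("street_scene.jpg")  # hypothetical input image
```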