New training method boosts AI multimodal reasoning with smaller, smarter datasets

VentureBeat — AI · Tuesday, December 2, 2025 at 12:30:00 PM
  • Researchers at MiroMind AI and several Chinese universities have introduced OpenMMReasoner, a training framework designed to strengthen the multimodal reasoning capabilities of language models. The framework refines a base model in two stages, supervised fine-tuning followed by reinforcement learning, to improve reasoning on tasks that combine text and visual inputs (a minimal sketch of the recipe appears after this entry).
  • OpenMMReasoner is notable because models trained with the framework outperform leading visual reasoning models while using smaller, higher-quality datasets. Beyond the capability gains, the release provides an open-source foundation for building applications that require traceability.
— via World Pulse Now AI Editorial System
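
The two-stage recipe described above lends itself to a compact illustration. The Python sketch below mirrors the reported pipeline, supervised fine-tuning on a small curated dataset followed by reinforcement learning with a verifiable reward; every function and name in it is a hypothetical placeholder, not the authors' actual code.

```python
# Hypothetical sketch of a two-stage SFT -> RL training pipeline.
# All functions here are placeholder stubs, not OpenMMReasoner's code.

from dataclasses import dataclass


@dataclass
class Checkpoint:
    """Stand-in for a model checkpoint."""
    name: str


def load_base_model(name: str) -> Checkpoint:
    # Placeholder: load a pretrained vision-language model.
    return Checkpoint(name)


def supervised_fine_tune(model: Checkpoint, dataset: list) -> Checkpoint:
    # Stage 1: fit the model to curated (image, question, reasoning
    # trace, answer) examples with a standard cross-entropy objective.
    return Checkpoint(model.name + "+sft")


def reinforce(model: Checkpoint, prompts: list, reward_fn) -> Checkpoint:
    # Stage 2: sample answers from the SFT checkpoint, score them with a
    # verifiable reward, and update the policy toward higher-reward outputs.
    return Checkpoint(model.name + "+rl")


def exact_match_reward(prediction: str, reference: str) -> float:
    # A simple verifiable reward: 1 if the final answer matches exactly.
    return 1.0 if prediction.strip() == reference.strip() else 0.0


base = load_base_model("base-vlm")
sft_model = supervised_fine_tune(base, dataset=[])   # small, high-quality set
final_model = reinforce(sft_model, prompts=[], reward_fn=exact_match_reward)
print(final_model.name)  # -> base-vlm+sft+rl
```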

Continue Reading
OpenAI report reveals a 6x productivity gap between AI power users and everyone else
Neutral · Artificial Intelligence
A recent report from OpenAI highlights a significant productivity gap, revealing that AI power users send six times more messages to ChatGPT than the median employee in their companies. This disparity is even more pronounced in specific roles, such as coding and data analysis, where top users engage 17 times more than their peers.
Chinese AI startup Z.ai releases the GLM-4.6V open-weight vision models, with support for native function calling, available with 106B and 9B parameters (Carl Franzen/VentureBeat)
Positive · Artificial Intelligence
Chinese AI startup Z.ai has launched the GLM-4.6V series, featuring open-weight vision models with 106 billion and 9 billion parameters, designed to support native function calling. This release aims to enhance multimodal reasoning capabilities in AI applications, marking a significant step in the company's product offerings.
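
Native function calling generally means the model emits structured tool invocations rather than free text. The sketch below shows a tool declaration in the widely used OpenAI-compatible format; the model identifier, tool name, and schema are illustrative assumptions, not Z.ai's documented GLM-4.6V API.

```python
# Build an OpenAI-compatible chat request that declares one callable tool.
# The tool name, schema, and model identifier are hypothetical examples.

import json

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_image_caption",  # hypothetical tool name
            "description": "Return a one-sentence caption for an image URL.",
            "parameters": {
                "type": "object",
                "properties": {
                    "image_url": {"type": "string"},
                },
                "required": ["image_url"],
            },
        },
    }
]

request_body = {
    "model": "glm-4.6v",  # assumed identifier; check Z.ai's docs
    "messages": [{"role": "user", "content": "Describe this photo."}],
    "tools": tools,
}
print(json.dumps(request_body, indent=2))
```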
CoT4Det: A Chain-of-Thought Framework for Perception-Oriented Vision-Language Tasks
Positive · Artificial Intelligence
The introduction of CoT4Det, a Chain-of-Thought framework, aims to enhance the performance of Large Vision-Language Models (LVLMs) on perception-oriented tasks such as object detection and semantic segmentation, which have previously lagged behind task-specific models. This framework reformulates these tasks into three interpretable steps: classification, counting, and grounding.
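
The three-step decomposition can be pictured as a staged prompting loop. In the hypothetical sketch below, ask_vlm stands in for a real LVLM call; the prompt wording and helper are illustrative, not the CoT4Det implementation.

```python
# Hypothetical staged-prompting loop: classification -> counting -> grounding.
# ask_vlm is a placeholder for a real vision-language model query.

def ask_vlm(image: str, prompt: str) -> str:
    # Placeholder: a real version would send the image and prompt to an LVLM.
    return ""


def detect_with_cot(image: str) -> dict:
    # Step 1: classification -- which object categories are present?
    categories = ask_vlm(image, "List the object categories in this image.")
    # Step 2: counting -- how many instances of each category?
    counts = ask_vlm(
        image, f"For each category in [{categories}], count the instances."
    )
    # Step 3: grounding -- localize each counted instance with a box.
    boxes = ask_vlm(
        image,
        f"Given the counts [{counts}], output one bounding box per instance "
        "as [x1, y1, x2, y2].",
    )
    return {"categories": categories, "counts": counts, "boxes": boxes}


result = detect_with_cot("street_scene.jpg")  # hypothetical input image
```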