NeKo: Cross-Modality Post-Recognition Error Correction with Tasks-Guided Mixture-of-Experts Language Model

arXiv — cs.LG · Tuesday, December 2, 2025 at 5:00:00 AM
  • NeKo, a recently introduced Mixture-of-Experts (MoE) language model, aims to improve post-recognition error correction across modalities, including speech-to-text and vision-to-text. It takes a multi-task correction approach, learning from diverse recognition datasets while avoiding the parameter growth that comes with maintaining separate correction models; a minimal sketch of the task-guided routing idea follows the summary below.
  • This development is significant because NeKo reaches state-of-the-art error-correction performance, evidenced by a 5.0% reduction in word error rate and improved BLEU scores on the Open ASR Leaderboard. Such advances could improve the accuracy and reliability of automated transcription and translation systems.
  • The emergence of NeKo aligns with ongoing trends in artificial intelligence, particularly the growing adoption of Mixture-of-Experts architectures. These models are increasingly recognized for their ability to efficiently manage large-scale data and improve performance across multiple tasks, reflecting a broader shift towards more adaptable and specialized AI systems.
— via World Pulse Now AI Editorial System
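
The summary above describes NeKo's task-guided expert assignment only at a high level, so here is a minimal sketch of how a task-guided MoE feed-forward layer could look, assuming a simplified design in which each recognition task owns a dedicated expert and a small shared pool is softly gated. The class name, routing rule, and residual combination are illustrative assumptions, not NeKo's published implementation.

```python
# Minimal sketch of task-guided mixture-of-experts routing (illustrative only;
# not NeKo's actual implementation). Each recognition task gets a dedicated
# expert, and tokens are also softly mixed with a small shared-expert pool.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaskGuidedMoE(nn.Module):
    def __init__(self, d_model: int, n_tasks: int, n_shared: int = 2):
        super().__init__()

        def make_expert():
            return nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

        self.task_experts = nn.ModuleList([make_expert() for _ in range(n_tasks)])
        self.shared_experts = nn.ModuleList([make_expert() for _ in range(n_shared)])
        self.gate = nn.Linear(d_model, n_shared)  # learned gate over shared experts

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        # x: (batch, seq, d_model); task_id selects the task-dedicated expert.
        task_out = self.task_experts[task_id](x)
        weights = F.softmax(self.gate(x), dim=-1)                     # (batch, seq, n_shared)
        shared_out = torch.stack([e(x) for e in self.shared_experts], dim=-1)
        shared_out = (shared_out * weights.unsqueeze(-2)).sum(-1)     # weighted mix of shared experts
        return x + task_out + shared_out                              # residual combination

# Usage: route a speech-recognition correction batch (hypothetical task id 0).
layer = TaskGuidedMoE(d_model=512, n_tasks=3)
hidden = torch.randn(2, 16, 512)
corrected = layer(hidden, task_id=0)
```
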


Continue Reading
Understanding and Harnessing Sparsity in Unified Multimodal Models
Positive · Artificial Intelligence
A systematic analysis of unified multimodal models has been conducted, revealing significant insights into the compressibility and sensitivity of their components. The study used training-free pruning to assess depth and width adjustments, finding in particular that understanding components are more compressible than generation components, which are sensitive to compression.
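
To make the training-free pruning idea above concrete, here is a minimal sketch of magnitude-based width pruning for a transformer FFN block; the scoring rule, keep ratio, and function name are assumptions for illustration, not the study's actual procedure.

```python
# Minimal sketch of training-free width pruning for a transformer FFN block
# (illustrative; the scoring rule and keep ratio are assumptions, not the
# paper's procedure). Hidden units are ranked by weight magnitude and the
# lowest-scoring fraction is dropped without any retraining.
import torch
import torch.nn as nn

def prune_ffn_width(ffn_in: nn.Linear, ffn_out: nn.Linear, keep_ratio: float = 0.5):
    """Return new (in, out) Linear layers keeping only the top-scoring hidden units."""
    # Score each hidden unit by the L2 norms of its incoming and outgoing weights.
    scores = ffn_in.weight.norm(dim=1) * ffn_out.weight.norm(dim=0)
    n_keep = max(1, int(keep_ratio * scores.numel()))
    keep = scores.topk(n_keep).indices.sort().values

    new_in = nn.Linear(ffn_in.in_features, n_keep, bias=ffn_in.bias is not None)
    new_out = nn.Linear(n_keep, ffn_out.out_features, bias=ffn_out.bias is not None)
    with torch.no_grad():
        new_in.weight.copy_(ffn_in.weight[keep])
        new_out.weight.copy_(ffn_out.weight[:, keep])
        if ffn_in.bias is not None:
            new_in.bias.copy_(ffn_in.bias[keep])
        if ffn_out.bias is not None:
            new_out.bias.copy_(ffn_out.bias)
    return new_in, new_out

# Usage: shrink a 4x FFN to half its hidden width without retraining.
fin, fout = nn.Linear(512, 2048), nn.Linear(2048, 512)
small_in, small_out = prune_ffn_width(fin, fout, keep_ratio=0.5)
```
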
SkyMoE: A Vision-Language Foundation Model for Enhancing Geospatial Interpretation with Mixture of Experts
Positive · Artificial Intelligence
SkyMoE has been introduced as a Mixture-of-Experts (MoE) vision-language model designed to improve geospatial interpretation, particularly in remote sensing tasks. This model addresses the limitations of existing general-purpose vision-language models by employing an adaptive router that generates task-specific routing instructions, allowing for enhanced differentiation between various tasks and interpretation granularities.
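
As an illustration of a router that consumes task-specific routing instructions, the sketch below biases gate logits with a learned per-task embedding; the class name and the additive-bias formulation are assumptions, not SkyMoE's actual architecture.

```python
# Minimal sketch of an adaptive, task-conditioned MoE router (illustrative;
# the additive task-embedding bias is an assumption, not SkyMoE's design).
# A learned task embedding shifts the gate logits so that different
# geospatial tasks prefer different experts.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveTaskRouter(nn.Module):
    def __init__(self, d_model: int, n_experts: int, n_tasks: int, top_k: int = 2):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)
        self.task_bias = nn.Embedding(n_tasks, n_experts)  # per-task routing instruction
        self.top_k = top_k

    def forward(self, x: torch.Tensor, task_id: torch.Tensor):
        # x: (batch, seq, d_model); task_id: (batch,) integer task labels.
        logits = self.gate(x) + self.task_bias(task_id)[:, None, :]
        weights, experts = logits.topk(self.top_k, dim=-1)
        return F.softmax(weights, dim=-1), experts  # mixture weights, expert indices

# Usage: route tokens for a hypothetical land-cover classification task (id 1).
router = AdaptiveTaskRouter(d_model=256, n_experts=8, n_tasks=4)
w, idx = router(torch.randn(2, 10, 256), torch.tensor([1, 1]))
```
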
Mixture of Ranks with Degradation-Aware Routing for One-Step Real-World Image Super-Resolution
Positive · Artificial Intelligence
A new Mixture-of-Ranks (MoR) architecture has been proposed for one-step real-world image super-resolution (Real-ISR), integrating sparse Mixture-of-Experts (MoE) to enhance the adaptability of models in reconstructing high-resolution images from degraded samples. This approach utilizes a fine-grained expert partitioning strategy, treating each rank in Low-Rank Adaptation (LoRA) as an independent expert, thereby improving the model's ability to capture heterogeneous characteristics of real-world images.
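
A rough sketch of the "each LoRA rank as an independent expert" idea follows, assuming a simple sigmoid gate conditioned on a degradation descriptor; the module name, gate form, and descriptor dimensionality are illustrative assumptions rather than the paper's exact design.

```python
# Minimal sketch of a mixture-of-ranks LoRA layer (illustrative; the
# degradation-aware gate is an assumption, not the paper's exact design).
# Each rank-1 LoRA component acts as an expert, weighted by a gate that
# conditions on a degradation descriptor of the input image.
import torch
import torch.nn as nn

class MixtureOfRanksLinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, degrade_dim: int = 16):
        super().__init__()
        self.base = base  # pretrained projection, kept fixed
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.gate = nn.Linear(degrade_dim, rank)  # degradation-aware routing over ranks

    def forward(self, x: torch.Tensor, degrade_vec: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_features); degrade_vec: (batch, degrade_dim).
        rank_weights = torch.sigmoid(self.gate(degrade_vec))   # (batch, rank)
        low_rank = (x @ self.A.t()) * rank_weights              # gate each rank independently
        return self.base(x) + low_rank @ self.B.t()

# Usage: adapt a projection for a degraded input (hypothetical degradation descriptor).
layer = MixtureOfRanksLinear(nn.Linear(64, 64), rank=8, degrade_dim=16)
out = layer(torch.randn(4, 64), torch.randn(4, 16))
```
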
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
Positive · Artificial Intelligence
A novel formulation for reinforcement learning (RL) with large language models (LLMs) has been proposed, focusing on optimizing true sequence-level rewards through a surrogate token-level objective in policy gradient methods like REINFORCE. The study emphasizes minimizing training-inference discrepancies and policy staleness to enhance the validity of this approach.
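
The token-level surrogate for a sequence-level reward can be written down compactly; the sketch below is a generic REINFORCE-style loss with a mean baseline, using placeholder rewards and masks, and is not the paper's specific formulation.

```python
# Minimal sketch of a token-level REINFORCE surrogate for a sequence-level
# reward (illustrative; the reward and baseline here are placeholders). The
# whole-sequence reward is broadcast to every generated token's log-probability.
import torch

def reinforce_loss(token_logps: torch.Tensor, rewards: torch.Tensor,
                   mask: torch.Tensor) -> torch.Tensor:
    """token_logps: (batch, seq) log-probs of sampled tokens under the policy.
    rewards: (batch,) sequence-level rewards; mask: (batch, seq), 1 for real tokens."""
    baseline = rewards.mean()                        # simple variance-reduction baseline
    advantage = (rewards - baseline).unsqueeze(-1)   # broadcast to every token
    # Surrogate objective: sum_t log pi(y_t | y_<t, x) * advantage(sequence)
    return -(advantage * token_logps * mask).sum() / mask.sum()

# Usage with dummy rollouts: 2 sequences of length 5 over a 100-token vocabulary.
logps = torch.log_softmax(torch.randn(2, 5, 100, requires_grad=True),
                          dim=-1).max(dim=-1).values  # stands in for policy log-probs
loss = reinforce_loss(logps, rewards=torch.tensor([1.0, 0.2]), mask=torch.ones(2, 5))
loss.backward()
```
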