MoDES: Accelerating Mixture-of-Experts Multimodal Large Language Models via Dynamic Expert Skipping

arXiv — cs.CV · Thursday, November 20, 2025 at 5:00:00 AM
  • MoDES has been proposed as a solution to the inefficiencies faced by Mixture-of-Experts (MoE) multimodal large language models (MLLMs), accelerating them by dynamically skipping experts.
  • This development is significant because it addresses the computational overhead associated with MoE-based MLLMs, potentially leading to faster and more efficient applications across various AI domains (a rough sketch of the skipping idea appears after this summary).
  • The introduction of MoDES aligns with ongoing efforts in the AI community to optimize large language models, reflecting a broader trend towards improving model efficiency and adaptability in diverse applications.
— via World Pulse Now AI Editorial System
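The summary above names the technique (dynamic expert skipping) but not the paper's exact criterion. The sketch below is a minimal illustration of the general idea, assuming a standard top-k-routed MoE layer: experts whose routing weight falls below a threshold are simply not executed, and the remaining weights are renormalized. All names (`moe_layer_with_skipping`, `skip_threshold`) are illustrative and not taken from the paper.

```python
import torch
import torch.nn.functional as F

def moe_layer_with_skipping(x, router, experts, top_k=2, skip_threshold=0.2):
    """Illustrative MoE forward pass that skips low-weight experts.

    x:       (batch, hidden) token representations
    router:  nn.Linear mapping hidden -> num_experts logits
    experts: list of expert feed-forward modules
    Experts whose routing weight falls below `skip_threshold` are not executed.
    """
    gate_probs = F.softmax(router(x), dim=-1)               # (batch, num_experts)
    topk_probs, topk_idx = gate_probs.topk(top_k, dim=-1)   # routed experts per token

    out = torch.zeros_like(x)
    for b in range(x.size(0)):
        probs, idx = topk_probs[b], topk_idx[b]
        keep = probs >= skip_threshold                       # dynamic skipping decision
        if keep.any():
            probs, idx = probs[keep], idx[keep]
            probs = probs / probs.sum()                      # renormalize kept weights
            for w, e in zip(probs, idx):
                out[b] += w * experts[int(e)](x[b])
        # if every routed expert is skipped, the token contributes nothing here
        # and is carried by the residual connection outside this function
    return out
```

Skipped experts cost no FLOPs at all for that token, which is where the claimed acceleration would come from; the paper's actual skipping rule may be more sophisticated than a fixed threshold.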


Recommended Readings
MOON: Generative MLLM-based Multimodal Representation Learning for E-commerce Product Understanding
Positive · Artificial Intelligence
The paper introduces MOON, a generative Multimodal Large Language Model (MLLM) aimed at enhancing product representation learning in e-commerce. It addresses challenges such as the lack of multimodal modeling modules and background noise in product images. The proposed model seeks to improve the alignment between multiple images and texts associated with products, marking a significant advancement in e-commerce product understanding.
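The summary mentions improving alignment between a product's multiple images and its texts but does not give MOON's objective. As a hedged illustration only, the snippet below shows a generic symmetric contrastive (InfoNCE-style) alignment loss between pooled per-product image embeddings and text embeddings; it is a common way to train such alignment, not necessarily the paper's method.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(image_embs, text_embs, temperature=0.07):
    """Generic symmetric InfoNCE loss aligning product images with product text.

    image_embs: (batch, dim) pooled embedding per product (e.g. mean over its images)
    text_embs:  (batch, dim) embedding of the product's textual description
    Matching (image, text) pairs share the same row index.
    """
    image_embs = F.normalize(image_embs, dim=-1)
    text_embs = F.normalize(text_embs, dim=-1)
    logits = image_embs @ text_embs.t() / temperature      # pairwise similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)            # image -> text direction
    loss_t2i = F.cross_entropy(logits.t(), targets)        # text -> image direction
    return 0.5 * (loss_i2t + loss_t2i)
```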
Verb Mirage: Unveiling and Assessing Verb Concept Hallucinations in Multimodal Large Language Models
Positive · Artificial Intelligence
Multimodal Large Language Models (MLLMs) have shown remarkable capabilities in tasks such as OCR and VQA, but hallucination remains a significant challenge. This paper is the first to explore verb hallucination in MLLMs, revealing that many state-of-the-art models exhibit severe issues with verb concepts. The study evaluates existing methods aimed at reducing hallucinations related to object concepts and assesses their effectiveness on verb hallucinations.
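The paper's actual metric is not described in this summary. A simple, assumed way to quantify verb hallucination is to compare the (lemmatized) verbs a model mentions in its answer against the verbs grounded in the annotation; the helper below is hypothetical and purely illustrative.

```python
def verb_hallucination_rate(predicted_verbs, ground_truth_verbs):
    """Fraction of verbs produced by the model that are not grounded in the annotation.

    predicted_verbs:    set of lemmatized verbs extracted from the model's answer
    ground_truth_verbs: set of verbs describing actions actually present in the image/video
    """
    if not predicted_verbs:
        return 0.0
    hallucinated = predicted_verbs - ground_truth_verbs
    return len(hallucinated) / len(predicted_verbs)

# Example: the model claims someone is "throwing" when the clip only shows "holding".
print(verb_hallucination_rate({"hold", "throw"}, {"hold", "stand"}))  # 0.5
```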
ANTS: Adaptive Negative Textual Space Shaping for OOD Detection via Test-Time MLLM Understanding and Reasoning
Positive · Artificial Intelligence
The paper presents ANTS, an innovative method for enhancing Out-of-Distribution (OOD) detection by utilizing Adaptive Negative Textual Space. By leveraging multimodal large language models (MLLMs), the approach generates expressive negative sentences that accurately characterize OOD distributions. This method addresses the limitations of existing techniques, particularly in near-OOD detection, by caching images likely to be OOD samples and prompting MLLMs for detailed descriptions.
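To make the "negative textual space" idea concrete, the sketch below shows one plausible way to use MLLM-generated negative sentences at test time, assuming a CLIP-style shared embedding space: a test image is scored against both in-distribution class prompts and the negative sentences, and the probability mass landing on the negative texts is treated as the OOD score. The function and variable names are assumptions, not the paper's API.

```python
import numpy as np

def ood_score(image_emb, id_text_embs, negative_text_embs, temperature=0.01):
    """Illustrative OOD score using a negative textual space.

    image_emb:          (dim,) embedding of the test image
    id_text_embs:       (num_id_classes, dim) embeddings of in-distribution class prompts
    negative_text_embs: (num_neg, dim) embeddings of MLLM-generated negative sentences
    Returns the softmax mass assigned to the negative texts: higher => more likely OOD.
    """
    def normalize(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    image_emb = normalize(image_emb)
    sims = np.concatenate([
        normalize(id_text_embs) @ image_emb,
        normalize(negative_text_embs) @ image_emb,
    ]) / temperature
    probs = np.exp(sims - sims.max())
    probs /= probs.sum()
    return probs[len(id_text_embs):].sum()   # mass on negative (OOD) texts
```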
A Comprehensive Study on Visual Token Redundancy for Discrete Diffusion-based Multimodal Large Language Models
Positive · Artificial Intelligence
A comprehensive study on visual token redundancy in discrete diffusion-based multimodal large language models (dMLLMs) has been conducted, revealing significant computational overhead during inference due to full-sequence attention. The research highlights that visual redundancy primarily occurs in from-scratch dMLLMs when addressing long-answer tasks and examines the impact of visual token pruning on model efficiency and responses.
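Visual token pruning, the intervention the study examines, is commonly implemented by keeping only the visual tokens that receive the most attention from the text or query tokens. The snippet below is a generic sketch of that idea, not the study's specific procedure; the `keep_ratio` parameter is an assumption.

```python
import torch

def prune_visual_tokens(visual_tokens, attention_to_visual, keep_ratio=0.5):
    """Keep only the most-attended visual tokens.

    visual_tokens:       (num_visual, dim) visual token embeddings
    attention_to_visual: (num_queries, num_visual) attention weights from text/query tokens
    keep_ratio:          fraction of visual tokens to retain
    """
    importance = attention_to_visual.mean(dim=0)             # average attention per visual token
    k = max(1, int(keep_ratio * visual_tokens.size(0)))
    keep_idx = importance.topk(k).indices.sort().values      # preserve original ordering
    return visual_tokens[keep_idx], keep_idx
```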
OmniSparse: Training-Aware Fine-Grained Sparse Attention for Long-Video MLLMs
Positive · Artificial Intelligence
OmniSparse introduces a training-aware fine-grained sparse attention framework aimed at enhancing the performance of long-video multimodal large language models (MLLMs). Unlike existing methods that focus on inference-time acceleration, OmniSparse operates effectively during both training and inference by dynamically allocating token budgets. This approach includes mechanisms for query selection and key-value selection, addressing the limitations of traditional sparse attention methods.
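The core operation behind fine-grained sparse attention with a token budget can be sketched as per-query top-k key-value selection: each query attends only to the `budget` keys that score highest against it. This is a generic illustration under that assumption, not OmniSparse's actual selection mechanism, which also involves query selection and training-time considerations.

```python
import torch

def sparse_attention_topk(q, k, v, budget=64):
    """Illustrative fine-grained sparse attention: each query attends only to
    its `budget` highest-scoring keys.

    q: (num_q, dim), k: (num_kv, dim), v: (num_kv, dim)
    """
    scores = q @ k.t() / (q.size(-1) ** 0.5)                  # (num_q, num_kv)
    budget = min(budget, k.size(0))
    top_scores, top_idx = scores.topk(budget, dim=-1)         # per-query key-value selection
    weights = torch.softmax(top_scores, dim=-1)               # (num_q, budget)
    selected_v = v[top_idx]                                    # (num_q, budget, dim)
    return (weights.unsqueeze(-1) * selected_v).sum(dim=1)    # (num_q, dim)
```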
MoE-SpeQ: Speculative Quantized Decoding with Proactive Expert Prefetching and Offloading for Mixture-of-Experts
Positive · Artificial Intelligence
The paper introduces MoE-SpeQ, a novel inference system designed to address the memory limitations of Mixture-of-Experts (MoE) models during inference. Traditional methods often lead to I/O bottlenecks due to data-dependent expert selection. MoE-SpeQ mitigates this by utilizing a small on-device draft model to predict future expert requirements, allowing for proactive prefetching from host memory. This approach enhances performance by reducing the critical path of execution and improving overall efficiency in MoE applications.
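The decode loop below sketches the prefetching idea in pseudocode-like Python: a small draft model guesses which experts the next step will route to, so their weights can be copied from host memory while the current step is still computing. Every interface shown here (`draft_model.predict_next_experts`, `expert_cache.prefetch`, `expert_cache.ensure_loaded`, `model.decode_step`) is a hypothetical placeholder, not MoE-SpeQ's actual API.

```python
def decode_with_expert_prefetch(model, draft_model, expert_cache, tokens, steps):
    """Illustrative decode loop with proactive expert prefetching.

    Assumed (hypothetical) interfaces:
      draft_model.predict_next_experts(tokens) -> list of expert ids for the next step
      expert_cache.prefetch(ids)               -> starts async host->device copies
      expert_cache.ensure_loaded(ids)          -> blocks until the weights are resident
      model.decode_step(...)                   -> (next_token, experts actually used)
    """
    for _ in range(steps):
        predicted_experts = draft_model.predict_next_experts(tokens)  # cheap guess
        expert_cache.prefetch(predicted_experts)                      # overlaps with compute

        next_token, used_experts = model.decode_step(
            tokens, ensure_experts=expert_cache.ensure_loaded
        )
        tokens.append(next_token)

        # Mispredicted experts are fetched on demand inside decode_step;
        # correct predictions keep the expert I/O off the critical path.
    return tokens
```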
AdaTok: Adaptive Token Compression with Object-Aware Representations for Efficient Multimodal LLMs
Positive · Artificial Intelligence
AdaTok introduces an object-level token merging strategy for adaptive token compression, aimed at improving the efficiency of Multimodal Large Language Models (MLLMs). Traditional patch-level tokenization incurs excessive computational and memory costs and aligns poorly with the object-centric way humans parse scenes. The proposed method reduces token usage to roughly 10% while retaining nearly 96% of the original model's performance, addressing critical challenges in multimodal understanding and reasoning.
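A minimal sketch of object-level token merging, assuming each image patch has already been assigned an object/segment id (for example by an off-the-shelf segmenter), is shown below: all patch tokens belonging to the same object are averaged into a single token. This is the general idea only; AdaTok's merging may use a more elaborate aggregation.

```python
import torch

def merge_tokens_by_object(patch_tokens, object_ids):
    """Merge patch-level tokens into one token per object by averaging.

    patch_tokens: (num_patches, dim) patch embeddings from the vision encoder
    object_ids:   (num_patches,) integer object/segment id per patch
                  (background regions get their own id)
    Returns (num_objects, dim) merged object-level tokens.
    """
    merged = []
    for obj in object_ids.unique():
        mask = object_ids == obj
        merged.append(patch_tokens[mask].mean(dim=0))   # one token per object
    return torch.stack(merged)
```

Collapsing hundreds of patch tokens into a few dozen object tokens is how a reduction to around 10% of the original token count becomes plausible.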
YOLO Meets Mixture-of-Experts: Adaptive Expert Routing for Robust Object Detection
Positive · Artificial Intelligence
The paper introduces a new Mixture-of-Experts framework for object detection, which utilizes adaptive routing among multiple YOLOv9-T experts. This approach allows for dynamic feature specialization, resulting in improved performance metrics, specifically higher mean Average Precision (mAP) and Average Recall (AR) compared to using a single YOLOv9-T model. The findings suggest significant advancements in the field of object detection.
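The sketch below shows one generic way to realize adaptive routing over multiple detector experts: a small gating network produces per-expert weights from a pooled image descriptor, and the experts' feature maps are blended before a shared detection head. The paper routes among full YOLOv9-T experts and may fuse differently; the module and its names here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DetectionMoE(nn.Module):
    """Illustrative Mixture-of-Experts wrapper around several detector backbones."""

    def __init__(self, experts, feat_dim, head):
        super().__init__()
        self.experts = nn.ModuleList(experts)            # e.g. several YOLO-style backbones
        self.gate = nn.Linear(feat_dim, len(experts))    # adaptive routing weights
        self.head = head                                 # shared detection head

    def forward(self, images):
        feats = [e(images) for e in self.experts]        # each (B, C, H, W), C == feat_dim
        pooled = feats[0].mean(dim=(2, 3))               # cheap global descriptor for gating
        weights = torch.softmax(self.gate(pooled), dim=-1)   # (B, num_experts)
        fused = sum(
            w.view(-1, 1, 1, 1) * f                      # weight each expert's feature map
            for w, f in zip(weights.unbind(dim=1), feats)
        )
        return self.head(fused)
```

Soft gating like this lets different images lean on different experts, which is the intuition behind the reported mAP and AR gains over a single YOLOv9-T model.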