MOON: Generative MLLM-based Multimodal Representation Learning for E-commerce Product Understanding

arXiv — cs.LG, Wednesday, November 19, 2025 at 5:00:00 AM
  • MOON has been introduced as a generative MLLM-based approach to multimodal representation learning for e-commerce product understanding
  • This development is significant for e-commerce, where product understanding depends on jointly modeling textual and visual signals
  • The advancement of MOON reflects a broader trend in AI toward generative models, which are increasingly recognized for their potential to overcome traditional modeling challenges, particularly in multimodal contexts.
— via World Pulse Now AI Editorial System


Recommended Readings
MoETTA: Test-Time Adaptation Under Mixed Distribution Shifts with MoE-LayerNorm
Positive · Artificial Intelligence
MoETTA is a novel test-time adaptation (TTA) framework designed to address performance drops during mixed distribution shifts in machine learning. Traditional TTA methods struggle with diverse domain factors that can conflict, leading to suboptimal results. MoETTA leverages an entropy-based approach and the Mixture-of-Experts (MoE) architecture to allow for varied gradient directions across domains, enhancing adaptability during inference. This framework aims to improve performance in real-world applications where data distribution is often heterogeneous.
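
As a rough sketch of the underlying idea (not the authors' implementation), the PyTorch snippet below pairs TENT-style entropy minimization with a mixture of LayerNorm experts, so samples from different domains can adapt along different gradient directions. The class and parameter names (`MoELayerNorm`, `num_experts`) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayerNorm(nn.Module):
    """Illustrative MoE-style LayerNorm: several LayerNorm 'experts'
    mixed by an input-dependent gate, so samples from different domains
    can pull the normalization parameters in different directions."""
    def __init__(self, dim: int, num_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(nn.LayerNorm(dim) for _ in range(num_experts))
        self.gate = nn.Linear(dim, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim); gate on the mean token representation
        w = F.softmax(self.gate(x.mean(dim=1)), dim=-1)          # (B, E)
        outs = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, T, D)
        return (w[:, :, None, None] * outs).sum(dim=1)

def tta_step(model: nn.Module, x: torch.Tensor, optimizer) -> float:
    """One TENT-style adaptation step: minimize prediction entropy on an
    unlabeled test batch; the optimizer should hold only norm/gate params."""
    probs = model(x).softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()
    return entropy.item()
```

In practice only the normalization and gating parameters would be passed to the optimizer, keeping test-time adaptation lightweight.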
MoE-SpeQ: Speculative Quantized Decoding with Proactive Expert Prefetching and Offloading for Mixture-of-Experts
Positive · Artificial Intelligence
The paper introduces MoE-SpeQ, a novel inference system designed to address the memory limitations of Mixture-of-Experts (MoE) models during inference. Traditional methods often lead to I/O bottlenecks due to data-dependent expert selection. MoE-SpeQ mitigates this by utilizing a small on-device draft model to predict future expert requirements, allowing for proactive prefetching from host memory. This approach enhances performance by reducing the critical path of execution and improving overall efficiency in MoE applications.
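
The scheduling idea can be sketched as follows; the draft predictor, cache layout, and names (`predict_next_experts`, `ExpertCache`) are our assumptions, not the paper's API. The point is that expert weights are copied host-to-device on a side CUDA stream before the MoE layer needs them, taking the transfer off the critical path.

```python
import torch

def predict_next_experts(draft_model, hidden: torch.Tensor, top_k: int = 2):
    """Hypothetical draft predictor: scores the experts that the *next*
    MoE layer is likely to activate, a few steps ahead of the main model."""
    scores = draft_model(hidden)                       # (B, num_experts)
    return scores.topk(top_k, dim=-1).indices.unique().tolist()

class ExpertCache:
    """Toy host-to-device cache: expert weights live in pinned CPU memory
    and are copied to the GPU on a side stream, off the critical path."""
    def __init__(self, cpu_experts: dict, device: str = "cuda"):
        self.cpu_experts = cpu_experts                 # {id: state_dict on CPU}
        self.device = device
        self.on_device: dict = {}
        self.copy_stream = torch.cuda.Stream()

    def prefetch(self, expert_ids):
        # Issue async copies that overlap with ongoing decode compute.
        with torch.cuda.stream(self.copy_stream):
            for eid in expert_ids:
                if eid not in self.on_device:
                    self.on_device[eid] = {
                        k: v.to(self.device, non_blocking=True)
                        for k, v in self.cpu_experts[eid].items()
                    }

    def get(self, expert_id):
        # Block only if the prefetch has not finished yet.
        torch.cuda.current_stream().wait_stream(self.copy_stream)
        return self.on_device[expert_id]
```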
MOON Embedding: Multimodal Representation Learning for E-commerce Search Advertising
Positive · Artificial Intelligence
MOON is a comprehensive set of sustainable iterative practices for multimodal representation learning, specifically designed for e-commerce applications. Fully deployed across Taobao's search advertising system, MOON has significantly improved click-through rate (CTR) predictions by 20% through its three-stage training paradigm of Pretraining, Post-training, and Application. Over three years, this project has undergone five iterations, providing valuable insights for the research community.
YOLO Meets Mixture-of-Experts: Adaptive Expert Routing for Robust Object Detection
Positive · Artificial Intelligence
The paper introduces a new Mixture-of-Experts framework for object detection, which utilizes adaptive routing among multiple YOLOv9-T experts. This approach allows for dynamic feature specialization, resulting in improved performance metrics, specifically higher mean Average Precision (mAP) and Average Recall (AR) compared to using a single YOLOv9-T model. The findings suggest significant advancements in the field of object detection.
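
A minimal sketch of adaptive routing over detector experts follows, assuming each expert returns a flat prediction tensor; real detector outputs are structured, so fusion would in practice happen at the feature or box level. Here `make_expert` stands in for a YOLOv9-T constructor.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DetectionMoE(nn.Module):
    """Illustrative adaptive routing over detector experts: a lightweight
    image-level gate weights each expert's output. Experts are assumed to
    return a flat prediction tensor of shape (batch, out_dim)."""
    def __init__(self, make_expert, num_experts: int = 3, in_ch: int = 3):
        super().__init__()
        self.experts = nn.ModuleList(make_expert() for _ in range(num_experts))
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(in_ch, num_experts),
        )

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        w = F.softmax(self.gate(images), dim=-1)                     # (B, E)
        outs = torch.stack([e(images) for e in self.experts], dim=1)
        return (w[:, :, None] * outs).sum(dim=1)                     # weighted fusion
```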
FAPE-IR: Frequency-Aware Planning and Execution Framework for All-in-One Image Restoration
Positive · Artificial Intelligence
FAPE-IR introduces a Frequency-Aware Planning and Execution framework for All-in-One Image Restoration (AIO-IR), designed to address multiple image degradations in complex conditions. Unlike existing methods that depend on task-specific designs, FAPE-IR utilizes a frozen Multimodal Large Language Model (MLLM) to analyze degraded images and create frequency-aware restoration plans. These plans guide a LoRA-based Mixture-of-Experts (LoRA-MoE) module, which dynamically selects experts based on the frequency features of the input image, enhancing restoration quality through adversarial training an…
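
A toy sketch of the frequency-conditioned routing idea, assuming the planner's output can be reduced to low/high-frequency energy features; the `LoRAMoE` layer and its gating below are illustrative stand-ins, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def frequency_features(img: torch.Tensor) -> torch.Tensor:
    """Crude frequency descriptor: low- vs. high-frequency energy of the
    FFT magnitude, standing in for the planner's frequency analysis."""
    spec = torch.fft.fft2(img).abs()                   # (B, C, H, W)
    h, w = spec.shape[-2:]
    low = spec[..., : h // 4, : w // 4].mean(dim=(-3, -2, -1))
    high = spec.mean(dim=(-3, -2, -1)) - low
    return torch.stack([low, high], dim=-1)            # (B, 2)

class LoRAMoE(nn.Module):
    """Toy LoRA-MoE layer: a frozen base linear plus low-rank expert
    updates, gated by the frequency features of the degraded input."""
    def __init__(self, dim: int, num_experts: int = 4, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(dim, dim)
        self.base.requires_grad_(False)                # frozen backbone weight
        self.down = nn.ModuleList(nn.Linear(dim, rank, bias=False) for _ in range(num_experts))
        self.up = nn.ModuleList(nn.Linear(rank, dim, bias=False) for _ in range(num_experts))
        self.gate = nn.Linear(2, num_experts)          # fed frequency features

    def forward(self, x: torch.Tensor, freq_feats: torch.Tensor) -> torch.Tensor:
        w = F.softmax(self.gate(freq_feats), dim=-1)   # (B, E)
        delta = torch.stack([u(d(x)) for d, u in zip(self.down, self.up)], dim=1)
        return self.base(x) + (w[:, :, None] * delta).sum(dim=1)
```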
FedALT: Federated Fine-Tuning through Adaptive Local Training with Rest-of-World LoRA
Positive · Artificial Intelligence
The article presents FedALT, a new algorithm for federated fine-tuning of large language models (LLMs) that addresses the challenges of cross-client interference and data heterogeneity. Traditional methods, primarily based on FedAvg, often lead to suboptimal personalization due to model aggregation issues. FedALT allows each client to continue training its individual LoRA while integrating knowledge from a separate Rest-of-World (RoW) LoRA component. This approach includes an adaptive mixer to balance local adaptation with global information effectively.
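
The local-plus-RoW composition can be sketched as below, assuming per-layer LoRA adapters and a sigmoid mixer; module and variable names are ours, not the paper's.

```python
import torch
import torch.nn as nn

class LoRA(nn.Module):
    """Minimal low-rank adapter: x -> up(down(x)), initialized as a no-op."""
    def __init__(self, dim: int, rank: int = 8):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.up.weight)

    def forward(self, x):
        return self.up(self.down(x))

class FedALTLayer(nn.Module):
    """Illustrative FedALT-style layer: a frozen base weight, a locally
    trained LoRA, and a frozen Rest-of-World (RoW) LoRA aggregated from
    other clients, blended per input by a small learned mixer."""
    def __init__(self, dim: int, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(dim, dim)
        self.base.requires_grad_(False)
        self.local = LoRA(dim, rank)                   # trained on this client
        self.row = LoRA(dim, rank)
        self.row.requires_grad_(False)                 # received, not trained
        self.mixer = nn.Linear(dim, 1)                 # adaptive balance

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        alpha = torch.sigmoid(self.mixer(x))           # (B, 1) local weight
        return self.base(x) + alpha * self.local(x) + (1 - alpha) * self.row(x)
```

The mixer lets each input decide how much to trust local adaptation versus the aggregated global component, which is the balance the summary describes.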
Grounded Visual Factualization: Factual Anchor-Based Finetuning for Enhancing MLLM Factual Consistency
Positive · Artificial Intelligence
The paper titled 'Grounded Visual Factualization: Factual Anchor-Based Finetuning for Enhancing MLLM Factual Consistency' addresses the issue of visual hallucination in Multimodal Large Language Models (MLLMs), where these models generate details that are inconsistent with the accompanying images. Current fine-tuning methods have shown limited success in improving factual reasoning. The authors propose a new approach called Grounded Visual Factualization (GVF) Finetuning, which enhances visual factual consistency through three mechanisms: Factual Anchor Data Augmentation, Fact-Aware Instructio…
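
Since the summary is cut off, the following is only a speculative sketch of what "Factual Anchor Data Augmentation" might look like: facts that are verifiably grounded in the image (e.g., object detector outputs) are attached to instruction samples so fine-tuning penalizes answers that contradict them. The data format and all names here are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class InstructionSample:
    image_path: str
    question: str
    answer: str
    anchors: list = field(default_factory=list)        # verifiable visual facts

def add_factual_anchors(sample: InstructionSample, detections: list) -> InstructionSample:
    """Hypothetical anchor augmentation: grounded facts derived from the
    image (here, detector outputs with 'label' and 'box' keys) are
    prepended to the instruction so answers must stay consistent with them."""
    facts = [f"The image contains a {d['label']} at {d['box']}." for d in detections]
    sample.anchors = facts
    sample.question = " ".join(facts) + " " + sample.question
    return sample
```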