Unveiling Super Experts in Mixture-of-Experts Large Language Models

arXiv — cs.CL · Thursday, November 13, 2025 at 5:00:00 AM
The discovery of Super Experts (SEs) in Mixture-of-Experts (MoE) large language models (LLMs) marks a significant step toward understanding how these models work internally. SEs, characterized by rare but extreme activation outliers, play a pivotal role in the forward inference of MoE LLMs. The study reports that pruning just three Super Experts out of the 6,144 experts in the Qwen3-30B-A3B model yields repetitive and uninformative outputs, underscoring their critical function. The distribution of SEs is model-specific and data-agnostic, and it remains unaffected by post-training, indicating their inherent significance. The analysis further shows that compressing SEs disrupts the systematic outlier mechanism and causes a collapse of attention sinks, which profoundly degrades the model's overall performance, especially on tasks requiring mathematical reasoning. This research not only sheds light on the intricate workings of LLMs but also emphasizes the need to preserve these Super Experts for optimal model fu…
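
For readers who want a concrete handle on the two signals the summary appeals to, the sketch below is a minimal NumPy illustration with synthetic data and made-up thresholds, not the paper's code: it flags experts whose peak activations are extreme relative to the layer median (the rare-but-extreme outlier pattern) and measures how much attention mass lands on the first token (the attention-sink behaviour said to collapse when SEs are compressed).

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Rare, extreme activation outliers -------------------------------------
# Score each expert by its peak activation magnitude relative to the layer
# median; a handful of experts with enormous ratios are the SE candidates.
num_experts = 6144        # total expert count cited for Qwen3-30B-A3B
tokens_per_expert = 256   # hypothetical sample of routed tokens per expert

acts = rng.normal(0.0, 1.0, size=(num_experts, tokens_per_expert))
acts[[17, 2048, 5000], 0] = [350.0, -420.0, 500.0]   # planted "super" outliers

peak = np.abs(acts).max(axis=1)                      # per-expert peak magnitude
ratio = peak / np.median(peak)
super_expert_ids = np.flatnonzero(ratio > 50.0)      # hypothetical threshold
print("candidate Super Experts:", super_expert_ids.tolist())

# --- Attention-sink mass ----------------------------------------------------
# Attention sinks show up as most attention mass landing on the first token;
# tracking this fraction before/after compressing SEs would expose a collapse.
seq_len = 64
logits = rng.normal(0.0, 1.0, size=(seq_len, seq_len))
logits[:, 0] += 6.0                                   # strong sink on token 0
causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))
logits = np.where(causal, logits, -np.inf)            # causal mask
attn = np.exp(logits - logits.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)
print(f"mean attention mass on first token: {attn[:, 0].mean():.2f}")
```

In a real setting the activation matrix would come from hooks on each expert's down-projection and the attention rows from the model's attention weights; the 50x-median cut-off and the planted outlier values are purely illustrative.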
— via World Pulse Now AI Editorial System
