Unveiling Super Experts in Mixture-of-Experts Large Language Models
The discovery of Super Experts (SEs) in Mixture-of-Experts (MoE) large language models (LLMs) marks a significant step toward understanding how these models perform inference. SEs, characterized by rare but extreme activation outliers, play a pivotal role in the forward pass of MoE LLMs. The study reveals that pruning just three SEs out of the 6,144 experts in the Qwen3-30B-A3B model produces repetitive and uninformative outputs, underscoring their critical function. The distribution of SEs is model-specific and data-agnostic, and it remains unaffected by post-training, indicating their inherent significance. The analysis further shows that compressing SEs disrupts the systematic outlier mechanism and causes attention sinks to collapse, which severely degrades overall performance, especially on tasks requiring mathematical reasoning. This research not only sheds light on the intricate workings of LLMs but also emphasizes the need to preserve these Super Experts for optimal model functioning.
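
To make the probe-and-ablate idea concrete, here is a minimal PyTorch sketch of how one might look for expert-level activation outliers and then mask the flagged experts out of routing. It is an illustration only, not the paper's actual protocol: the toy `ToyMoE` layer, the `find_super_expert_candidates` helper, and the "peak |activation| far above the median peak" criterion are all assumptions made for this example, and a real study would probe a pretrained MoE model rather than a randomly initialized one.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    """Minimal top-2 MoE block, a stand-in for one MoE layer of a real model."""
    def __init__(self, d_model=64, n_experts=16, d_ff=128, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k
        self.pruned = set()  # expert indices to ablate from routing

    def forward(self, x):  # x: (tokens, d_model)
        logits = self.router(x)
        if self.pruned:  # "pruning" an expert here means masking it out of the router
            logits[:, list(self.pruned)] = float("-inf")
        weights, idx = torch.topk(F.softmax(logits, dim=-1), self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            chosen = (idx == e)                      # (tokens, top_k) bool
            rows = chosen.any(dim=-1)
            if rows.any():
                w = (weights * chosen).sum(dim=-1, keepdim=True)[rows]
                out[rows] += w * expert(x[rows])
        return out

def find_super_expert_candidates(moe, x, ratio=3.0):
    """Flag experts whose peak |activation| far exceeds the median peak across
    experts -- a hypothetical outlier criterion used only for illustration."""
    peaks = torch.zeros(len(moe.experts))
    hooks = []
    def make_hook(e):
        def hook(_module, _inputs, output):
            peaks[e] = max(peaks[e].item(), output.abs().max().item())
        return hook
    for e, expert in enumerate(moe.experts):
        hooks.append(expert.register_forward_hook(make_hook(e)))
    with torch.no_grad():
        moe(x)                                       # one probing forward pass
    for h in hooks:
        h.remove()
    median_peak = peaks.median()
    return [e for e, p in enumerate(peaks) if p > ratio * median_peak], peaks

moe = ToyMoE()
hidden = torch.randn(256, 64)                        # stand-in for token hidden states
candidates, peaks = find_super_expert_candidates(moe, hidden)
print("candidate super experts:", candidates)
moe.pruned.update(candidates)                        # ablate them, then re-run evals
```

The same hook-based pattern could, under similar assumptions, be pointed at attention weights to track how much probability mass lands on the first tokens, which is one way to watch for the attention-sink collapse described above after the flagged experts are ablated or compressed.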
— via World Pulse Now AI Editorial System
