One Router to Route Them All: Homogeneous Expert Routing for Heterogeneous Graph Transformers

arXiv — cs.LG · Wednesday, November 12, 2025 at 5:00:00 AM
The introduction of Homogeneous Expert Routing (HER) represents a significant advancement in the field of heterogeneous graph neural networks (HGNNs). Traditional approaches often depend on type-specific experts, which can hinder knowledge transfer across different node types. HER addresses this limitation by stochastically masking type embeddings, encouraging a more flexible, type-agnostic specialization among experts. Evaluated on benchmark datasets such as IMDB, ACM, and DBLP, HER demonstrated superior performance compared to standard Heterogeneous Graph Transformers (HGT) and type-separated Mixture-of-Experts (MoE) baselines. Notably, analysis of the IMDB dataset revealed that HER experts specialized based on semantic patterns, such as movie genres, rather than rigid node types. This shift not only enhances the model's efficiency and interpretability but also establishes a new design principle for heterogeneous graph learning, emphasizing the importance of regularizing type dependence.
— via World Pulse Now AI Editorial System
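
For readers curious what "stochastically masking type embeddings" could look like in practice, here is a minimal PyTorch sketch of a shared, type-agnostic MoE router for a heterogeneous graph transformer layer. The module names, dimensions, top-k routing, and masking probability are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HomogeneousExpertRouter(nn.Module):
    """One router shared by all node types. During training, the node-type
    embedding is randomly dropped per node, so experts cannot key on type
    identity alone and must specialize on semantic features instead."""

    def __init__(self, hidden_dim, num_types, num_experts, top_k=2, type_mask_prob=0.5):
        super().__init__()
        self.type_embed = nn.Embedding(num_types, hidden_dim)
        self.router = nn.Linear(hidden_dim, num_experts)   # single router for every node type
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.GELU(),
                          nn.Linear(hidden_dim, hidden_dim))
            for _ in range(num_experts)
        )
        self.top_k = top_k
        self.type_mask_prob = type_mask_prob

    def forward(self, h, node_type):
        # h: [N, hidden_dim] node features, node_type: [N] integer type ids
        t = self.type_embed(node_type)
        if self.training:
            # Stochastically hide the type signal (the HER-style regularizer).
            keep = (torch.rand(h.size(0), 1, device=h.device) > self.type_mask_prob).float()
            t = t * keep
        gate_logits = self.router(h + t)                    # [N, num_experts]
        weights, idx = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(h)
        for k in range(self.top_k):
            for e_id, expert in enumerate(self.experts):
                sel = idx[:, k] == e_id
                if sel.any():
                    out[sel] += weights[sel, k:k+1] * expert(h[sel])
        return out
```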


Recommended Readings
Automatic Intermodal Loading Unit Identification using Computer Vision: A Scoping Review
Neutral · Artificial Intelligence
The article titled 'Automatic Intermodal Loading Unit Identification using Computer Vision: A Scoping Review' discusses the challenges in identifying Intermodal Loading Units (ILUs) such as containers and semi-trailers, which are crucial for global trade. The review maps various Computer Vision (CV) methods for ILU identification, clarifies terminology, and summarizes the evolution of approaches. It highlights research gaps and future directions, emphasizing the need for efficient identification methods to improve terminal operations. A total of 63 empirical studies from 1990 to 2025 were reviewed.
Pre-Attention Expert Prediction and Prefetching for Mixture-of-Experts Large Language Models
Positive · Artificial Intelligence
The paper titled 'Pre-Attention Expert Prediction and Prefetching for Mixture-of-Experts Large Language Models' introduces a method to enhance the efficiency of Mixture-of-Experts (MoE) Large Language Models (LLMs). The authors propose a pre-attention expert prediction technique that improves accuracy and reduces computational overhead by utilizing activations before the attention block. This approach aims to optimize expert prefetching, achieving about a 15% improvement in accuracy over existing methods.
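
A rough sketch of the idea follows, assuming a lightweight per-layer predictor head and a hypothetical CPU-resident expert weight store; the actual prediction features and prefetch machinery in the paper will differ.

```python
import torch
import torch.nn as nn

class PreAttentionExpertPredictor(nn.Module):
    """Looks at the hidden state *before* a transformer layer's attention block
    and guesses which MoE experts the layer's router will select, so their
    weights can start moving to the accelerator early."""

    def __init__(self, hidden_dim, num_experts, top_k=2):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, num_experts)
        self.top_k = top_k

    def forward(self, pre_attn_hidden):
        # pre_attn_hidden: [batch, seq, hidden_dim]
        logits = self.proj(pre_attn_hidden)                 # [batch, seq, num_experts]
        return logits.topk(self.top_k, dim=-1).indices      # predicted experts to prefetch


def prefetch_predicted_experts(predictor, pre_attn_hidden, expert_weights_cpu, device):
    """Copy the predicted experts' weights to the device ahead of the MoE layer.
    `expert_weights_cpu` is a hypothetical dict {expert_id: state_dict on CPU}."""
    with torch.no_grad():
        predicted = predictor(pre_attn_hidden).unique().tolist()
    return {e: {k: v.to(device, non_blocking=True)
                for k, v in expert_weights_cpu[e].items()}
            for e in predicted}
```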
ERMoE: Eigen-Reparameterized Mixture-of-Experts for Stable Routing and Interpretable Specialization
Positive · Artificial Intelligence
The article introduces ERMoE, a new Mixture-of-Experts (MoE) architecture designed to enhance model capacity by addressing challenges in routing and expert specialization. ERMoE reparameterizes experts in an orthonormal eigenbasis and utilizes an 'Eigenbasis Score' for routing, which stabilizes expert utilization and improves interpretability. This approach aims to overcome issues of misalignment and load imbalances that have hindered previous MoE architectures.
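
One way to read "orthonormal eigenbasis reparameterization plus an Eigenbasis Score" is sketched below, using PyTorch's built-in orthogonal parametrization and a projection-alignment score for routing. This is an interpretation for illustration only, not ERMoE's actual factorization or routing rule.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils.parametrizations import orthogonal

class EigenExpert(nn.Module):
    """Expert whose input transform is kept orthonormal, a rough stand-in for an
    eigen-reparameterized expert, with learnable per-direction gains."""

    def __init__(self, hidden_dim):
        super().__init__()
        self.basis = orthogonal(nn.Linear(hidden_dim, hidden_dim, bias=False))
        self.scale = nn.Parameter(torch.ones(hidden_dim))   # "eigenvalue"-like gains
        self.out = nn.Linear(hidden_dim, hidden_dim)

    def alignment_score(self, x):
        # How strongly x projects onto this expert's basis, weighted by |scale|.
        coords = self.basis(x)
        return (coords.pow(2) * self.scale.abs()).sum(dim=-1)

    def forward(self, x):
        return self.out(self.basis(x) * self.scale)


class ERMoELayerSketch(nn.Module):
    def __init__(self, hidden_dim, num_experts, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(EigenExpert(hidden_dim) for _ in range(num_experts))
        self.top_k = top_k

    def forward(self, x):
        # Route by each expert's basis-alignment score instead of a separate gate.
        scores = torch.stack([e.alignment_score(x) for e in self.experts], dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e_id, expert in enumerate(self.experts):
                sel = idx[:, k] == e_id
                if sel.any():
                    out[sel] += weights[sel, k:k+1] * expert(x[sel])
        return out
```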
Echoless Label-Based Pre-computation for Memory-Efficient Heterogeneous Graph Learning
Positive · Artificial Intelligence
The article presents a new approach called Echoless Label-based Pre-computation (Echoless-LP) for enhancing the efficiency of Heterogeneous Graph Neural Networks (HGNNs). Traditional HGNNs rely on repetitive message passing during training, which hampers their performance on large-scale graphs. Echoless-LP addresses this issue by eliminating training label leakage, enabling more efficient mini-batch training and compatibility with advanced message passing methods.
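
The "echoless" idea can be illustrated with a toy two-hop label pre-computation that subtracts the path by which a node's own training label would echo back to it. Echoless-LP's actual propagation scheme and normalization are more involved; this dense-adjacency version only shows the leakage-removal step.

```python
import torch
import torch.nn.functional as F

def echoless_two_hop_labels(adj, labels, train_mask, num_classes):
    """Propagate training labels two hops as input features, then remove the
    term where a node's own label travels out and comes back, so no training
    node ever sees its own label in its pre-computed feature."""
    N = labels.size(0)
    Y = torch.zeros(N, num_classes)
    Y[train_mask] = F.one_hot(labels[train_mask], num_classes).float()

    two_hop = adj @ (adj @ Y)                       # naive two-hop label aggregation
    echo = (adj * adj.T).sum(dim=1, keepdim=True)   # (A^2)_ii: weight of i -> j -> i paths
    return two_hop - echo * Y                       # subtract each node's own-label echo
```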
NTSFormer: A Self-Teaching Graph Transformer for Multimodal Isolated Cold-Start Node Classification
Positive · Artificial Intelligence
The paper titled 'NTSFormer: A Self-Teaching Graph Transformer for Multimodal Isolated Cold-Start Node Classification' addresses the challenges of classifying isolated cold-start nodes in multimodal graphs, which often lack edges and modalities. The proposed Neighbor-to-Self Graph Transformer (NTSFormer) employs a self-teaching paradigm to enhance model capacity by using a cold-start attention mask for dual predictions—one based on the node's own features and another guided by a teacher model. This approach aims to improve classification accuracy in scenarios where traditional methods fall short.
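
A minimal sketch of a self-teaching, cold-start-masked prediction head is shown below, assuming a single attention layer and a simple distillation loss; branch names and loss weighting are assumptions rather than NTSFormer's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfTeachingColdStartHead(nn.Module):
    """Two views of the same node: a teacher view that attends to neighbor
    tokens, and a student view restricted by a cold-start mask to the node's
    own token, simulating an isolated node at inference time."""

    def __init__(self, hidden_dim, num_classes, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, tokens, cold_start_mask):
        # tokens: [B, 1 + num_neighbors, hidden_dim]; index 0 is the target node.
        # cold_start_mask: [B, 1 + num_neighbors] bool, True = hide this token;
        # index 0 must stay False so the node always sees its own features.
        teacher_ctx, _ = self.attn(tokens[:, :1], tokens, tokens)
        student_ctx, _ = self.attn(tokens[:, :1], tokens, tokens,
                                   key_padding_mask=cold_start_mask)
        return (self.classifier(teacher_ctx.squeeze(1)),
                self.classifier(student_ctx.squeeze(1)))


def self_teaching_loss(teacher_logits, student_logits, labels, alpha=0.5):
    """Supervise both branches and distill the teacher into the student branch."""
    ce = F.cross_entropy(teacher_logits, labels) + F.cross_entropy(student_logits, labels)
    kd = F.kl_div(F.log_softmax(student_logits, dim=-1),
                  F.softmax(teacher_logits.detach(), dim=-1),
                  reduction="batchmean")
    return ce + alpha * kd
```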