Revisiting Cross-Architecture Distillation: Adaptive Dual-Teacher Transfer for Lightweight Video Models
The introduction of the Dual-Teacher Knowledge Distillation framework marks a significant advancement in video action recognition, particularly for lightweight CNNs, which traditionally struggle to match the accuracy of heavier Vision Transformers (ViTs). While ViTs deliver strong performance, their high computational cost limits practical deployment. The proposed framework bridges this gap by leveraging both a heterogeneous ViT teacher and a homogeneous CNN teacher, enabling a more robust transfer of knowledge. Two key innovations drive the approach: Discrepancy-Aware Teacher Weighting dynamically adjusts the influence of each teacher based on that teacher's confidence, while Structure Discrepancy-Aware Distillation trains the student to capture the residual features between the two teacher architectures. Extensive experiments on datasets such as HMDB51, EPIC-KITCHENS-100, and Kinetics-400 have shown that this method consistently outperforms …
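To make the two mechanisms concrete, the sketch below shows one plausible PyTorch-style realization: per-sample teacher weights derived from each teacher's softmax confidence, plus a residual-feature regression term between the ViT and CNN teachers. This is a minimal illustrative sketch, not the authors' implementation; the class name `DualTeacherDistillLoss`, the max-probability confidence rule, the temperature, the `beta` weight, and the linear projection are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualTeacherDistillLoss(nn.Module):
    """Hedged sketch: dual-teacher soft-label distillation with
    confidence-based teacher weighting and a residual-feature
    ("structure discrepancy") term. Hyper-parameters and the exact
    weighting rule are assumptions, not the paper's formulation."""

    def __init__(self, student_dim, teacher_dim, temperature=4.0, beta=0.5):
        super().__init__()
        self.t = temperature
        self.beta = beta  # weight of the residual-feature term (assumed)
        # Project student features into the teachers' feature space so the
        # residual between the ViT and CNN teachers can be regressed.
        self.proj = nn.Linear(student_dim, teacher_dim)

    def forward(self, student_logits, vit_logits, cnn_logits,
                student_feat, vit_feat, cnn_feat):
        # Discrepancy-aware teacher weighting (assumed rule): use each
        # teacher's max softmax probability as a per-sample confidence
        # and normalize the pair into mixing weights.
        vit_conf = F.softmax(vit_logits, dim=-1).max(dim=-1).values
        cnn_conf = F.softmax(cnn_logits, dim=-1).max(dim=-1).values
        w = torch.stack([vit_conf, cnn_conf], dim=-1)
        w = w / w.sum(dim=-1, keepdim=True)  # shape (B, 2)

        # Soft-label KD against each teacher, combined per sample.
        log_p_s = F.log_softmax(student_logits / self.t, dim=-1)
        kd_vit = F.kl_div(log_p_s, F.softmax(vit_logits / self.t, dim=-1),
                          reduction="none").sum(-1)
        kd_cnn = F.kl_div(log_p_s, F.softmax(cnn_logits / self.t, dim=-1),
                          reduction="none").sum(-1)
        kd_loss = (w[:, 0] * kd_vit + w[:, 1] * kd_cnn).mean() * self.t ** 2

        # Structure discrepancy-aware term (assumed form): regress the
        # projected student features onto the residual between the
        # heterogeneous (ViT) and homogeneous (CNN) teacher features.
        residual = vit_feat - cnn_feat
        sd_loss = F.mse_loss(self.proj(student_feat), residual)

        return kd_loss + self.beta * sd_loss
```

In practice such a loss would be added to the standard cross-entropy loss on ground-truth labels during student training; the relative weighting of the terms would need to be tuned per dataset.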
— via World Pulse Now AI Editorial System
