Revisiting Cross-Architecture Distillation: Adaptive Dual-Teacher Transfer for Lightweight Video Models

arXiv — cs.CV · Thursday, November 13, 2025 at 5:00:00 AM
The Dual-Teacher Knowledge Distillation framework marks a significant advance in video action recognition, particularly for lightweight CNNs that traditionally trail their heavier counterparts, Vision Transformers (ViTs), in accuracy. While ViTs deliver strong performance, their high computational cost limits practical deployment. The proposed framework bridges this gap by pairing a heterogeneous ViT teacher with a homogeneous CNN teacher, enabling a more robust transfer of knowledge to the student. Two key innovations drive this: Discrepancy-Aware Teacher Weighting, which dynamically adjusts each teacher's influence based on its prediction confidence, and Structure Discrepancy-Aware Distillation, which teaches the student the residual features between the two teacher architectures. Extensive experiments on HMDB51, EPIC-KITCHENS-100, and Kinetics-400 show that this method consistently outperforms …
— via World Pulse Now AI Editorial System
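The summary above names two mechanisms: confidence-based weighting of the two teachers and a residual-feature distillation term. Below is a minimal PyTorch sketch of how such a dual-teacher loss could be wired up. The function names, the max-softmax confidence rule, and the loss weights alpha and beta are assumptions made for illustration, not the paper's released implementation.

```python
# Minimal sketch of a dual-teacher distillation loss, assuming a ViT teacher,
# a CNN teacher, and a lightweight CNN student. Not the authors' code.
import torch
import torch.nn.functional as F

def discrepancy_aware_weights(vit_logits, cnn_logits):
    """Weight each teacher by its prediction confidence (assumed: max softmax probability)."""
    vit_conf = F.softmax(vit_logits, dim=-1).max(dim=-1).values
    cnn_conf = F.softmax(cnn_logits, dim=-1).max(dim=-1).values
    total = vit_conf + cnn_conf + 1e-8
    return vit_conf / total, cnn_conf / total  # per-sample weights summing to 1

def dual_teacher_kd_loss(student_logits, vit_logits, cnn_logits,
                         student_feat, vit_feat, cnn_feat,
                         labels, temperature=4.0, alpha=0.5, beta=0.1):
    # Standard cross-entropy on ground-truth action labels.
    ce = F.cross_entropy(student_logits, labels)

    # Soft-label distillation from each teacher, mixed per sample
    # by the discrepancy-aware confidence weights.
    w_vit, w_cnn = discrepancy_aware_weights(vit_logits, cnn_logits)
    log_p_s = F.log_softmax(student_logits / temperature, dim=-1)
    kd_vit = F.kl_div(log_p_s, F.softmax(vit_logits / temperature, dim=-1),
                      reduction='none').sum(dim=-1)
    kd_cnn = F.kl_div(log_p_s, F.softmax(cnn_logits / temperature, dim=-1),
                      reduction='none').sum(dim=-1)
    kd = ((w_vit * kd_vit + w_cnn * kd_cnn) * temperature ** 2).mean()

    # Structure-discrepancy term: the student regresses the residual between the
    # two teachers' features (assumed reading of the paper; features are taken
    # to be already projected to a common dimension).
    residual_target = vit_feat - cnn_feat
    sd = F.mse_loss(student_feat, residual_target)

    return ce + alpha * kd + beta * sd
```

In practice the student's features would typically pass through a small projection layer so their dimension matches the teachers' before the residual term is computed; that detail is omitted here for brevity.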
