arXiv:2511.18504v1
Abstract: The demand for edge AI in vision-language tasks requires models that achieve real-time performance on resource-constrained devices with limited power and memory. This paper proposes two adaptive compression techniques -- Sparse Temporal Token Fusion (STTF) and Adaptive Neural Compression (ANC) -- that integrate algorithmic innovations with hardware-aware optimizations. Unlike previous approaches relying on static pruning or uniform scaling, STTF dynamically reuses visual tokens through event-driven change detection, while ANC conditionally activates encoder branches via a learned router, enabling fine-grained adaptation to scene complexity. Our 3B-parameter TinyGPT-STTF achieves CIDEr 131.2, BLEU-4 0.38, METEOR 0.31, and ROUGE-L 0.56 on the COCO 2017 test set, surpassing LLaVA-1.5 7B by 17.6 CIDEr points while using 2.3x fewer parameters and 62x fewer on-device FLOPs. TinyGPT-ANC reaches CIDEr 128.5. On event-based vision tasks, STTF reduces average token count by 84% (from 196 to 31 tokens) while preserving 95.6% accuracy on the DVS128 Gesture dataset, and ANC cuts FLOPs by up to 90% in low-motion scenes. Compared to strong baselines, our models improve accuracy by up to 4.4% and reduce latency by up to 13x. These results enable efficient deployment of capable vision-language models on real-world edge devices.
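
The abstract describes STTF only at a high level: visual tokens are reused unless event-driven change detection flags them. As a minimal sketch of that idea, and not the authors' implementation, the snippet below re-encodes a patch only when its mean absolute pixel change exceeds a threshold, and otherwise reuses the cached token. The function name `fuse_tokens`, the `encode` callback, the per-patch difference metric, and the `threshold` value are all hypothetical choices made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Patch = Sequence[float]   # flattened pixel values of one image patch
Token = List[float]       # embedding produced by the visual encoder


def fuse_tokens(
    prev_patches: Sequence[Patch],
    curr_patches: Sequence[Patch],
    cached_tokens: Sequence[Token],
    encode: Callable[[Patch], Token],
    threshold: float = 0.1,  # assumed value, not taken from the paper
) -> Tuple[List[Token], int]:
    """Return the token list for the current frame and the number of
    patches that had to be re-encoded."""
    tokens: List[Token] = []
    recomputed = 0
    for prev, curr, cached in zip(prev_patches, curr_patches, cached_tokens):
        # Assumed change detector: mean absolute difference per patch.
        change = sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)
        if change > threshold:
            tokens.append(encode(curr))   # patch changed: recompute token
            recomputed += 1
        else:
            tokens.append(list(cached))   # patch static: reuse cached token
    return tokens, recomputed


if __name__ == "__main__":
    # Toy stand-in for an encoder: the token is the patch mean.
    def mean_encoder(patch: Patch) -> Token:
        return [sum(patch) / len(patch)]

    prev = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
    curr = [[0.0, 0.0], [1.0, 0.0], [0.5, 0.5]]   # only the middle patch moves
    cache = [mean_encoder(p) for p in prev]
    tokens, n = fuse_tokens(prev, curr, cache, mean_encoder)
    print(tokens, n)
```

In this toy run only one of three patches is re-encoded. The abstract's reported reduction from 196 to 31 tokens corresponds to (196 - 31) / 196 ≈ 0.84, which matches the stated 84%.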

A new study introduces two adaptive compression techniques, Sparse Temporal Token Fusion (STTF) and Adaptive Neural Compression (ANC), for running vision-language models on resource-constrained edge devices. The 3B-parameter TinyGPT-STTF reaches CIDEr 131.2 on COCO 2017, 17.6 points above LLaVA-1.5 7B with 2.3x fewer parameters, and the authors report latency reductions of up to 13x over strong baselines.

Extreme Model Compression for Edge Vision-Language Models: Sparse Temporal Token Fusion and Adaptive Neural Compression
