AccKV: Towards Efficient Audio-Video LLMs Inference via Adaptive-Focusing and Cross-Calibration KV Cache Optimization

arXiv — cs.CV · Monday, November 17, 2025 at 5:00:00 AM
- The study introduces AccKV, a framework for efficient inference in Audio-Video Large Language Models (AV-LLMs) that applies adaptive focusing and cross-calibration to the key-value (KV) cache. Because audio and video tokens inflate the KV cache well beyond what text-only models face, an unmanaged cache can degrade inference performance; AccKV addresses this by tailoring cache optimization to the distinct characteristics of the audio and video modalities, aiming to make AV-LLMs more efficient across applications.
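As a rough illustration of the kind of modality-aware KV cache compression the summary describes (this is not AccKV's actual algorithm; the scoring rule, budget split, and function names below are assumptions for the sketch), one could retain only the highest-importance audio and video entries under separate per-modality budgets:

```python
import torch

def compress_kv_by_modality(keys, values, attn_scores, modality_ids, budgets):
    """Keep only the top-scoring KV entries per modality (illustrative sketch).

    keys, values : (seq_len, num_heads, head_dim) cached tensors
    attn_scores  : (seq_len,) importance of each cached token, e.g. the
                   attention mass it received from recent queries
    modality_ids : (seq_len,) ints, e.g. 0 = text, 1 = audio, 2 = video
    budgets      : dict mapping modality id -> number of tokens to retain
                   (a per-modality split is an assumption, not AccKV's rule)
    """
    keep = []
    for mod, budget in budgets.items():
        idx = (modality_ids == mod).nonzero(as_tuple=True)[0]
        if idx.numel() == 0:
            continue
        # Rank this modality's tokens by importance and keep up to its budget.
        top = attn_scores[idx].topk(min(budget, idx.numel())).indices
        keep.append(idx[top])
    keep = torch.cat(keep).sort().values  # preserve original token order
    return keys[keep], values[keep], keep

# Tiny usage example with random data.
seq_len, heads, dim = 32, 4, 8
keys = torch.randn(seq_len, heads, dim)
values = torch.randn(seq_len, heads, dim)
scores = torch.rand(seq_len)
modality = torch.randint(0, 3, (seq_len,))
k2, v2, kept = compress_kv_by_modality(
    keys, values, scores, modality, budgets={0: 8, 1: 4, 2: 6}
)
print(kept.shape, k2.shape)
```

The per-modality budgets stand in for the adaptive, modality-sensitive allocation the paper motivates; the actual adaptive-focusing and cross-calibration mechanisms are described in the arXiv paper itself.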
— via World Pulse Now AI Editorial System
