Knowledge-Guided Textual Reasoning for Explainable Video Anomaly Detection via LLMs

arXiv — cs.CV · Wednesday, November 12, 2025 at 5:00:00 AM
The newly proposed Text-based Explainable Video Anomaly Detection (TbVAD) framework leverages language-driven techniques to enhance video anomaly detection, moving away from traditional models that depend on visual features. TbVAD operates in three stages: it first transforms video content into detailed captions using a vision-language model, then organizes these captions into four semantic slots—action, object, context, and environment—creating a structured knowledge base. Finally, it generates explanations that clarify which semantic factors influence anomaly detection. Evaluated on the UCF-Crime and XD-Violence benchmarks, TbVAD demonstrates that textual knowledge reasoning can provide reliable and interpretable results, crucial for effective surveillance applications.
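The four-slot knowledge base at the core of TbVAD can be sketched as follows. The slot names (action, object, context, environment) come from the summary above; the data layout, the `build_knowledge_base` and `explain` helpers, and the keyword-matching explanation step are illustrative assumptions, not the paper's actual method (which would use an LLM to fill and reason over the slots).

```python
from dataclasses import dataclass

# Slot names taken from the TbVAD summary; everything else is a sketch.
SLOTS = ("action", "object", "context", "environment")

@dataclass
class CaptionEntry:
    """One video caption decomposed into the four semantic slots."""
    action: str
    object: str
    context: str
    environment: str

def build_knowledge_base(entries):
    """Aggregate slot values across captions into a per-slot vocabulary."""
    kb = {slot: set() for slot in SLOTS}
    for entry in entries:
        for slot in SLOTS:
            kb[slot].add(getattr(entry, slot))
    return kb

def explain(entry, anomalous_terms):
    """Toy explanation step: name the slots whose values match known
    anomaly-linked terms, mimicking 'which semantic factors influenced
    the detection'."""
    return [slot for slot in SLOTS if getattr(entry, slot) in anomalous_terms]
```

For example, a caption parsed as `CaptionEntry("fighting", "two people", "crowded street", "night")` would be explained by the `action` slot if "fighting" appears among the anomaly-linked terms.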
— via World Pulse Now AI Editorial System


Recommended Readings
Short-Window Sliding Learning for Real-Time Violence Detection via LLM-based Auto-Labeling
The paper presents a Short-Window Sliding Learning framework designed for real-time violence detection in CCTV footage. This innovative approach segments videos into 1-2 second clips, utilizing Large Language Model (LLM)-based auto-captioning to create detailed datasets. The method achieves a remarkable 95.25% accuracy on the RWF-2000 dataset and improves performance on longer videos, confirming its effectiveness and applicability in intelligent surveillance systems.
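The short-window segmentation step can be sketched as a simple frame-index generator. The 2-second window length matches the 1-2 second clips described above; the 1-second stride and the function name are assumptions, since the summary does not specify the overlap between clips.

```python
def sliding_windows(num_frames, fps, window_s=2.0, stride_s=1.0):
    """Yield (start_frame, end_frame) index pairs for short sliding clips.

    window_s=2.0 reflects the 1-2 second clips in the summary;
    stride_s is an assumed overlap, not stated in the source.
    """
    win = int(window_s * fps)    # frames per clip
    step = int(stride_s * fps)   # frames to advance between clips
    start = 0
    while start + win <= num_frames:
        yield (start, start + win)
        start += step
```

At 30 fps, a 5-second video (150 frames) yields four overlapping 60-frame clips, each of which would then be auto-captioned by the LLM labeler.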