Knowledge-Guided Textual Reasoning for Explainable Video Anomaly Detection via LLMs
Positive · Artificial Intelligence
The newly proposed Text-based Explainable Video Anomaly Detection (TbVAD) framework detects anomalies through language-driven reasoning rather than the visual features that traditional models rely on. TbVAD operates in three stages: it first converts video content into detailed captions with a vision-language model, then organizes those captions into four semantic slots (action, object, context, and environment) to build a structured knowledge base, and finally generates explanations that identify which semantic factors drove each anomaly decision. Evaluated on the UCF-Crime and XD-Violence benchmarks, TbVAD shows that reasoning over textual knowledge can yield reliable and interpretable results, a property valuable for surveillance applications.
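To make the three-stage pipeline concrete, the sketch below shows one way a caption-to-slots-to-explanation flow might be wired in Python. It is only an illustration of the idea described above, not the paper's implementation: the names `FrameKnowledge`, `caption_frame`, `organize_into_slots`, and `explain_anomaly`, as well as the hand-written keyword rules, are assumptions standing in for the vision-language captioner and LLM reasoning steps a system like TbVAD would actually use.

```python
from dataclasses import dataclass


@dataclass
class FrameKnowledge:
    """Structured knowledge for one frame, split into four semantic slots."""
    action: str
    object: str
    context: str
    environment: str


def caption_frame(frame_id: int) -> str:
    """Stand-in for a vision-language captioner; a real system would call a VLM here."""
    return f"A person runs across a parking lot at night carrying a bag (frame {frame_id})"


def organize_into_slots(caption: str) -> FrameKnowledge:
    """Toy slot filler using keyword matching; a real system would use an LLM or
    semantic parser to populate the four slots from the caption."""
    return FrameKnowledge(
        action="running" if "runs" in caption else "unspecified",
        object="bag" if "bag" in caption else "none",
        context="person moving quickly through an open area",
        environment="parking lot at night" if "night" in caption else "unspecified",
    )


def explain_anomaly(knowledge: FrameKnowledge) -> tuple[float, str]:
    """Toy reasoning step: score the frame and report which slots drove the decision.

    A real system would prompt an LLM with the slot contents instead of using
    these hand-written rules.
    """
    score = 0.0
    reasons = []
    if "running" in knowledge.action:
        score += 0.4
        reasons.append("action slot indicates rapid movement")
    if "night" in knowledge.environment:
        score += 0.3
        reasons.append("environment slot indicates a low-visibility setting")
    explanation = "; ".join(reasons) or "no anomalous semantic factors found"
    return min(score, 1.0), explanation


if __name__ == "__main__":
    for frame_id in range(2):
        caption = caption_frame(frame_id)
        knowledge = organize_into_slots(caption)
        score, why = explain_anomaly(knowledge)
        print(f"frame {frame_id}: score={score:.2f} | {why}")
```

Running the sketch prints a per-frame score together with the slot-level reasons behind it, mirroring the kind of explanation the framework aims to surface.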
— via World Pulse Now AI Editorial System
