VideoChain: A Transformer-Based Framework for Multi-hop Video Question Generation

arXiv · cs.CV · Wednesday, November 12, 2025 at 5:00:00 AM
VideoChain is a framework for Multi-hop Video Question Generation (MVQG): it generates questions that require reasoning across multiple segments of a video. This moves beyond existing frameworks, which only address zero-hop questions tied to a single segment. VideoChain uses a modified BART backbone and integrates video embeddings to capture both textual and visual dependencies. The MVQ-60 dataset, constructed from the TVQA+ dataset, supports the framework's scalability and diversity. In evaluation, VideoChain performs strongly on several standard generation metrics, including ROUGE-L, ROUGE-1, BLEU-1, BERTScore-F1, and semantic similarity, indicating that it produces coherent and contextually relevant questions. This innovation not only improves the interaction with video content but also has impli…
— via World Pulse Now AI Editorial System
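
To make two of the overlap metrics named above concrete, here is a minimal Python sketch of ROUGE-L (as an F1 score over the longest common subsequence) and of clipped unigram precision, the core term of BLEU-1. It is not the paper's evaluation code: whitespace tokenization, lowercasing, the omission of BLEU's brevity penalty, and the function names are all assumptions made for illustration, and the example questions are invented.

```python
from collections import Counter


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 with naive whitespace tokenization (an assumption)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def unigram_precision(candidate, reference):
    """Clipped unigram precision; BLEU-1 without the brevity penalty."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand:
        return 0.0
    ref_counts = Counter(ref)
    # Each candidate token is credited at most as often as it occurs in the reference.
    clipped = sum(min(n, ref_counts[tok]) for tok, n in Counter(cand).items())
    return clipped / len(cand)


# Invented example pair, not taken from MVQ-60.
generated = "who opened the door after the man left"
target = "who opened the door before the man left"
print(round(rouge_l_f1(generated, target), 3))
print(round(unigram_precision(generated, target), 3))
```

Both scores reward surface overlap only, which is presumably why the evaluation also reports embedding-based measures such as BERTScore-F1 and semantic similarity.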
