X-LeBench: A Benchmark for Extremely Long Egocentric Video Understanding
X-LeBench has been introduced to fill a gap in the evaluation of long egocentric video recordings, which existing benchmarks have largely overlooked by focusing on shorter durations. The dataset comprises 432 simulated videos, ranging from 23 minutes to 16.4 hours, generated by a life-logging simulation pipeline that combines synthetic daily plans with real-world footage from the Ego4D dataset. The benchmark is aimed at fields such as embodied intelligence and personalized assistive technologies, where understanding long-term human behavior is essential. Analyzing videos of this length remains difficult, with open problems in temporal localization, reasoning, context aggregation, and memory retention. Initial evaluations indicate that baseline systems and multimodal large language models (MLLMs) perform poorly in this domain, highlighting the need for further researc…
— via World Pulse Now AI Editorial System
