Vision Large Language Models Are Good Noise Handlers in Engagement Analysis

arXiv, cs.CV | Wednesday, November 19, 2025 at 5:00:00 AM
  • A new framework uses Vision Large Language Models (VLMs) to improve engagement recognition in video datasets, where subjective and noisy labels limit model performance. The VLM is used to refine the labels, the data is split into subsets according to how reliable each label is, and a training strategy introduces the ambiguous samples gradually.
  • The work targets label subjectivity, a central obstacle in engagement analysis. By addressing it directly, the approach could make engagement recognition more accurate and reliable, with potential benefits for other video analysis applications.
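
The idea of splitting data by reliability and adding ambiguous samples gradually can be illustrated with a minimal Python sketch. This is not the paper's implementation: the `Sample` type, the `reliability` score, the 0.7 threshold, and the linear ramp schedule are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Sample:
    clip_id: str
    label: int
    # Hypothetical score in [0, 1]; assumed to come from some
    # reliability check on the label (not specified in the summary).
    reliability: float


def split_by_reliability(
    samples: List[Sample], threshold: float = 0.7
) -> Tuple[List[Sample], List[Sample]]:
    """Split samples into (reliable, ambiguous) using an assumed threshold.

    Ambiguous samples are ordered from most to least reliable so that
    the least ambiguous ones are introduced first.
    """
    reliable = [s for s in samples if s.reliability >= threshold]
    ambiguous = sorted(
        (s for s in samples if s.reliability < threshold),
        key=lambda s: s.reliability,
        reverse=True,
    )
    return reliable, ambiguous


def curriculum_pool(
    reliable: List[Sample],
    ambiguous: List[Sample],
    epoch: int,
    warmup_epochs: int = 2,
    ramp_epochs: int = 4,
) -> List[Sample]:
    """Return the training pool for a given epoch (0-indexed).

    During warmup only reliable samples are used. Afterwards, ambiguous
    samples are added in equal steps over `ramp_epochs` epochs
    (an assumed linear schedule).
    """
    if epoch < warmup_epochs:
        return list(reliable)
    steps = epoch - warmup_epochs + 1
    n_extra = min(len(ambiguous), (steps * len(ambiguous)) // ramp_epochs)
    return list(reliable) + ambiguous[:n_extra]


if __name__ == "__main__":
    data = [
        Sample("a", 1, 0.9), Sample("b", 0, 0.8), Sample("c", 1, 0.6),
        Sample("d", 0, 0.4), Sample("e", 1, 0.2), Sample("f", 0, 0.5),
    ]
    reliable, ambiguous = split_by_reliability(data)
    for epoch in range(7):
        pool = curriculum_pool(reliable, ambiguous, epoch)
        print(epoch, [s.clip_id for s in pool])
```

In this sketch the pool starts with only the high-reliability clips and grows by one ambiguous clip per epoch after warmup. A real training loop would presumably also adjust the supervision on the ambiguous samples, which this example does not attempt.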
— via World Pulse Now AI Editorial System
