Unsupervised Discovery of Long-Term Spatiotemporal Periodic Workflows in Human Activities

arXiv (cs.CV) • Thursday, November 20, 2025 at 5:00:00 AM

Positive • Artificial Intelligence

A new benchmark has been introduced to facilitate the unsupervised discovery of long-term spatiotemporal periodic workflows in human activities.
This development is significant as it provides researchers and practitioners with tools to better understand and model human activities across various domains, including manufacturing and sports.
The integration of large language models (LLMs) and advanced methodologies in related studies underscores the growing importance of sophisticated data analysis techniques in enhancing human activity recognition and prediction.

— via World Pulse Now AI Editorial System
