PAS: A Training-Free Stabilizer for Temporal Encoding in Video LLMs
Video LLMs suffer from temporal inconsistency: small shifts in frame timing can disrupt attention and suppress relevant frames. This instability is linked to the extension of Rotary Position Embeddings (RoPE) to video. The proposed solution, Phase Aggregated Smoothing (PAS), is a training-free method that applies small, opposed phase offsets across attention heads and then aggregates their outputs. Because each offset changes only the phase, PAS preserves the per-head spectrum magnitude while smoothing the temporal kernel, which reduces phase sensitivity without altering the positional encoding structure.
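The smoothing effect can be illustrated with a toy calculation. The sketch below is not the authors' implementation: it assumes a one-dimensional temporal RoPE with the common base of 10000, a unit-weight kernel of the form sum_k cos(w_k * lag) in place of real query/key content, two heads with offsets of +0.5 and -0.5 frames, and plain averaging as the aggregation step. The function names are hypothetical.

```python
import numpy as np

def rope_frequencies(dim, base=10000.0):
    """Standard RoPE angular frequencies, one per 2-D rotation plane."""
    return base ** (-np.arange(0, dim, 2) / dim)

def temporal_kernel(lags, freqs, offset=0.0):
    """Toy per-head kernel: sum_k cos(w_k * (lag + offset)).

    Assumes unit weight on every rotation plane; in a real model the
    weights come from the query/key content.
    """
    lags = np.asarray(lags, dtype=float)
    return np.cos(np.outer(lags + offset, freqs)).sum(axis=1)

def pas_kernel(lags, freqs, offsets):
    """Average the offset kernels across heads (assumed aggregation)."""
    return np.mean([temporal_kernel(lags, freqs, o) for o in offsets], axis=0)

freqs = rope_frequencies(dim=64)
lags = np.linspace(-8.0, 8.0, 401)   # relative frame offsets
delta = 0.5                          # assumed offset size, in frames
offsets = [-delta, +delta]           # opposed offsets, one per head

base_k = temporal_kernel(lags, freqs)
smooth_k = pas_kernel(lags, freqs, offsets)

# Each offset is a pure phase factor, so per-head magnitudes are unchanged.
phasor_magnitude = np.abs(np.exp(1j * delta * freqs))

# Averaging +delta and -delta scales plane k by cos(w_k * delta), which
# damps the fast-varying planes. These are upper bounds on the slope of
# each kernel with respect to the lag.
slope_bound_base = np.sum(freqs)
slope_bound_smooth = np.sum(freqs * np.abs(np.cos(delta * freqs)))
```

Under these assumptions the aggregated kernel equals the original kernel with each frequency term scaled by cos(w_k * delta), so its worst-case slope with respect to frame timing can only shrink, while every individual head keeps unit-magnitude phase factors.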
— via World Pulse Now AI Editorial System