arXiv:2511.07696v1
Abstract: Dense and versatile image representations underpin the success of virtually all computer vision applications. However, state-of-the-art networks, such as transformers, produce low-resolution feature grids, which are suboptimal for dense prediction tasks. To address this limitation, we present FlowFeat, a high-resolution and multi-task feature representation. The key ingredient behind FlowFeat is a novel distillation technique that embeds a distribution of plausible apparent motions, or motion profiles. By leveraging optical flow networks and diverse video data, we develop an effective self-supervised training framework that statistically approximates the apparent motion. With its remarkable level of spatial detail, FlowFeat encodes a compelling degree of geometric and semantic cues while exhibiting high temporal consistency. Empirically, FlowFeat significantly enhances the representational power of five state-of-the-art encoders and alternative upsampling strategies across three dense tasks: video object segmentation, monocular depth estimation and semantic segmentation. Training FlowFeat is computationally inexpensive and robust to inaccurate flow estimation, remaining highly effective even when using unsupervised flow networks. Our work takes a step forward towards reliable and versatile dense image representations.
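To make the "low-resolution feature grid" limitation concrete, the following is a minimal sketch, not FlowFeat's method. It assumes a hypothetical patch-based encoder with a 224x224 input and 16x16 patches (values chosen for illustration, not taken from the paper), and shows how coarse the resulting grid is and what a naive nearest-neighbour upsampling does to it. The function names are illustrative.

```python
def feature_grid_size(image_hw, patch_size):
    """Return (rows, cols) of the token grid for a patch-based encoder."""
    h, w = image_hw
    return h // patch_size, w // patch_size


def upsample_nearest(grid, factor):
    """Nearest-neighbour upsampling of a 2D list by an integer factor."""
    out = []
    for row in grid:
        # Repeat every value `factor` times horizontally ...
        wide = [v for v in row for _ in range(factor)]
        # ... then repeat the widened row `factor` times vertically.
        out.extend([list(wide) for _ in range(factor)])
    return out


# Assumed setting (hypothetical): 224x224 image, 16x16 patches.
rows, cols = feature_grid_size((224, 224), 16)
pixels_per_feature = (224 * 224) // (rows * cols)
print(rows, cols, pixels_per_feature)

# Naive upsampling only copies values; it adds no new spatial detail.
coarse = [[0, 1], [2, 3]]
fine = upsample_nearest(coarse, 2)
print(fine)
```

Under these assumptions each feature vector covers a 16x16 block of pixels, and copying it to every pixel in the block leaves object boundaries blocky. This is the gap that a learned high-resolution representation is meant to close.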

FlowFeat, a new high-resolution and multi-task feature representation, was introduced to enhance computer vision applications. It utilizes a novel distillation technique to embed plausible motion profiles, improving tasks like video object segmentation and semantic segmentation. This advancement is significant as it addresses the limitations of existing networks that produce low-resolution features, thereby enhancing the representational power of state-of-the-art encoders.

FlowFeat: Pixel-Dense Embedding of Motion Profiles
