arXiv:2511.07479v1 Announce Type: new 
Abstract: Conventional image sensors have limited dynamic range, causing saturation in high-dynamic-range (HDR) scenes. Modulo cameras address this by folding incident irradiance into a bounded range, yet require specialized unwrapping algorithms to reconstruct the underlying signal. Unlike HDR recovery, which extends dynamic range from conventional sampling, modulo recovery restores actual values from folded samples. Despite being introduced over a decade ago, progress in modulo image recovery has been slow, especially in the use of modern deep learning techniques. In this work, we demonstrate that standard HDR methods are unsuitable for modulo recovery. Transformers, however, can capture global dependencies and spatial-temporal relationships crucial for resolving folded video frames. Still, adapting existing Transformer architectures for modulo recovery demands novel techniques. To this end, we present Selective Spatiotemporal Vision Transformer (SSViT), the first deep learning framework for modulo video reconstruction. SSViT employs a token selection strategy to improve efficiency and concentrate on the most critical regions. Experiments confirm that SSViT produces high-quality reconstructions from 8-bit folded videos and achieves state-of-the-art performance in modulo video recovery.
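To make "folding" concrete, here is a minimal sketch of what a modulo sensor does to a 1D signal, together with a naive classical unwrapping rule. It is not the SSViT method. The function names `fold` and `unwrap_1d`, the sample values, and the assumptions that the first sample is unfolded and that neighbouring true values differ by less than half the period are all illustrative.

```python
def fold(values, bit_depth=8):
    """Fold non-negative integer intensities into [0, 2**bit_depth)."""
    period = 1 << bit_depth
    return [v % period for v in values]


def unwrap_1d(folded, bit_depth=8):
    """Naive unwrapping along one dimension.

    Assumes the first sample is not folded and that consecutive true
    values differ by less than half the period (illustrative assumption).
    """
    period = 1 << bit_depth
    half = period // 2
    rollovers = 0
    out = [folded[0]]
    for prev, cur in zip(folded, folded[1:]):
        diff = cur - prev
        if diff < -half:
            rollovers += 1   # large drop: the true signal crossed a period upward
        elif diff > half:
            rollovers -= 1   # large rise: the true signal crossed a period downward
        out.append(cur + rollovers * period)
    return out


# Hypothetical smooth signal that exceeds the 8-bit range twice.
signal = [100, 180, 250, 300, 420, 520, 600, 540, 430, 310, 200, 90]
folded = fold(signal)
recovered = unwrap_1d(folded)
```

The rule recovers this signal only because it is smooth. Real scenes with sharp edges break the half-period assumption, which is one reason the abstract says modulo recovery needs specialized unwrapping algorithms.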

A new study introduces the Selective Spatiotemporal Vision Transformer (SSViT), which its authors describe as the first deep learning framework for modulo video reconstruction. The paper argues that standard HDR methods are unsuitable for recovering folded video frames, and that SSViT's token selection strategy improves efficiency by concentrating computation on the most critical regions. According to the abstract, the model produces high-quality reconstructions from 8-bit folded videos.
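The abstract attributes SSViT's efficiency to a token selection strategy but does not spell out the selection rule. The sketch below shows only the general idea of keeping the top-scoring tokens. The scoring function `wrap_score`, the `keep_ratio` parameter, and the toy patches are hypothetical and should not be read as the paper's design.

```python
def select_tokens(scores, keep_ratio=0.5):
    """Return indices of the highest-scoring tokens, in original order."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    k = max(1, round(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def wrap_score(patch, period=256):
    """Hypothetical score: count neighbouring jumps larger than half the period."""
    half = period // 2
    return sum(1 for a, b in zip(patch, patch[1:]) if abs(b - a) > half)


# Toy 1D "patches" of folded 8-bit pixel values (illustrative data).
patches = [
    [10, 12, 15, 18],       # smooth, no rollover
    [250, 4, 20, 36],       # one rollover
    [100, 101, 99, 100],    # smooth, no rollover
    [240, 2, 250, 6],       # several rollovers
]
scores = [wrap_score(p) for p in patches]
kept = select_tokens(scores, keep_ratio=0.5)
```

Here the two patches containing rollover-like jumps are kept and the smooth ones are dropped, so a downstream model would spend its attention budget on half as many tokens.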

Modulo Video Recovery via Selective Spatiotemporal Vision Transformer
