Adjacent-view Transformers for Supervised Surround-view Depth Estimation

arXiv — cs.CV · Thursday, November 13, 2025
The recent paper 'Adjacent-view Transformers for Supervised Surround-view Depth Estimation' introduces AVT-SSDepth, a novel approach to depth estimation, a critical component of 3D perception in robotics and autonomous driving. Traditional methods have relied primarily on front-view cameras, particularly on the KITTI benchmark, which limits their usefulness for full surround-view perception. AVT-SSDepth uses a global-to-local feature extraction module that combines CNN and transformer layers to produce richer representations, and it introduces an adjacent-view attention mechanism that supports both intra-view and inter-view feature propagation, improving depth estimation across the multiple cameras of a surround rig. Extensive experiments show superior performance over existing state-of-the-art methods on the DDAD and nuScenes datasets, along with strong cross-dataset generalization. This advancement not only addresses previous research limitations but also h…
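To make the adjacent-view attention idea concrete, here is a minimal sketch, not the paper's actual implementation: each camera view's tokens attend to themselves (intra-view) and to the tokens of its two ring neighbours (inter-view). The function names, tensor shapes, and the choice of plain dot-product attention are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention: q [N, C], k/v [M, C] -> [N, C]
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def adjacent_view_attention(feats):
    # feats: [V, N, C] — V surround cameras arranged in a ring,
    # N tokens per view, C channels (shapes are assumptions).
    V, N, C = feats.shape
    out = np.empty_like(feats)
    for i in range(V):
        left, right = feats[(i - 1) % V], feats[(i + 1) % V]
        # each view's queries attend over its own tokens plus both neighbours
        kv = np.concatenate([feats[i], left, right], axis=0)  # [3N, C]
        out[i] = attention(feats[i], kv, kv)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 16, 32))  # hypothetical 6-camera rig
y = adjacent_view_attention(x)
print(y.shape)  # (6, 16, 32)
```

In practice the paper's mechanism sits inside a learned transformer block with projections and multiple heads; this sketch only shows the neighbour-restricted attention pattern that distinguishes adjacent-view from full cross-view attention.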
— via World Pulse Now AI Editorial System
