Vision Transformer Based User Equipment Positioning

arXiv — cs.CV · Wednesday, November 12, 2025
Recent work in deep learning for User Equipment (UE) positioning has run into two recurring limitations: models that apply uniform attention across their inputs, and models that handle non-sequential data poorly. In response, researchers have proposed a Vision Transformer (ViT) architecture that operates on the Angle Delay Profile (ADP) derived from Channel State Information (CSI). The approach was validated on the DeepMIMO and ViWi ray-tracing datasets, achieving a Root Mean Squared Error (RMSE) of 0.55 m indoors and 13.59 m outdoors on DeepMIMO, and 3.45 m in ViWi's outdoor blockage scenario. The method outperforms existing state-of-the-art schemes by approximately 38%, a notable gain in positioning accuracy, and improvements of this kind stand to benefit applications across telecommunications and smart technologies.
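The core idea, treating the ADP as an image and regressing position with a transformer, can be sketched briefly. Below is a minimal PyTorch illustration, not the authors' implementation: the antenna and subcarrier counts, patch size, network depth, and output dimensionality are all assumed for the example, and the ADP is computed with the common 2D-DFT formulation that maps the spatial-frequency CSI matrix into the angle-delay domain.

```python
import torch
import torch.nn as nn

def adp_from_csi(H: torch.Tensor) -> torch.Tensor:
    """Angle Delay Profile from a CSI matrix.

    H: complex CSI of shape (antennas, subcarriers). The 2D DFT maps
    (space, frequency) -> (angle, delay); the magnitude of that
    transform is the ADP "image" fed to the transformer.
    """
    return torch.abs(torch.fft.fft2(H))

class ViTPositioner(nn.Module):
    """Tiny ViT that regresses 2D coordinates from an ADP image.
    All hyperparameters below are placeholders, not the paper's."""
    def __init__(self, img=32, patch=8, dim=64, depth=4, heads=4, out_dim=2):
        super().__init__()
        n_patches = (img // patch) ** 2
        # Patch embedding: non-overlapping patches via a strided conv.
        self.embed = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, n_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim,
            batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, out_dim)  # (x, y) position estimate

    def forward(self, adp: torch.Tensor) -> torch.Tensor:
        # adp: (batch, 1, img, img), real-valued ADP magnitudes
        x = self.embed(adp).flatten(2).transpose(1, 2)   # (B, N, dim)
        cls = self.cls.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos
        x = self.encoder(x)
        return self.head(x[:, 0])  # regress from the [CLS] token

# Example: one random CSI matrix -> ADP -> position estimate.
H = torch.randn(32, 32, dtype=torch.complex64)   # assumed 32 antennas x 32 subcarriers
adp = adp_from_csi(H).unsqueeze(0).unsqueeze(0)  # (1, 1, 32, 32)
model = ViTPositioner()
print(model(adp).shape)  # torch.Size([1, 2])
```

In training, such a network would be fit with a mean-squared-error loss against ground-truth coordinates from datasets like DeepMIMO or ViWi, with RMSE reported at evaluation time; the random tensor above only stands in for a real CSI sample.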
