DeLight-Mono: Enhancing Self-Supervised Monocular Depth Estimation in Endoscopy by Decoupling Uneven Illumination

arXiv — cs.CV · Wednesday, November 26, 2025 at 5:00:00 AM
  • A new framework, DeLight-Mono, has been introduced to enhance self-supervised monocular depth estimation in endoscopy by addressing the uneven illumination common in endoscopic images. The approach combines an illumination-reflectance-depth model with auxiliary networks to improve depth estimation accuracy, particularly in low-light conditions.
  • The development of DeLight-Mono is significant as it aims to improve the reliability of endoscopic navigation systems, which are crucial for medical procedures. By effectively decoupling illumination effects, this framework could lead to better surgical outcomes and enhanced patient safety.
  • This advancement reflects a broader trend in artificial intelligence: handling environmental challenges such as lighting conditions is essential for reliable depth estimation across applications, including autonomous driving and robotics. Ongoing research in this field underscores the need for solutions that remain robust under diverse conditions.
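The illumination-reflectance decoupling the summary describes follows the classic Retinex model, in which an image is factored into a smooth illumination map and a detail-carrying reflectance map. The sketch below is a minimal illustrative version of that decomposition, not DeLight-Mono's actual architecture; the function name, blur-based illumination estimate, and array shapes are all assumptions for demonstration.

```python
# Minimal Retinex-style decomposition sketch: I = L * R, where the
# illumination L is estimated as a heavy blur of the image. This is an
# illustrative stand-in, not the paper's implementation.
import numpy as np

def decompose_retinex(image, radius=15):
    """Split an image into smooth illumination and reflectance maps."""
    # Box blur stands in for the Gaussian used in most Retinex variants.
    pad = np.pad(image, radius, mode="edge")
    illum = np.zeros_like(image)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            illum += pad[radius + dy : radius + dy + image.shape[0],
                         radius + dx : radius + dx + image.shape[1]]
    illum /= (2 * radius + 1) ** 2
    # Reflectance is whatever remains after dividing out illumination.
    reflectance = image / np.clip(illum, 1e-6, None)
    return illum, reflectance

rng = np.random.default_rng(0)
frame = rng.uniform(0.05, 1.0, size=(64, 64))  # stand-in endoscopic frame
L, R = decompose_retinex(frame)
# The two factors multiply back to the input up to numerical precision.
print(np.allclose(L * R, frame))  # → True
```

In a self-supervised pipeline of this kind, the depth network would then be supervised with a photometric loss computed on the illumination-corrected reflectance rather than the raw frame, so that dark regions contribute meaningful gradients.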
— via World Pulse Now AI Editorial System

Continue Reading
4DWorldBench: A Comprehensive Evaluation Framework for 3D/4D World Generation Models
Positive · Artificial Intelligence
The introduction of 4DWorldBench marks a significant advancement in the evaluation of 3D/4D World Generation Models, which are crucial for developing realistic and dynamic environments for applications like virtual reality and autonomous driving. This framework assesses models based on perceptual quality, physical realism, and 4D consistency, addressing the need for a unified benchmark in a rapidly evolving field.
Reasoning-VLA: A Fast and General Vision-Language-Action Reasoning Model for Autonomous Driving
Positive · Artificial Intelligence
A new model named Reasoning-VLA has been introduced, enhancing Vision-Language-Action (VLA) capabilities for autonomous driving. This model aims to improve decision-making efficiency and generalization across diverse driving scenarios by utilizing learnable action queries and a standardized dataset format for training.
Towards Trustworthy Wi-Fi Sensing: Systematic Evaluation of Deep Learning Model Robustness to Adversarial Attacks
Neutral · Artificial Intelligence
A systematic evaluation of deep learning model robustness to adversarial attacks has been conducted, focusing on Channel State Information (CSI)-based human sensing systems. This research highlights the critical need for quantifying model robustness to ensure accurate predictions in real-world applications, such as device-free activity recognition and identity detection.
Unified Low-Light Traffic Image Enhancement via Multi-Stage Illumination Recovery and Adaptive Noise Suppression
Positive · Artificial Intelligence
A new study presents a fully unsupervised multi-stage deep learning framework aimed at enhancing low-light traffic images, addressing challenges such as poor visibility, noise, and motion blur that affect autonomous driving and urban surveillance. The model employs three specialized modules: Illumination Adaptation, Reflectance Restoration, and Over-Exposure Compensation to improve image quality.
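The "Illumination Adaptation" stage that the summary names can be illustrated with a simple gamma-correction step that brightens a low-light image toward a target mean intensity. Everything below beyond the module name is an illustrative assumption, not the study's actual method.

```python
# Hedged sketch of an illumination-adaptation step: pick a gamma that
# maps the image's mean intensity toward a target brightness.
import numpy as np

def adapt_illumination(img, target_mean=0.5):
    """Brighten a low-light image (values in [0, 1]) via gamma correction."""
    mean = float(np.clip(img.mean(), 1e-6, 1 - 1e-6))
    # Solve mean ** gamma == target_mean for gamma.
    gamma = np.log(target_mean) / np.log(mean)
    return np.clip(img, 0.0, 1.0) ** gamma

rng = np.random.default_rng(1)
dark = rng.uniform(0.0, 0.2, size=(32, 32))  # simulated low-light frame
bright = adapt_illumination(dark)
print(bright.mean() > dark.mean())  # → True
```

A full pipeline of the kind described would follow this with reflectance restoration (denoising the detail layer) and over-exposure compensation (attenuating regions the brightening step pushes toward saturation).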
SupLID: Geometrical Guidance for Out-of-Distribution Detection in Semantic Segmentation
Positive · Artificial Intelligence
A novel framework named SupLID has been introduced to enhance Out-of-Distribution (OOD) detection in semantic segmentation, focusing on pixel-level anomaly localization. This advancement moves beyond traditional image-level techniques, utilizing Linear Intrinsic Dimensionality (LID) to guide classifier-derived OOD scores effectively.
MonoSR: Open-Vocabulary Spatial Reasoning from Monocular Images
Positive · Artificial Intelligence
MonoSR has been introduced as a large-scale monocular spatial reasoning dataset, addressing the need for effective spatial reasoning from 2D images across various environments, including indoor, outdoor, and object-centric scenarios. This dataset supports multiple question types, paving the way for advancements in embodied AI and autonomous driving applications.
Monocular Person Localization under Camera Ego-motion
Positive · Artificial Intelligence
A new method for monocular person localization under camera ego-motion has been developed, addressing the challenges of accurately estimating a person's 3D position from 2D images captured by a moving camera. This approach utilizes a four-point model to jointly estimate the camera's 2D attitude and the person's 3D location, significantly improving localization accuracy compared to existing methods.