DepthFocus: Controllable Depth Estimation for See-Through Scenes

arXiv — cs.CV · Monday, November 24, 2025 at 5:00:00 AM
  • DepthFocus is a steerable Vision Transformer for stereo depth estimation that gives users intent-driven control over which depth layer the model perceives. It addresses a limitation of existing systems that produce a single static depth map, which breaks down in scenes with transmissive materials (such as glass) where overlapping layers create depth ambiguity.
  • The work is significant because it not only achieves state-of-the-art performance on benchmarks such as BOOSTER but also marks a shift toward more dynamic, human-like depth perception in artificial intelligence, with potential applications in augmented reality and computer vision; a hedged sketch of one possible conditioning mechanism follows the attribution below.
— via World Pulse Now AI Editorial System
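
The summary does not describe the steering mechanism, but a common way to make a depth model "intent-driven" is feature-wise conditioning. The sketch below is a hypothetical illustration, assuming a scalar intent in [0, 1] that selects between the front layer (e.g., a glass pane) and the scene behind it via FiLM-style modulation; the class name, intent encoding, and overall design are assumptions, not DepthFocus's actual architecture.

```python
# Hypothetical sketch of intent-conditioned depth prediction (FiLM-style);
# not the paper's architecture, which the summary does not specify.
import torch
import torch.nn as nn

class IntentConditionedDepthHead(nn.Module):
    """Modulates fused stereo features with a scalar 'depth intent' in [0, 1],
    where 0 might mean 'nearest surface' (the glass) and 1 'farthest surface'
    (the scene behind it)."""

    def __init__(self, feat_dim: int = 256):
        super().__init__()
        # Embed the scalar intent into per-channel scale and shift (FiLM).
        self.film = nn.Sequential(
            nn.Linear(1, feat_dim),
            nn.ReLU(),
            nn.Linear(feat_dim, 2 * feat_dim),
        )
        self.depth = nn.Conv2d(feat_dim, 1, kernel_size=1)

    def forward(self, feats: torch.Tensor, intent: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) fused stereo features; intent: (B, 1)
        gamma, beta = self.film(intent).chunk(2, dim=-1)
        feats = feats * gamma[..., None, None] + beta[..., None, None]
        return self.depth(feats)  # (B, 1, H, W) depth (or disparity) map

head = IntentConditionedDepthHead()
feats = torch.randn(2, 256, 32, 32)
near = head(feats, torch.tensor([[0.0], [0.0]]))  # focus on the front layer
far = head(feats, torch.tensor([[1.0], [1.0]]))   # focus on the rear layer
print(near.shape, far.shape)  # torch.Size([2, 1, 32, 32]) twice
```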


Continue Reading
Knowledge-based learning in Text-RAG and Image-RAG
Neutral · Artificial Intelligence
A recent study analyzed a multi-modal approach that pairs the EVA-ViT Vision Transformer image encoder with LLaMA and ChatGPT large language models (LLMs) to reduce hallucination and improve disease detection in chest X-ray images. Using the NIH Chest X-ray dataset, the study compared image-based and text-based retrieval-augmented generation (RAG) and found that text-based RAG effectively mitigates hallucinations while image-based RAG improves prediction confidence; the two retrieval modes are sketched below.
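
As a rough illustration of the two retrieval modes, the sketch below retrieves nearest neighbors by cosine similarity in either a text-embedding space (report sentences used to ground the LLM) or an image-embedding space (visually similar X-rays). The embedding dimensions and random vectors are placeholders, not the study's actual pipeline or models.

```python
# Minimal sketch contrasting text-based and image-based retrieval for RAG;
# corpora and queries are random stand-ins for real embeddings.
import numpy as np

def cosine_top_k(query: np.ndarray, corpus: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k corpus entries most similar to the query."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    return np.argsort(c @ q)[::-1][:k]

rng = np.random.default_rng(0)
text_corpus = rng.normal(size=(100, 384))   # e.g., report-sentence embeddings
image_corpus = rng.normal(size=(100, 768))  # e.g., ViT image embeddings

# Text-RAG: ground the LLM in retrieved report text (mitigates hallucination).
text_hits = cosine_top_k(rng.normal(size=384), text_corpus)
# Image-RAG: retrieve visually similar X-rays (boosts prediction confidence).
image_hits = cosine_top_k(rng.normal(size=768), image_corpus)
print(text_hits, image_hits)
```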
Temporal-Enhanced Interpretable Multi-Modal Prognosis and Risk Stratification Framework for Diabetic Retinopathy (TIMM-ProRS)
Positive · Artificial Intelligence
A deep learning framework named TIMM-ProRS has been introduced to improve prognosis and risk stratification for diabetic retinopathy (DR), a condition that threatens the vision of millions worldwide. The framework combines Vision Transformer, Convolutional Neural Network, and Graph Neural Network components, fusing retinal images with temporal biomarkers, and reports 97.8% accuracy across multiple datasets; a minimal fusion sketch follows below.
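
The blurb names the component networks but not how they are fused. The sketch below shows one assumed fusion pattern: project image features, encode the biomarker time series with a recurrent unit, and concatenate before a joint classifier. Module names, dimensions, and the GRU choice are illustrative assumptions, not the TIMM-ProRS design.

```python
# Hypothetical sketch of image + temporal-biomarker fusion for DR risk
# stratification; not the actual TIMM-ProRS architecture.
import torch
import torch.nn as nn

class MultiModalPrognosisUnit(nn.Module):
    def __init__(self, img_dim=768, bio_dim=16, hidden=128, n_classes=3):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hidden)                 # ViT/CNN image features
        self.bio_rnn = nn.GRU(bio_dim, hidden, batch_first=True)   # biomarker time series
        self.classifier = nn.Linear(2 * hidden, n_classes)         # risk strata

    def forward(self, img_feat: torch.Tensor, bio_seq: torch.Tensor) -> torch.Tensor:
        # img_feat: (B, img_dim); bio_seq: (B, T, bio_dim)
        _, h = self.bio_rnn(bio_seq)                  # h: (1, B, hidden)
        fused = torch.cat([self.img_proj(img_feat), h[-1]], dim=-1)
        return self.classifier(fused)                 # (B, n_classes) risk logits

model = MultiModalPrognosisUnit()
logits = model(torch.randn(4, 768), torch.randn(4, 12, 16))
print(logits.shape)  # torch.Size([4, 3])
```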
