X-LeBench: A Benchmark for Extremely Long Egocentric Video Understanding

arXiv — cs.CV · November 12, 2025
X-LeBench has been introduced to fill a critical gap in the evaluation of long egocentric video recordings, a setting that existing benchmarks have largely overlooked by focusing on shorter durations. The dataset comprises 432 simulated videos ranging from 23 minutes to 16.4 hours, generated through a life-logging simulation pipeline that integrates synthetic daily plans with real-world footage from the large-scale Ego4D dataset. The benchmark targets applications such as embodied intelligence and personalized assistive technologies, where understanding long-term human behavior is essential. Analyzing videos of this length remains difficult: it demands temporal localization, long-range reasoning, context aggregation, and memory retention. Initial evaluations show that both baseline systems and multimodal large language models (MLLMs) perform poorly in this setting, highlighting the need for further research.
— via World Pulse Now AI Editorial System
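The summary describes a pipeline that stitches real egocentric footage into long simulated recordings by following a synthetic daily plan. The sketch below is a hypothetical illustration of that idea, not the authors' actual implementation: the `Clip`, `assemble_lifelog`, plan format, and Ego4D ids are all invented for demonstration. It greedily fills each planned activity slot with matching clips until the target duration is reached.

```python
import random
from dataclasses import dataclass

@dataclass
class Clip:
    source_id: str      # hypothetical id of a real source video (e.g. from Ego4D)
    activity: str       # activity label the clip depicts, e.g. "cooking"
    minutes: float      # clip duration in minutes

def assemble_lifelog(plan, clip_pool, seed=0):
    """Stitch real clips into one long simulated day following a synthetic plan.

    plan: ordered list of (activity, target_minutes) entries.
    clip_pool: dict mapping activity label -> list of candidate Clips.
    Returns the ordered clip timeline and its total duration in minutes.
    """
    rng = random.Random(seed)   # fixed seed keeps the simulated day reproducible
    timeline, total = [], 0.0
    for activity, target in plan:
        remaining = target
        candidates = clip_pool.get(activity, [])
        # Keep appending matching clips until the planned slot is filled.
        while remaining > 0 and candidates:
            clip = rng.choice(candidates)
            timeline.append(clip)
            remaining -= clip.minutes
            total += clip.minutes
    return timeline, total

# Toy usage: a two-activity plan drawn from a tiny clip pool.
pool = {
    "cooking":  [Clip("ego4d_0001", "cooking", 12.0)],
    "cleaning": [Clip("ego4d_0002", "cleaning", 8.0)],
}
plan = [("cooking", 20), ("cleaning", 8)]
timeline, total = assemble_lifelog(plan, pool)
```

Slot durations are overshot rather than trimmed here (the 20-minute cooking slot receives two 12-minute clips); a real pipeline would presumably cut or re-sample clips to hit the target length exactly.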


Recommended Readings
Unifying Segment Anything in Microscopy with Vision-Language Knowledge
Positive · Artificial Intelligence
The paper 'Unifying Segment Anything in Microscopy with Vision-Language Knowledge' addresses accurate segmentation in biomedical images, noting that existing models generalize poorly to unseen domains because they lack vision-language knowledge. The authors propose uLLSAM, a framework that uses multimodal large language models (MLLMs) to guide segmentation, reporting notable performance gains on cross-domain datasets.