arXiv:2511.06717v2 Announce Type: replace 
Abstract: Recent advances in extreme image compression have revealed that mapping pixel data into highly compact latent representations can significantly improve coding efficiency. However, most existing methods compress images into 2-D latent spaces via convolutional neural networks (CNNs) or Swin Transformers, which tend to retain substantial spatial redundancy, thereby limiting overall compression performance. In this paper, we propose a novel Mixed RWKV-Transformer (MRT) architecture that encodes images into more compact 1-D latent representations by synergistically integrating the complementary strengths of linear-attention-based RWKV and self-attention-based Transformer models. Specifically, MRT partitions each image into fixed-size windows, utilizing RWKV modules to capture global dependencies across windows and Transformer blocks to model local redundancies within each window. The hierarchical attention mechanism enables more efficient and compact representation learning in the 1-D domain. To further enhance compression efficiency, we introduce a dedicated RWKV Compression Model (RCM) tailored to the structure characteristics of the intermediate 1-D latent features in MRT. Extensive experiments on standard image compression benchmarks validate the effectiveness of our approach. The proposed MRT framework consistently achieves superior reconstruction quality at bitrates below 0.02 bits per pixel (bpp). Quantitative results based on the DISTS metric show that MRT significantly outperforms the state-of-the-art 2-D architecture GLC, achieving bitrate savings of 43.75%, 30.59% on the Kodak and CLIC2020 test datasets, respectively.

أظهرت التطورات الأخيرة في ضغط الصور بشكل متطرف أن تحويل بيانات البكسل إلى تمثيلات كامنة مضغوطة للغاية يمكن أن يحسن بشكل كبير من كفاءة الترميز. ومع ذلك، فإن معظم الطرق الحالية تضغط الصور إلى مساحات كامنة ثنائية الأبعاد عبر الشبكات العصبية التلافيفية أو Transformers Swin، والتي تميل إلى الاحتفاظ بقدر كبير من التكرار المكاني، مما يحد من أداء الضغط العام. في هذه الورقة، نقترح بنية جديدة تسمى Mixed RWKV-Transformer (MRT) التي تشفر الصور إلى تمثيلات كامنة أحادية البعد أكثر إحكامًا.

Los avances recientes en la compresión extrema de imágenes han demostrado que mapear datos de píxeles en representaciones latentes altamente compactas puede mejorar significativamente la eficiencia de codificación. La mayoría de los métodos existentes comprimen imágenes en espacios latentes 2-D a través de redes neuronales convolucionales (CNN) o Transformers Swin, que tienden a retener una redundancia espacial considerable, limitando así el rendimiento general de compresión. Este artículo propone una nueva arquitectura Mixed RWKV-Transformer (MRT) que codifica imágenes en representaciones latentes 1-D más compactas.

Des avancées récentes dans la compression d'images extrêmes ont montré que la conversion des données de pixels en représentations latentes très compactes peut améliorer l'efficacité du codage. Les méthodes traditionnelles reposent souvent sur des réseaux de neurones convolutifs (CNN) ou des Swin Transformers, qui conservent une redondance spatiale significative, limitant ainsi les performances de compression. L'architecture proposée Mixed RWKV-Transformer (MRT) encode les images en représentations latentes 1-D compactes en intégrant les forces des modèles RWKV et Transformer.

Recent advancements in extreme image compression have demonstrated that converting pixel data into highly compact latent representations can enhance coding efficiency. Traditional methods often rely on convolutional neural networks (CNNs) or Swin Transformers, which maintain significant spatial redundancy, limiting compression performance. The proposed Mixed RWKV-Transformer (MRT) architecture encodes images into compact 1-D latent representations by integrating the strengths of RWKV and Transformer models, capturing global dependencies and local redundancies effectively.

MRT: Learning Compact Representations with Mixed RWKV-Transformer for Extreme Image Compression

Guardio is leveraging its experience building browser extensions and apps that scan for malicious and phishing sites to build a tool that looks for artifacts in code and websites made with vibe coding tools.

نجحت شركة Guardio الناشئة في مجال الأمن السيبراني في تأمين 80 مليون دولار من التمويل من ION Crossover Partners. تشتهر الشركة بخبرتها في تطوير إضافات المتصفح والتطبيقات التي تكشف عن المواقع الضارة والمحتالة. تخطط Guardio لاستخدام هذا التمويل لإنشاء أداة تبحث عن الآثار في الشيفرة والمواقع التي تم إنشاؤها باستخدام أدوات البرمجة vibe.

La startup de seguridad Guardio ha asegurado 80 millones de dólares en financiamiento de ION Crossover Partners. La empresa es conocida por su experiencia en el desarrollo de extensiones de navegador y aplicaciones que detectan sitios maliciosos y de phishing. Guardio planea utilizar estos fondos para crear una herramienta que identifique artefactos en el código y sitios web construidos con herramientas de codificación vibe.

La startup de sécurité Guardio a obtenu 80 millions de dollars de financement de la part d'ION Crossover Partners. Connue pour son expertise dans le développement d'extensions de navigateur et d'applications détectant les sites malveillants et de phishing, Guardio prévoit d'utiliser ce financement pour créer un outil identifiant les artefacts dans le code et les sites web construits avec des outils de codage vibe.

Security startup Guardio has secured $80 million in funding from ION Crossover Partners. The company is known for its expertise in developing browser extensions and applications that detect malicious and phishing websites. Guardio plans to utilize this funding to create a tool that identifies artifacts in code and websites built with vibe coding tools.

Security startup Guardio nabs $80M from ION Crossover Partners

A new artificial intelligence startup founded by the creators of <a href="https://opencv.org/">the world&#x27;s most widely used computer vision library</a> has emerged from stealth with technology that generates realistic human-centric videos up to five minutes long — a dramatic leap beyond the capabilities of rivals including OpenAI&#x27;s <a href="https://openai.com/sora/">Sora</a> and Google&#x27;s <a href="https://deepmind.google/models/veo/">Veo</a>.<a href="https://craftstory.com/">CraftStory</a>, which launched Tuesday with $2 million in funding, is introducing Model 2.0, a video generation system that addresses one of the most significant limitations plaguing the nascent AI video industry: duration. While OpenAI&#x27;s <a href="https://openai.com/index/sora-2/">Sora 2</a> tops out at 25 seconds and most competing models generate clips of 10 seconds or less, CraftStory&#x27;s system can produce continuous, coherent video performances that run as long as a typical YouTube tutorial or product demonstration.The breakthrough could unlock substantial commercial value for enterprises struggling to scale video production for training, marketing, and customer education — markets where brief AI-generated clips have proven inadequate despite their visual polish.&quot;If you really try to create a video with one of these video generation systems, you find that a lot of the times you want to implement a certain creative vision, and regardless of how detailed the instructions are, the systems basically ignore a part of your instructions,&quot; said Victor Erukhimov, CraftStory&#x27;s founder and CEO, in an exclusive interview with VentureBeat. &quot;We developed a system that can generate videos basically as long as you need them.&quot;<h3>How parallel processing solves the long-form video problem</h3>CraftStory&#x27;s advance rests on what the company describes as a parallelized diffusion architecture — a fundamentally different approach to how AI models generate video compared to the sequential methods employed by most competitors.Traditional video generation models work by running diffusion algorithms on increasingly large three-dimensional volumes where time represents the third axis. To generate a longer video, these models require proportionally larger networks, more training data, and significantly more computational resources.<a href="https://craftstory.com/">CraftStory</a> instead runs multiple smaller diffusion algorithms simultaneously across the entire duration of the video, with bidirectional constraints connecting them. &quot;The latter part of the video can influence the former part of the video too,&quot; Erukhimov explained. &quot;And this is pretty important, because if you do it one by one, then an artifact that appears in the first part propagates to the second one, and then it accumulates.&quot;Rather than generating eight seconds and then stitching on additional segments, CraftStory&#x27;s system processes all five minutes concurrently through interconnected diffusion processes.Crucially, CraftStory trained its model on proprietary footage rather than relying solely on internet-scraped videos. The company hired studios to shoot actors using high-frame-rate camera systems that capture crisp detail even in fast-moving elements like fingers — avoiding the motion blur inherent in standard 30-frames-per-second YouTube clips.&quot;What we showed is that you don&#x27;t need a lot of data and you don&#x27;t need a lot of training budget to create high quality videos,&quot; Erukhimov said. &quot;You just need high quality data.&quot;Model 2.0 currently operates as a video-to-video system: users upload a still image to animate and a &quot;driving video&quot; containing a person whose movements the AI will replicate. CraftStory provides preset driving videos shot with professional actors, who receive revenue shares when their motion data is used, or users can upload their own footage.The system generates 30-second clips at low resolution in approximately 15 minutes. An advanced lip-sync system synchronizes mouth movements to scripts or audio tracks, while gesture alignment algorithms ensure body language matches speech rhythm and emotional tone.<h3>Fighting a war chest battle with $2 million against billions</h3>CraftStory&#x27;s funding comes almost entirely from <a href="https://finance.yahoo.com/news/2-25-billion-exit-taught-130300997.html">Andrew Filev</a>, who sold his project management software company Wrike to Citrix for <a href="https://techcrunch.com/2021/01/19/citrix-is-acquiring-wrike-from-vista-for-2-25b/">$2.25 billion</a> in 2021 and now runs <a href="https://zencoder.ai/">Zencoder</a>, an AI coding company. The modest raise stands in stark contrast to the billions flowing into competing efforts — OpenAI has <a href="https://www.reuters.com/technology/artificial-intelligence/openai-closes-66-billion-funding-haul-valuation-157-billion-with-investment-2024-10-02/">raised over $6 billion</a> in its latest funding round alone.Erukhimov pushed back on the notion that massive capital is prerequisite for success. &quot;I don&#x27;t necessarily buy the thesis that compute is the path to success,&quot; he said. &quot;It definitely helps if you have compute. But if you raise a billion dollars on a PowerPoint, in the end, no one is happy, neither the founders nor the investors.&quot;Filev defended the David-versus-Goliath approach. &quot;When you invest in startups, you&#x27;re fundamentally betting on people,&quot; he said in an interview with VentureBeat. &quot;To paraphrase Margaret Mead: never underestimate what a small group of thoughtful, committed engineers and scientists can build.&quot;He argued that CraftStory benefits from a focused strategy. &quot;The big labs are in an arms race to build general-purpose video foundation models,&quot; Filev said. &quot;CraftStory is riding that wave and going very deep into a specific format: long-form, engaging, human-centric video.&quot;<h3>Why computer vision expertise matters in generative AI video</h3>Erukhimov&#x27;s credibility stems from his deep roots in computer vision rather than the transformer architectures that have dominated recent AI advances. He was an early contributor to <a href="https://opencv.org/">OpenCV</a> — the Open Source Computer Vision Library that has become the de facto standard for computer vision applications, with over <a href="https://github.com/opencv/opencv">84,000 stars on GitHub</a>.When Intel reduced its support for OpenCV in the mid-2000s, Erukhimov co-founded Itseez with the explicit goal of maintaining and advancing the library. The company expanded OpenCV significantly and pivoted toward automotive safety systems before Intel acquired it in 2016.Filev said this background is precisely what makes Erukhimov well-positioned for video generation. &quot;What people sometimes miss is that generative AI video isn&#x27;t just about the generative part. It&#x27;s about understanding motion, facial dynamics, temporal coherence, and how humans actually move,&quot; Filev said. &quot;Victor has spent his career mastering exactly those problems.&quot;<h3>Enterprise focus targets training videos and product demos</h3>While much of the public excitement around AI video generation has centered on creative tools for consumers, CraftStory is pursuing a decidedly enterprise-focused strategy.&quot;We are definitely thinking about B2B more than consumer,&quot; Erukhimov said. &quot;We&#x27;re thinking about companies, specifically software companies, being able to make cool training videos and product videos and launch videos.&quot;The logic is straightforward: corporate training, product tutorials, and customer education videos often run several minutes and require consistent quality throughout. A 10-second AI clip cannot effectively demonstrate how to use enterprise software or explain a complex product feature.&quot;If you need a longer-form video, then you should go with us,&quot; Erukhimov said. &quot;We can create up to five minutes, consistent video, high quality.&quot;Filev echoed this assessment. &quot;One huge gap in this market is the lack of models that can generate consistent videos over longer sequences — and that&#x27;s extremely important for real-world use,&quot; he said. &quot;If you&#x27;re creating a commercial for your company, a 10-second video, no matter how good it looks, just isn&#x27;t enough. You need 30 seconds, you need two minutes — you need more.&quot;The company anticipates cost savings for customers. Filev suggested that &quot;a small business owner could create content in minutes that previously would have cost $20,000 and taken two months to produce.&quot;CraftStory is also courting creative agencies that produce video content for corporate clients, with the value proposition centered on cost and speed: agencies can record an actor on camera and transform that footage into a finished AI video, rather than managing expensive multi-day shoots.The next major development on CraftStory&#x27;s roadmap is a text-to-video model that would allow users to generate long-form content directly from scripts. The team is also developing support for moving-camera scenarios, including the popular &quot;walk-and-talk&quot; format common in high-end advertising.<h3>Where CraftStory fits in a fragmented competitive landscape</h3>CraftStory enters a crowded and rapidly evolving market. OpenAI&#x27;s <a href="https://openai.com/index/sora-2/">Sora 2</a>, while not yet publicly available, has generated significant buzz. Google&#x27;s <a href="https://deepmind.google/models/veo/">Veo models</a> are advancing quickly. <a href="https://runwayml.com/">Runway</a>, <a href="https://pika.art/login">Pika</a>, and <a href="https://stability.ai/">Stability AI </a>all offer video generation tools with different capabilities.Erukhimov acknowledged the competitive pressure but emphasized that CraftStory serves a distinct niche focused on human-centric videos. He positioned rapid innovation and market capture as the company&#x27;s primary strategy rather than relying on technical moats.Filev sees the market fragmenting into distinct layers, with large tech companies serving as &quot;API providers of powerful, general-purpose generation models&quot; while specialized players like CraftStory focus on specific use cases. &quot;If the big players are building the engines, CraftStory is building the production studio and assembly line on top,&quot; he said.Model 2.0 is available now at app.craftstory.com/model-2.0, with the company offering early access to users and enterprises interested in testing the technology. Whether a lightly-funded startup can capture meaningful market share against deep-pocketed incumbents remains uncertain, but Erukhimov is characteristically confident about the opportunity ahead.&quot;AI-generated video will soon become the primary way companies communicate their stories,&quot; he said.

أطلقت CraftStory، وهي شركة ناشئة جديدة في مجال الذكاء الاصطناعي أسسها مبتكرو OpenCV، نظامًا لتوليد الفيديو قادرًا على إنتاج مقاطع فيديو واقعية تركز على الإنسان تصل مدتها إلى خمس دقائق. تتجاوز هذه التكنولوجيا بشكل كبير قدرات المنافسين مثل Sora من OpenAI وVeo من Google، الذين لديهم حدود زمنية أقصر. حصلت الشركة الناشئة على تمويل بقيمة مليوني دولار لدعم نهجها المبتكر في صناعة الفيديو بالذكاء الاصطناعي.

CraftStory, una nueva startup de IA fundada por los creadores de OpenCV, ha lanzado un sistema de generación de video capaz de producir videos realistas centrados en humanos de hasta cinco minutos de duración. Esta tecnología supera significativamente a competidores como Sora de OpenAI y Veo de Google, que tienen límites de duración más cortos. La startup ha asegurado 2 millones de dólares en financiamiento para apoyar su enfoque innovador en la industria de video de IA.

CraftStory, une nouvelle startup d'IA fondée par les créateurs d'OpenCV, a lancé un système de génération vidéo capable de produire des vidéos réalistes centrées sur l'humain d'une durée allant jusqu'à cinq minutes. Cette technologie surpasse considérablement les concurrents tels que Sora d'OpenAI et Veo de Google, qui ont des limites de durée plus courtes. La startup a sécurisé 2 millions de dollars de financement pour soutenir son approche innovante de l'industrie vidéo IA.

CraftStory, a new AI startup founded by the creators of OpenCV, has launched a video generation system capable of producing realistic human-centric videos up to five minutes long. This technology significantly outpaces competitors like OpenAI's Sora and Google's Veo, which have shorter duration limits. The startup has secured $2 million in funding to support its innovative approach to the AI video industry.

OpenCV founders launch AI video startup to take on OpenAI and Google

<a href="https://petapixel.com/2025/11/19/remote-cameras-may-have-captured-first-recorded-tool-use-by-a-wild-wolf/"><img width="1600" height="840" src="https://petapixel.com/assets/uploads/2025/11/wolve-tool-use.jpg" class="attachment-card-large size-card-large wp-post-image" alt="A wolf stands at the edge of a rocky shoreline, holding an orange and white fishing bobber in its mouth. A fishing net and additional gear are lying on the ground nearby. Rippling water is in the background." decoding="async" fetchpriority="high" /></a>Remote cameras have captured footage of wild wolves pulling crab traps out of the sea by their lines to eat the bait inside -- in the first evidence of possible tool use by the canines.
[<a href="https://petapixel.com/2025/11/19/remote-cameras-may-have-captured-first-recorded-tool-use-by-a-wild-wolf/">Read More</a>]

التقطت الكاميرات عن بُعد لقطات لذئاب برية تستخدم تقنية تتضمن سحب فخاخ السلطعون من البحر للوصول إلى الطعم داخلها. وهذا يمثل أول دليل موثق على استخدام محتمل للأدوات من قبل هذه الكلاب.

Cámaras remotas han grabado a lobos salvajes utilizando una técnica que consiste en sacar trampas de cangrejos del mar para acceder al cebo en su interior. Esto marca la primera evidencia documentada de un posible uso de herramientas por parte de estos caninos.

Des caméras à distance ont enregistré des loups sauvages utilisant une technique qui consiste à tirer des pièges à crabes de la mer pour accéder à l'appât à l'intérieur. Cela marque la première preuve documentée d'un potentiel usage d'outils par ces canidés.

Remote cameras have recorded wild wolves using a technique that involves pulling crab traps from the sea to access the bait inside. This marks the first documented evidence of potential tool use by these canines.

Remote Cameras May Have Captured First Recorded Tool Use by a Wild Wolf

The AI startup Firebird Inc. has received US government approval to export Nvidia Corp. chips to Armenia for a supercomputer project in the country, part of a global push to expand artificial intelligence infrastructure.

حصلت شركة Firebird Inc. الناشئة في مجال الذكاء الاصطناعي على موافقة الحكومة الأمريكية لتصدير شرائح Nvidia Corp. إلى أرمينيا، مما يسهل إنشاء مشروع حاسوب فائق في البلاد. تأتي هذه المبادرة كجزء من جهد عالمي أوسع لتعزيز بنية الذكاء الاصطناعي التحتية.

La startup de IA Firebird Inc. ha recibido la aprobación del gobierno de EE. UU. para exportar chips de Nvidia Corp. a Armenia, facilitando el establecimiento de un proyecto de supercomputadora en el país. Esta iniciativa forma parte de un esfuerzo global más amplio para mejorar la infraestructura de inteligencia artificial.

La startup d'IA Firebird Inc. a obtenu l'approbation du gouvernement américain pour exporter des puces de Nvidia Corp. en Arménie, facilitant ainsi l'établissement d'un projet de superordinateur dans le pays. Cette initiative s'inscrit dans un effort mondial plus large pour améliorer l'infrastructure de l'intelligence artificielle.

AI startup Firebird Inc. has obtained approval from the US government to export Nvidia Corp. chips to Armenia, facilitating the establishment of a supercomputer project in the country. This initiative is part of a broader global effort to enhance artificial intelligence infrastructure.

AI Startup Firebird Gets US Approval to Use Nvidia Chips in Armenian Data Center

Gesture of goodwill temporarily cools EU–China dispute that rattled global car supply chains.
The post <a href="https://www.techrepublic.com/article/news-netherlands-nexperia-chipmaker-control/">Netherlands Pauses Move to Seize Chinese-Owned Chipmaker Nexperia</a> appeared first on <a href="https://www.techrepublic.com">TechRepublic</a>.

أوقفت هولندا مؤقتًا جهودها للاستيلاء على شركة Nexperia لصناعة الرقائق المملوكة للصين، وهو إجراء يُعتبر بمثابة لفتة حسن نية تهدف إلى تهدئة التوترات في النزاع القائم بين الاتحاد الأوروبي والصين. تأتي هذه الخطوة وسط مخاوف بشأن تأثيرها على سلاسل الإمداد العالمية التي تأثرت بالفعل بنقص أشباه الموصلات.

Los Países Bajos han detenido temporalmente sus esfuerzos por apoderarse del fabricante de chips Nexperia, de propiedad china, un gesto que se considera como una buena voluntad para aliviar las tensiones en la disputa entre la UE y China. Esta decisión se produce en medio de preocupaciones sobre el impacto en las cadenas de suministro globales, que ya se han visto afectadas por la escasez de semiconductores.

Les Pays-Bas ont temporairement suspendu leurs efforts pour saisir le fabricant de puces Nexperia, détenu par des Chinois, un geste perçu comme une volonté d'apaiser les tensions dans le conflit en cours entre l'UE et la Chine. Cette décision intervient dans un contexte d'inquiétudes concernant l'impact sur les chaînes d'approvisionnement mondiales, déjà affectées par des pénuries de semi-conducteurs.

The Netherlands has temporarily halted its efforts to seize control of the Chinese-owned chipmaker Nexperia, a move seen as a gesture of goodwill aimed at easing tensions in the ongoing EU-China dispute. This decision comes amid concerns over the impact on global car supply chains, which have been affected by semiconductor shortages.

Netherlands Pauses Move to Seize Chinese-Owned Chipmaker Nexperia

استقال لاري سامرز من مجلس إدارة OpenAI، كما أفاد نيويورك تايمز. تأتي استقالته بعد التدقيق في اتصالاته السابقة مع المدان بجريمة الاعتداء الجنسي جيفري إبستين. تمثل هذه الخطوة انسحابًا كبيرًا لسامرز من الأدوار العامة وسط انتقادات متزايدة.

Larry Summers ha renunciado a la junta de OpenAI, según informa The New York Times. Su renuncia se produce tras el escrutinio sobre sus comunicaciones pasadas con el delincuente sexual condenado Jeffrey Epstein. Esta decisión marca un paso significativo en el retiro de Summers de los roles públicos en medio de una creciente crítica.

Larry Summers a démissionné du conseil d'administration d'OpenAI, selon le New York Times. Sa démission fait suite à un examen minutieux de ses communications passées avec le délinquant sexuel condamné Jeffrey Epstein. Cette décision marque un retrait significatif de Summers de ses rôles publics face à une critique croissante.

Larry Summers has resigned from the board of OpenAI, as reported by The New York Times. His resignation follows scrutiny over his past communications with convicted sex offender Jeffrey Epstein. This decision marks a significant step in Summers' withdrawal from public roles amid growing criticism.

Larry Summers Resigns From OpenAI’s Board

arXiv:2511.11093v1 Announce Type: new 
Abstract: Coronary artery calcification (CAC) is a strong predictor of cardiovascular events, with CT-based Agatston scoring widely regarded as the clinical gold standard. However, CT is costly and impractical for large-scale screening, while chest X-rays (CXRs) are inexpensive but lack reliable ground truth labels, constraining deep learning development. Digitally reconstructed radiographs (DRRs) offer a scalable alternative by projecting CT volumes into CXR-like images while inheriting precise labels. In this work, we provide the first systematic evaluation of DRRs as a surrogate training domain for CAC detection. Using 667 CT scans from the COCA dataset, we generate synthetic DRRs and assess model capacity, super-resolution fidelity enhancement, preprocessing, and training strategies. Lightweight CNNs trained from scratch outperform large pretrained networks; pairing super-resolution with contrast enhancement yields significant gains; and curriculum learning stabilises training under weak supervision. Our best configuration achieves a mean AUC of 0.754, comparable to or exceeding prior CXR-based studies. These results establish DRRs as a scalable, label-rich foundation for CAC detection, while laying the foundation for future transfer learning and domain adaptation to real CXRs.

تستكشف دراسة حديثة نُشرت على arXiv استخدام الأشعة السينية الصدرية الاصطناعية للكشف عن ترسبات الكالسيوم في الشرايين التاجية (CAC)، وهو مؤشر قوي للأحداث القلبية الوعائية. تبرز الدراسة قيود نظام تسجيل Agatston القائم على التصوير المقطعي بسبب تكلفته العالية وعدم ملاءمته للفحص على نطاق واسع. من خلال استخدام الأشعة السينية المعاد بناؤها رقميًا (DRRs) المولدة من مسحات التصوير المقطعي، تُظهر الدراسة أن الشبكات العصبية التلافيفية الخفيفة (CNNs) يمكن أن تحدد CAC بفعالية، محققة AUC متوسطًا قدره 0.754.

Un estudio reciente publicado en arXiv investiga el uso de radiografías torácicas sintéticas para la detección de la calcificación de las arterias coronarias (CAC), un predictor significativo de eventos cardiovasculares. La investigación destaca las limitaciones del puntaje Agatston basado en tomografía computarizada debido a su alto costo e impracticidad para el cribado a gran escala. Al utilizar radiografías reconstruidas digitalmente (DRRs) generadas a partir de escaneos CT, el estudio demuestra que las redes neuronales convolucionales ligeras (CNN) pueden identificar eficazmente la CAC, lo…

Une étude récente publiée sur arXiv examine l'utilisation de radiographies thoraciques synthétiques pour la détection de la calcification des artères coronaires (CAC), un indicateur significatif des événements cardiovasculaires. La recherche met en évidence les limites du scoring Agatston basé sur la tomodensitométrie en raison de son coût élevé et de son impraticabilité pour le dépistage à grande échelle. En utilisant des radiographies reconstruites numériquement (DRRs) générées à partir de scans CT, l'étude démontre que des réseaux de neurones convolutifs légers (CNN) peuvent identifier effi…

A recent study published on arXiv explores the use of synthetic chest X-rays for the detection of coronary artery calcification (CAC), a significant predictor of cardiovascular events. The research highlights the limitations of traditional CT-based Agatston scoring due to its high cost and impracticality for large-scale screening. By utilizing digitally reconstructed radiographs (DRRs) generated from CT scans, the study demonstrates that lightweight convolutional neural networks (CNNs) can effectively identify CAC, achieving a mean AUC of 0.754.

Machine-Learning Based Detection of Coronary Artery Calcification Using Synthetic Chest X-Rays

arXiv:2509.02451v2 Announce Type: replace 
Abstract: Surface water dynamics play a critical role in Earth's climate system, influencing ecosystems, agriculture, disaster resilience, and sustainable development. Yet monitoring rivers and surface water at fine spatial and temporal scales remains challenging -- especially for narrow or sediment-rich rivers that are poorly captured by low-resolution satellite data. To address this, we introduce RiverScope, a high-resolution dataset developed through collaboration between computer science and hydrology experts. RiverScope comprises 1,145 high-resolution images (covering 2,577 square kilometers) with expert-labeled river and surface water masks, requiring over 100 hours of manual annotation. Each image is co-registered with Sentinel-2, SWOT, and the SWOT River Database (SWORD), enabling the evaluation of cost-accuracy trade-offs across sensors -- a key consideration for operational water monitoring. We also establish the first global, high-resolution benchmark for river width estimation, achieving a median error of 7.2 meters -- significantly outperforming existing satellite-derived methods. We extensively evaluate deep networks across multiple architectures (e.g., CNNs and transformers), pretraining strategies (e.g., supervised and self-supervised), and training datasets (e.g., ImageNet and satellite imagery). Our best-performing models combine the benefits of transfer learning with the use of all the multispectral PlanetScope channels via learned adaptors. RiverScope provides a valuable resource for fine-scale and multi-sensor hydrological modeling, supporting climate adaptation and sustainable water management.

تم تقديم RiverScope كأحد مجموعات البيانات عالية الدقة التي تهدف إلى تحسين مراقبة الأنهار وديناميات المياه السطحية، والتي تلعب دورًا حاسمًا في فهم نظام المناخ على الأرض. تتضمن المجموعة 1,145 صورة عالية الدقة تغطي 2,577 كيلومترًا مربعًا، مع أقنعة للأنهار والمياه السطحية تم وضع علامات عليها بواسطة خبراء. تهدف هذه المبادرة إلى معالجة التحديات المتعلقة بمراقبة الأنهار الضيقة أو الغنية بالرسوبيات، والتي غالبًا ما يتم تمثيلها بشكل غير كافٍ في بيانات الأقمار الصناعية ذات الدقة المنخفضة.

RiverScope es un nuevo conjunto de datos de alta resolución diseñado para mejorar el monitoreo de ríos y la dinámica de las aguas superficiales, que son cruciales para entender el sistema climático de la Tierra. El conjunto incluye 1,145 imágenes de alta resolución que cubren 2,577 kilómetros cuadrados, con máscaras de ríos y aguas superficiales etiquetadas por expertos. Esta iniciativa aborda los desafíos de monitorear ríos estrechos o ricos en sedimentos que a menudo están inadecuadamente representados en datos satelitales de baja resolución.

RiverScope est un nouvel ensemble de données haute résolution conçu pour améliorer le suivi des rivières et des dynamiques des eaux de surface, essentielles à la compréhension du système climatique terrestre. Cet ensemble comprend 1 145 images haute résolution couvrant 2 577 kilomètres carrés, avec des masques de rivières et d'eaux de surface annotés par des experts. Cette initiative vise à relever les défis du suivi des rivières étroites ou riches en sédiments, souvent mal représentées dans les données satellitaires à basse résolution.

RiverScope is a newly developed high-resolution dataset aimed at improving the monitoring of rivers and surface water dynamics, which are crucial for understanding Earth's climate system. The dataset includes 1,145 high-resolution images covering 2,577 square kilometers, with expert-labeled river and surface water masks. This initiative addresses the challenges of monitoring narrow or sediment-rich rivers that are often inadequately represented in low-resolution satellite data.

RiverScope: High-Resolution River Masking Dataset

arXiv:2511.14109v1 Announce Type: new 
Abstract: Visual Place Recognition (VPR) aims to match query images against a database using visual cues. State-of-the-art methods aggregate features from deep backbones to form global descriptors. Optimal transport-based aggregation methods reformulate feature-to-cluster assignment as a transport problem, but the standard Sinkhorn algorithm symmetrically treats source and target marginals, limiting effectiveness when image features and cluster centers exhibit substantially different distributions. We propose an asymmetric aggregation VPR method with geometric constraints for locally aggregated descriptors, called $A^2$GC-VPR. Our method employs row-column normalization averaging with separate marginal calibration, enabling asymmetric matching that adapts to distributional discrepancies in visual place recognition. Geometric constraints are incorporated through learnable coordinate embeddings, computing compatibility scores fused with feature similarities, thereby promoting spatially proximal features to the same cluster and enhancing spatial awareness. Experimental results on MSLS, NordLand, and Pittsburgh datasets demonstrate superior performance, validating the effectiveness of our approach in improving matching accuracy and robustness.

$A^2$GC-VPR هو أسلوب جديد للتعرف على الأماكن البصرية (VPR) يتناول قيود أساليب التجميع التقليدية في مطابقة صور الاستعلام مع قاعدة بيانات. من خلال اعتماد نهج تجميع غير متماثل مع قيود هندسية، يعزز هذا الأسلوب فعالية مطابقة الميزات، خاصة عند التعامل مع توزيعات متباينة لميزات الصورة ومراكز التجمع. تستخدم التقنية متوسطات تطبيع الصفوف والأعمدة مع تضمينات إحداثيات قابلة للتعلم لتحسين درجات التوافق لوصفيات التجميع المحلي.

$A^2$GC-VPR es un nuevo método para el Reconocimiento Visual de Lugares (VPR) que aborda las limitaciones de los métodos de agregación tradicionales al emparejar imágenes de consulta con una base de datos. Al emplear un enfoque de agregación asimétrica con restricciones geométricas, este método mejora la efectividad del emparejamiento de características, especialmente cuando se enfrentan a distribuciones variables de características de imagen y centros de clúster. La técnica utiliza promedios de normalización fila-columna y embeddings de coordenadas aprendibles para mejorar las puntuaciones de…

$A^2$GC-VPR est une nouvelle méthode pour la reconnaissance de lieux visuels (VPR) qui s'attaque aux limites des méthodes d'agrégation traditionnelles dans l'appariement d'images de requête à une base de données. En adoptant une approche d'agrégation asymétrique avec des contraintes géométriques, cette méthode améliore l'efficacité de l'appariement des caractéristiques, en particulier lorsqu'il s'agit de distributions variées des caractéristiques d'image et des centres de clusters. La technique utilise une moyenne de normalisation ligne-colonne et des embeddings de coordonnées apprenables pour…

$A^2$GC-VPR is a new method for Visual Place Recognition (VPR) that addresses the limitations of traditional aggregation methods in matching query images to a database. By employing an asymmetric aggregation approach with geometric constraints, this method enhances the effectiveness of feature matching, particularly when dealing with varying distributions of image features and cluster centers. The technique utilizes row-column normalization averaging and learnable coordinate embeddings to improve compatibility scores for locally aggregated descriptors.

$A^2$GC: $A$symmetric $A$ggregation with Geometric Constraints for Locally Aggregated Descriptors

arXiv:2511.14247v1 Announce Type: new 
Abstract: Multi-agents rely on accurate poses to share and align observations, enabling a collaborative perception of the environment. However, traditional GNSS-based localization often fails in GNSS-denied environments, making consistent feature alignment difficult in collaboration. To tackle this challenge, we propose a robust GNSS-free collaborative perception framework based on LiDAR localization. Specifically, we propose a lightweight Pose Generator with Confidence (PGC) to estimate compact pose and confidence representations. To alleviate the effects of localization errors, we further develop the Pose-Aware Spatio-Temporal Alignment Transformer (PASTAT), which performs confidence-aware spatial alignment while capturing essential temporal context. Additionally, we present a new simulation dataset, V2VLoc, which can be adapted for both LiDAR localization and collaborative detection tasks. V2VLoc comprises three subsets: Town1Loc, Town4Loc, and V2VDet. Town1Loc and Town4Loc offer multi-traversal sequences for training in localization tasks, whereas V2VDet is specifically intended for the collaborative detection task. Extensive experiments conducted on the V2VLoc dataset demonstrate that our approach achieves state-of-the-art performance under GNSS-denied conditions. We further conduct extended experiments on the real-world V2V4Real dataset to validate the effectiveness and generalizability of PASTAT.

يقدم المقال إطارًا جديدًا للإدراك التعاوني بدون GNSS باستخدام تحديد المواقع بواسطة LiDAR، حيث يتناول التحديات التي تواجهها البيئات التي تفتقر إلى GNSS. غالبًا ما تواجه طرق تحديد المواقع التقليدية صعوبات في هذه البيئات، مما يعيق التعاون الفعال بين أنظمة الوكلاء المتعددة. تتضمن الحلول المقترحة مولد وضع خفيف الوزن مع ثقة (PGC) لتقدير الأوضاع وتمثيلات الثقة، بالإضافة إلى محول التوافق الزماني المكاني الواعي بالوضع (PASTAT) الذي يقوم بأداء التوافق المكاني مع مراعاة الثقة. كما تم تقديم مجموعة بيانات محاكاة جديدة، V2VLoc، التي يمكن تكييفها لمهام تحديد المواقع بواسطة LiDAR والاكتشاف التعاوني.

El artículo presenta un nuevo marco para la percepción colaborativa sin GNSS utilizando la localización por LiDAR, abordando los desafíos que se enfrentan en entornos sin GNSS. Los métodos de localización tradicionales a menudo tienen dificultades en estos entornos, lo que dificulta la colaboración efectiva entre sistemas multiagente. La solución propuesta incluye un Generador de Pose con Confianza (PGC) para estimar poses y confianza, junto con el Transformador de Alineación Espacio-Temporal Consciente de la Pose (PASTAT) para el alineamiento espacial. Se introduce un nuevo conjunto de datos …

L'article présente un nouveau cadre pour la perception collaborative sans GNSS utilisant la localisation par LiDAR, abordant les défis rencontrés dans les environnements privés de GNSS. Les méthodes de localisation traditionnelles peinent souvent dans ces contextes, entravant la collaboration efficace entre systèmes multi-agents. La solution proposée comprend un générateur de pose léger avec confiance (PGC) pour estimer les poses et la confiance, ainsi qu'un transformateur d'alignement spatio-temporel conscient de la pose (PASTAT) pour l'alignement spatial. Un nouveau jeu de données de simulat…

The article presents a new framework for GNSS-free collaborative perception using LiDAR localization, addressing the challenges faced in GNSS-denied environments. Traditional localization methods often struggle in these settings, hindering effective collaboration among multi-agent systems. The proposed solution includes a lightweight Pose Generator with Confidence (PGC) for estimating poses and confidence, alongside the Pose-Aware Spatio-Temporal Alignment Transformer (PASTAT) for spatial alignment. A new simulation dataset, V2VLoc, is introduced, which supports LiDAR localization and collabor…

V2VLoc: Robust GNSS-Free Collaborative Perception via LiDAR Localization

arXiv:2511.14210v1 Announce Type: cross 
Abstract: We introduce Orion, a visual agent framework that can take in any modality and generate any modality. Using an agentic framework with multiple tool-calling capabilities, Orion is designed for visual AI tasks and achieves state-of-the-art results. Unlike traditional vision-language models that produce descriptive outputs, Orion orchestrates a suite of specialized computer vision tools, including object detection, keypoint localization, panoptic segmentation, Optical Character Recognition, and geometric analysis, to execute complex multi-step visual workflows. The system achieves competitive performance on MMMU, MMBench, DocVQA, and MMLongBench while extending monolithic vision-language models to production-grade visual intelligence. By combining neural perception with symbolic execution, Orion enables autonomous visual reasoning, marking a transition from passive visual understanding to active, tool-driven visual intelligence.

أورايون هو إطار جديد لوكيل بصري قادر على معالجة وتوليد أنماط متعددة. يستخدم إطارًا وكيلًا مع قدرات متعددة لاستدعاء الأدوات، محققًا نتائج رائدة في مهام الذكاء الاصطناعي البصري. على عكس نماذج الرؤية-اللغة التقليدية، يستخدم أورايون أدوات رؤية حاسوبية متخصصة لتنفيذ سير عمل بصري معقد، محققًا أداءً تنافسيًا في معايير مثل MMMU وMMBench وDocVQA وMMLongBench. يمثل هذا النظام تحولًا نحو الاستدلال البصري المستقل، مما يعزز الذكاء البصري.

Orion es un nuevo marco de agente visual capaz de procesar y generar diversas modalidades. Utiliza un marco agentivo con múltiples capacidades de llamada a herramientas, logrando resultados de vanguardia en tareas de IA visual. A diferencia de los modelos tradicionales de visión-lenguaje, Orion emplea herramientas especializadas de visión por computadora para flujos de trabajo visuales complejos, alcanzando un rendimiento competitivo en benchmarks como MMMU, MMBench, DocVQA y MMLongBench. Este sistema marca una transición hacia el razonamiento visual autónomo, mejorando la inteligencia visual.

Orion est un nouveau cadre d'agent visuel capable de traiter et de générer diverses modalités. Il utilise un cadre agentique avec plusieurs capacités d'appel d'outils, atteignant des résultats de pointe dans les tâches d'IA visuelle. Contrairement aux modèles traditionnels de vision-langage, Orion utilise des outils de vision par ordinateur spécialisés pour des flux de travail visuels complexes, obtenant des performances compétitives sur des benchmarks tels que MMMU, MMBench, DocVQA et MMLongBench. Ce système marque un tournant vers le raisonnement visuel autonome, améliorant l'intelligence vi…

Orion is a newly introduced visual agent framework capable of processing and generating various modalities. It employs an agentic framework with multiple tool-calling capabilities, achieving state-of-the-art results in visual AI tasks. Unlike traditional vision-language models, Orion utilizes specialized computer vision tools for complex visual workflows, achieving competitive performance on benchmarks like MMMU, MMBench, DocVQA, and MMLongBench. This system marks a shift towards autonomous visual reasoning, enhancing visual intelligence.

MRT: Learning Compact Representations with Mixed RWKV-Transformer for Extreme Image Compression

Was this article worth reading? Share it