RedDiffuser: Red Teaming Vision-Language Models for Toxic Continuation via Reinforced Stable Diffusion
Neutral · Artificial Intelligence
RedDiffuser is a red-teaming framework that exposes the vulnerability of Vision-Language Models (VLMs) to toxic continuation attacks, in which a harmful input is paired with a partially toxic output and the model is coaxed into completing it dangerously. It is the first framework to fine-tune a diffusion model with reinforcement learning so that it generates adversarial images that induce such toxic continuations. In experiments, RedDiffuser raises the toxicity rate of LLaVA outputs by 10.69% and 8.91% on the original and hold-out sets, respectively, and the attack transfers across models, increasing toxicity rates by 5.1% on Gemini and 26.83% on LLaMA-Vision. These results point to a cross-modal toxicity amplification weakness in current VLM alignment and underscore the need for robust multimodal red teaming to improve the safety and reliability of AI systems.
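The summary does not specify the training algorithm, so the sketch below is only a minimal illustration of the general idea: an image generator is treated as a stochastic policy and nudged by a policy-gradient (REINFORCE-style) update toward images that earn a higher toxicity reward from a frozen judge. `ToyDiffusionPolicy`, `toxicity_reward`, and `reward_head` are hypothetical stand-ins, not the paper's API; in the actual pipeline the generator would be Stable Diffusion and the reward would come from scoring the target VLM's continuation for toxicity.

```python
# Hypothetical sketch of reward-driven fine-tuning of an image generator.
# Toy modules stand in for Stable Diffusion and the toxicity judge so the
# loop runs self-contained; this is not the paper's implementation.
import torch
import torch.nn as nn

class ToyDiffusionPolicy(nn.Module):
    """Stand-in generator: maps noise to a flat 'image' vector and exposes
    the log-probability of the sampling step, which the score-function
    (REINFORCE) estimator needs."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))
        self.log_std = nn.Parameter(torch.zeros(dim))

    def sample(self, noise: torch.Tensor):
        mean = self.net(noise)
        dist = torch.distributions.Normal(mean, self.log_std.exp())
        image = dist.sample()                      # stochastic generation step
        return image, dist.log_prob(image).sum(-1)

# Frozen stand-in reward: in the real pipeline this would feed the image plus
# a harmful text prefix to the target VLM and score the continuation's toxicity.
reward_head = nn.Linear(64, 1)
for p in reward_head.parameters():
    p.requires_grad_(False)

def toxicity_reward(images: torch.Tensor) -> torch.Tensor:
    return reward_head(images).squeeze(-1)

policy = ToyDiffusionPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

for step in range(100):
    noise = torch.randn(32, 64)
    images, logp = policy.sample(noise)
    rewards = toxicity_reward(images)
    advantage = rewards - rewards.mean()           # baseline-subtracted reward
    loss = -(advantage.detach() * logp).mean()     # REINFORCE objective
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Subtracting the batch-mean reward as a baseline is a standard variance-reduction choice for score-function estimators; a real implementation would also likely need a KL or image-fidelity term so the fine-tuned generator keeps producing natural-looking adversarial images.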
— via World Pulse Now AI Editorial System
