arXiv:2510.11341v2 Announce Type: replace 
Abstract: General SVG modeling remains challenging due to fragmented datasets, limited transferability of methods across tasks, and the difficulty of handling structural complexity. In response, we leverage the strong transfer and generalization capabilities of multimodal large language models (MLLMs) to achieve unified modeling for SVG understanding, editing, and generation. We present the InternSVG family, an integrated data-benchmark-model suite. At its core is SAgoge, the largest and most comprehensive multimodal dataset for SVG tasks, encompassing both static graphics and dynamic animations. It covers icons, long-sequence illustrations, scientific diagrams, and dynamic animations, supporting tasks of varied difficulty levels and providing deeper hierarchies with richer attributes compared to previous datasets. Based on this resource, we introduce SArena, a companion benchmark with comprehensive task definitions and standardized evaluation that aligns with the domains and difficulty spectrum covered by SAgoge. Building on these foundations, we propose InternSVG, a unified MLLM for SVG understanding, editing, and generation with SVG-specific special tokens, subword-based embedding initialization, and a two-stage training strategy that progresses from short static SVGs to long-sequence illustrations and complex animations. This unified formulation induces positive transfer and improves overall performance. Experiments on SArena and a prior benchmark confirm that InternSVG achieves substantial gains and consistently outperforms leading open and proprietary counterparts.
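The abstract names "SVG-specific special tokens" and "subword-based embedding initialization" without spelling out the procedure. One common reading, sketched here as an assumption rather than the authors' actual method, is that a new token such as `<path>` starts from the average of the embeddings of the subword pieces it would otherwise be split into. The toy vocabulary, the greedy tokenizer, and the function names below are all hypothetical and exist only for illustration.

```python
from statistics import fmean


def toy_tokenize(text, vocab):
    """Greedy longest-match split of `text` into pieces found in `vocab`.

    Hypothetical stand-in for a real subword tokenizer.
    """
    pieces = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                pieces.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"cannot tokenize {text[i:]!r}")
    return pieces


def add_special_token(token, vocab, embeddings):
    """Register `token` and initialize its embedding from its subword pieces.

    Assumption: the new row is the element-wise mean of the piece embeddings.
    The paper may use a different rule.
    """
    # Tokenize BEFORE adding the token, so it is still split into subwords.
    pieces = toy_tokenize(token, vocab)
    rows = [embeddings[vocab[p]] for p in pieces]
    new_row = [fmean(column) for column in zip(*rows)]
    vocab[token] = len(embeddings)
    embeddings.append(new_row)
    return vocab[token]


# Toy 2-dimensional embedding table with three existing subword pieces.
vocab = {"<": 0, "path": 1, ">": 2}
embeddings = [[1.0, 0.0], [0.0, 3.0], [2.0, 0.0]]

new_id = add_special_token("<path>", vocab, embeddings)
print(new_id, embeddings[new_id])
```

Starting from the subword mean places the new token near the region of embedding space the model already associates with its spelling, which is usually a gentler starting point for fine-tuning than a random vector.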

InternSVG is a unified multimodal large language model (MLLM) for SVG understanding, editing, and generation. It targets two problems named in the abstract: fragmented datasets and the limited transferability of methods across tasks. The InternSVG family pairs the model with the SAgoge dataset and the SArena benchmark, so that all three task types share one training and evaluation setup.

InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models
