UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs

arXiv — cs.LG · Thursday, December 4, 2025 at 5:00:00 AM
  • UniQL has been introduced as a unified framework for post-training quantization and low-rank compression, designed for deploying large language models (LLMs) on mobile platforms. The framework addresses the limited memory and compute available on such devices and supports configurable pruning rates tailored to edge applications (a hedged sketch of the general quantization-plus-low-rank idea appears after this summary).
  • The development of UniQL is significant as it enhances the adaptability of LLMs like Llama3, Qwen2.5, and others, enabling them to operate efficiently in resource-constrained environments. This could lead to broader adoption of advanced AI technologies in mobile applications.
  • The introduction of UniQL reflects ongoing efforts in the AI community to optimize model performance while managing resource limitations. This trend is echoed in recent advancements in evaluating model failures and enhancing reasoning capabilities, indicating a growing focus on improving the reliability and functionality of AI systems across various applications.
— via World Pulse Now AI Editorial System
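
The summary above does not spell out UniQL's actual algorithm, so the snippet below is only a minimal illustrative sketch of the general idea it names: approximating a weight matrix with a low-rank factorization (the rank acting as a configurable pruning knob) and post-training quantization of the remainder. The function names, the use of truncated SVD, and the int8 residual quantization are all assumptions for illustration, not UniQL's method.

```python
# Illustrative sketch only (not UniQL's algorithm): combine a truncated-SVD
# low-rank factorization with simple symmetric int8 post-training quantization
# of the residual. Rank and bit-width are the configurable compression knobs.
import numpy as np

def lowrank_plus_quant(W: np.ndarray, rank: int, n_bits: int = 8):
    """Approximate W as (L @ R) + dequant(Q): L, R are low-rank factors,
    Q is an n_bits symmetric quantization of the residual."""
    # Truncated SVD gives the best rank-`rank` approximation of W.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    L = U[:, :rank] * S[:rank]   # left factor, shape (out_dim, rank)
    R = Vt[:rank, :]             # right factor, shape (rank, in_dim)

    # Quantize whatever the low-rank part does not capture.
    residual = W - L @ R
    qmax = 2 ** (n_bits - 1) - 1
    scale = max(np.abs(residual).max() / qmax, 1e-12)
    Q = np.clip(np.round(residual / scale), -qmax - 1, qmax).astype(np.int8)
    return L, R, Q, scale

def reconstruct(L, R, Q, scale):
    return L @ R + Q.astype(np.float32) * scale

# Usage: compress a toy 256x256 weight matrix at rank 32 and check the error.
W = np.random.randn(256, 256).astype(np.float32)
L, R, Q, s = lowrank_plus_quant(W, rank=32)
err = np.linalg.norm(W - reconstruct(L, R, Q, s)) / np.linalg.norm(W)
print(f"relative reconstruction error: {err:.4f}")
```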
