arXiv:2510.10072v2 Announce Type: replace 
Abstract: Reasoning-focused large language models (LLMs) are rapidly evolving across various domains, yet their capabilities in handling complex legal problems remain underexplored. In this paper, we introduce Unilaw-R1, a large language model tailored for legal reasoning. With a lightweight 7-billion-parameter scale, Unilaw-R1 significantly reduces deployment cost while effectively tackling three core challenges in the legal domain: insufficient legal knowledge, unreliable reasoning logic, and weak business generalization. To address these issues, we first construct Unilaw-R1-Data, a high-quality dataset containing 17K distilled and screened chain-of-thought (CoT) samples. Based on this, we adopt a two-stage training strategy combining Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), which significantly boosts performance on complex legal reasoning tasks and supports interpretable decision-making in legal AI applications. To assess legal reasoning ability, we also introduce Unilaw-R1-Eval, a dedicated benchmark designed to evaluate models on single- and multi-choice legal tasks. Unilaw-R1 demonstrates strong results on authoritative benchmarks, outperforming all models of similar scale and achieving performance on par with the much larger DeepSeek-R1-Distill-Qwen-32B (54.9%). Following domain-specific training, it also shows significant gains on LawBench and LexEval, exceeding Qwen-2.5-7B-Instruct (46.6%) by an average margin of 6.6%.
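
The abstract does not say how Unilaw-R1-Eval scores its single- and multi-choice items. A minimal sketch, assuming exact-set matching (a multi-choice answer earns credit only when every correct option and no incorrect option is selected) and assuming each string passed in is already the isolated final-answer field. The names `OPTION_LABELS`, `extract_choices`, `score_item`, and `accuracy` are hypothetical and are not the benchmark's actual code.

```python
import re
from typing import Iterable

# Hypothetical option labels; the real benchmark may use a different set.
OPTION_LABELS = "ABCDEF"
# Matches a standalone uppercase option letter such as the "A" and "C" in "A, C".
CHOICE_PATTERN = re.compile(rf"\b[{OPTION_LABELS}]\b")


def extract_choices(answer_text: str) -> frozenset[str]:
    """Collect the option letters from an already-isolated final-answer string."""
    return frozenset(CHOICE_PATTERN.findall(answer_text))


def score_item(predicted: str, gold: str) -> float:
    """Exact-set match: 1.0 only if the predicted options equal the gold options.

    Single-choice items are the special case where both sets have one element.
    """
    pred = extract_choices(predicted)
    ref = extract_choices(gold)
    return 1.0 if pred and pred == ref else 0.0


def accuracy(pairs: Iterable[tuple[str, str]]) -> float:
    """Mean exact-set score over (predicted, gold) pairs; 0.0 for an empty input."""
    scores = [score_item(pred, gold) for pred, gold in pairs]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical (prediction, gold) pairs: single-choice hit, multi-choice hit, partial miss.
pairs = [("B", "B"), ("A, C", "A,C"), ("A", "A, D")]
print(f"{accuracy(pairs):.3f}")  # prints 0.667
```

Exact-set matching is deliberately strict: a partially correct multi-choice answer such as `A` for the gold answer `A, D` scores zero. A benchmark could instead award partial credit, which would change the reported numbers.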

Unilaw-R1 has been introduced as a large language model specifically designed for legal reasoning, featuring a lightweight architecture with 7 billion parameters. This model addresses critical challenges in the legal field, including inadequate legal knowledge, unreliable reasoning, and poor business generalization, supported by a high-quality dataset of 17,000 chain-of-thought samples and a two-stage training strategy involving Supervised Fine-Tuning and Reinforcement Learning.
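
The summary names the two training stages but not the reward used in the RL stage. A common choice for R1-style reasoning models is a rule-based reward that combines a format check with an answer check; the sketch below assumes that design, a hypothetical `<think>...</think>` output template, option letters A to F, and an illustrative format weight of 0.5. None of these names or values come from the paper.

```python
import re

# Hypothetical output template: reasoning inside <think>...</think>, then the final answer.
TEMPLATE = re.compile(r"^<think>.+?</think>\s*(?P<answer>.+)$", re.DOTALL)
# Assumed option letters for single- and multi-choice legal questions.
CHOICE_PATTERN = re.compile(r"\b[A-F]\b")


def format_reward(completion: str) -> float:
    """1.0 if the completion follows the assumed think-then-answer template, else 0.0."""
    return 1.0 if TEMPLATE.match(completion.strip()) else 0.0


def accuracy_reward(completion: str, gold: set[str]) -> float:
    """1.0 if the option letters after </think> exactly match the gold set, else 0.0."""
    match = TEMPLATE.match(completion.strip())
    if match is None:
        return 0.0
    predicted = set(CHOICE_PATTERN.findall(match.group("answer")))
    return 1.0 if predicted and predicted == gold else 0.0


def total_reward(completion: str, gold: set[str], format_weight: float = 0.5) -> float:
    """Illustrative weighted sum of the format and accuracy rewards."""
    return format_weight * format_reward(completion) + accuracy_reward(completion, gold)


good = "<think>The limitation period has already expired.</think> A, C"
print(total_reward(good, {"A", "C"}))                    # prints 1.5 (formatted, correct)
print(total_reward("A, C", {"A", "C"}))                  # prints 0.0 (no <think> block)
print(total_reward("<think>...</think> B", {"A", "C"}))  # prints 0.5 (formatted, wrong)
```

Keeping the format term smaller than the accuracy term is one way to avoid rewarding well-formed but wrong answers as much as correct ones; the actual Unilaw-R1 reward may weight these terms differently or use other components entirely.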

Unilaw-R1: A Large Language Model for Legal Reasoning with Reinforcement Learning and Iterative Inference
