From Scratch to Fine-Tuned: A Comparative Study of Transformer Training Strategies for Legal Machine Translation

arXiv cs.CL, Tuesday, December 23, 2025 at 5:00:00 AM
  • A recent study shows that Transformer-based approaches are effective for English-Hindi Legal Machine Translation (L-MT), a task aimed at language barriers in India's legal system. The research, part of the JUST-NLP 2025 shared task, compared two training strategies: fine-tuning a pre-trained OPUS-MT model and training a model from scratch. The fine-tuned model achieved a SacreBLEU score of 46.03, significantly outperforming the baseline.
  • The result matters because it could widen access to legal information for non-English speakers in India by making judicial documents easier to read and understand. Successful L-MT could also be applied in other legal contexts, improving communication in legal proceedings.
  • The study is part of a broader effort to apply AI in the legal domain, alongside other Indian legal-AI initiatives such as judgment prediction and document summarization. These efforts reflect growing recognition that AI can help bridge language gaps and make legal processes more efficient in multilingual societies.
— via World Pulse Now AI Editorial System

