Qalb: Largest State-of-the-Art Urdu Large Language Model for 230M Speakers with Systematic Continued Pre-training

arXiv — cs.LG · Wednesday, January 14, 2026 at 5:00:00 AM
  • Qalb has been introduced as the largest state-of-the-art Urdu large language model, developed to address the underrepresentation of Urdu in modern natural language processing (NLP) systems. The model uses a two-stage continued pre-training approach on a 1.97-billion-token dataset of diverse Urdu texts combined with English Wikipedia data.
  • The development of Qalb is significant because it aims to improve NLP performance in Urdu, a language spoken by over 230 million people, thereby broadening accessibility and representation in AI technologies.
  • This advancement reflects ongoing efforts to extend language models to underrepresented languages, alongside broader AI research trends such as improving model honesty and reducing hallucinations in large language models.
— via World Pulse Now AI Editorial System
