Qalb: Largest State-of-the-Art Urdu Large Language Model for 230M Speakers with Systematic Continued Pre-training

arXiv — cs.LG · Wednesday, January 14, 2026 at 5:00:00 AM
  • Qalb has been introduced as the largest state-of-the-art Urdu large language model, developed to address the underrepresentation of Urdu in modern natural language processing (NLP) systems. The model uses a two-stage continued pre-training approach on a 1.97-billion-token dataset of diverse Urdu texts combined with English Wikipedia data.
  • The development of Qalb is significant because it aims to improve NLP performance in Urdu, a language spoken by over 230 million people, thereby broadening accessibility and representation in AI technologies.
  • This advancement reflects ongoing efforts to extend language models to underrepresented languages, alongside broader AI research trends such as improving model honesty and reducing hallucinations in large language models.
— via World Pulse Now AI Editorial System
