Qalb: Largest State-of-the-Art Urdu Large Language Model for 230M Speakers with Systematic Continued Pre-training
- Qalb has been introduced as the largest state-of-the-art Urdu large language model, developed to address the underrepresentation of Urdu in modern natural language processing (NLP) systems. The model is built with a two-stage approach that includes continued pre-training on a 1.97-billion-token dataset combining diverse Urdu texts with English Wikipedia data.
- Qalb is significant because it aims to improve performance on NLP tasks in Urdu, a language spoken by over 230 million people, and in doing so to improve Urdu speakers' access to, and representation in, AI technologies.
- The work reflects ongoing efforts to extend language model capabilities to underrepresented languages. It also sits alongside broader AI research trends, such as recent studies on making large language models more honest and less prone to hallucination.
— via World Pulse Now AI Editorial System
