Multimodal Large Language Models for Low-Resource Languages: A Case Study for Basque

arXiv — cs.CL · Thursday, November 13, 2025 at 5:00:00 AM
The development of a Multimodal Large Language Model (MLLM) for Basque represents a significant step toward addressing the challenges that low-resource languages face in the AI landscape. The study finds that even a small proportion of Basque multimodal data, around 20%, is sufficient to achieve strong performance on relevant benchmarks, challenging the prior assumption that a Basque-instructed backbone model is essential. Using the Llama-3.1-Instruct model and Latxa, a Basque-adapted variant, the researchers explored various data mixtures and demonstrated that effective MLLMs can be developed without extensive resources. The implications extend beyond Basque: the authors emphasize that similar methodologies could be applied to other low-resource languages, fostering greater inclusivity and diversity in AI applications. The open release of their resources further supports collaborative work in this direction.
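To make the data-mixture idea concrete, here is a minimal sketch of how a fixed share of target-language data might be combined with majority-language data for training. All names (`build_mixture`, the sample pools) are hypothetical illustrations, not the authors' actual pipeline; the 20% ratio mirrors the figure reported in the summary.

```python
import random

def build_mixture(basque_samples, english_samples, basque_ratio=0.2, total=10, seed=0):
    # Draw round(total * basque_ratio) Basque items, fill the rest with English.
    rng = random.Random(seed)
    n_eu = round(total * basque_ratio)
    n_en = total - n_eu
    mix = rng.sample(basque_samples, n_eu) + rng.sample(english_samples, n_en)
    rng.shuffle(mix)  # interleave languages so batches are not language-blocked
    return mix

# Stand-in pools; in practice these would be image-text training examples.
eu_pool = [("eu", i) for i in range(100)]
en_pool = [("en", i) for i in range(100)]
mixture = build_mixture(eu_pool, en_pool, basque_ratio=0.2, total=10)
```

With `basque_ratio=0.2` and `total=10`, the mixture contains exactly 2 Basque and 8 English examples, which is the kind of low-resource share the study reports as sufficient.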
— via World Pulse Now AI Editorial System
