Parameter Efficient Multimodal Instruction Tuning for Romanian Vision Language Models

arXiv — cs.LG · Thursday, December 18, 2025 at 5:00:00 AM
  • A new study has introduced parameter-efficient multimodal instruction tuning for Romanian vision-language models, aiming to bridge the resource gap in generative AI for low-resource languages. The research translates the Flickr30k dataset into Romanian and enhances it for visual question answering using open-source large language models (LLMs) such as LLaMA, LLaVA, and Qwen2, with the LoRA method employed for fine-tuning (a minimal sketch of this setup follows the summary).
  • This development is significant because it strengthens Romanian vision-language models on tasks such as visual question answering and image description generation, with gains reported on metrics such as BERTScore F1 (an illustrative computation is sketched below). The successful adaptation of these models marks a step toward making advanced AI technologies more accessible to speakers of low-resource languages.
  • These advances reflect a broader trend in AI research toward improving vision-language models across many languages. The application of parameter-efficient methods such as LoRA highlights ongoing efforts to make adaptation cheaper and more effective, addressing challenges in multimodal knowledge retrieval and the need for better fine-tuning strategies in diverse linguistic contexts.
— via World Pulse Now AI Editorial System
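
For readers who want a concrete sense of the method, the snippet below is a minimal sketch of LoRA-style parameter-efficient fine-tuning using the Hugging Face PEFT library. The base model identifier, target modules, and hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of LoRA parameter-efficient fine-tuning with Hugging Face PEFT.
# Model name, target modules, and hyperparameters are assumptions for illustration,
# not the configuration reported in the paper.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model_id = "Qwen/Qwen2-7B-Instruct"  # assumed; the study also adapts LLaMA and LLaVA variants
model = AutoModelForCausalLM.from_pretrained(base_model_id)

# LoRA injects small trainable low-rank adapters into selected projection layers,
# so only a tiny fraction of parameters is updated during instruction tuning.
lora_config = LoraConfig(
    r=16,                                  # rank of the low-rank update (assumed)
    lora_alpha=32,                         # scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # typical attention projections (assumed)
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # reports how few parameters are actually trainable
```

The adapted model can then be trained on the Romanian instruction data with a standard training loop; only the small LoRA adapter weights need to be saved and shared.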
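
The BERTScore F1 metric mentioned above compares generated text with references using contextual embeddings. The sketch below shows one way to compute it with the open-source bert-score package; the Romanian example sentences and the multilingual backbone selection are illustrative assumptions rather than the paper's evaluation setup.

```python
# Minimal sketch of BERTScore F1 evaluation for generated Romanian outputs.
# The example sentences are illustrative; the paper's exact setup may differ.
from bert_score import score

candidates = ["Un câine aleargă pe plajă."]            # model outputs (illustrative)
references = ["Un câine aleargă de-a lungul plajei."]  # gold references (illustrative)

# lang="ro" selects a multilingual backbone suitable for Romanian text.
precision, recall, f1 = score(candidates, references, lang="ro")
print(f"BERTScore F1: {f1.mean().item():.4f}")
```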


Continue Reading
ClimateIQA: A New Dataset and Benchmark to Advance Vision-Language Models in Meteorology Anomalies Analysis
Positive · Artificial Intelligence
A new dataset named ClimateIQA has been introduced to enhance the capabilities of Vision-Language Models (VLMs) in analyzing meteorological anomalies. This dataset, which includes 26,280 high-quality images, aims to address the challenges faced by existing models like GPT-4o and Qwen-VL in interpreting complex meteorological heatmaps characterized by irregular shapes and color variations.
Tuning-free Visual Effect Transfer across Videos
Positive · Artificial Intelligence
A new framework named RefVFX has been introduced, enabling the transfer of complex temporal effects from a reference video to a target video or image in a feed-forward manner. This innovation addresses challenges in dynamic temporal effects, such as lighting changes and character transformations, which are difficult to articulate through text or static conditions.
Qalb: Largest State-of-the-Art Urdu Large Language Model for 230M Speakers with Systematic Continued Pre-training
Positive · Artificial Intelligence
Qalb has been introduced as the largest state-of-the-art Urdu large language model, developed to address the underrepresentation of Urdu in modern natural language processing (NLP) systems. This model utilizes a two-stage approach involving continued pre-training on a dataset of 1.97 billion tokens, which includes diverse Urdu texts and English Wikipedia data.
Incentivizing Multi-Tenant Split Federated Learning for Foundation Models at the Network Edge
Positive · Artificial Intelligence
A novel Price-Incentive Mechanism (PRINCE) has been proposed to enhance Multi-Tenant Split Federated Learning (SFL) for Foundation Models (FMs) like GPT-4, enabling efficient fine-tuning on resource-constrained devices while maintaining privacy. This mechanism addresses the coordination challenges faced by multiple SFL tenants with diverse fine-tuning needs.
Decentralized Autoregressive Generation
Neutral · Artificial Intelligence
A theoretical analysis of decentralization in autoregressive generation has been presented, introducing the Decentralized Discrete Flow Matching objective, which expresses probability generating velocity as a linear combination of expert flows. Experiments demonstrate the equivalence between decentralized and centralized training settings for multimodal language models, specifically comparing LLaVA and InternVL 2.5-1B.
