ReFineG: Synergizing Small Supervised Models and LLMs for Low-Resource Grounded Multimodal NER

arXiv (cs.CL) · Thursday, November 13, 2025, 5:00:00 AM
The paper 'ReFineG: Synergizing Small Supervised Models and LLMs for Low-Resource Grounded Multimodal NER' introduces a three-stage framework for improving Grounded Multimodal Named Entity Recognition (GMNER) in low-resource settings. Traditional methods struggle because they require costly multimodal annotations and can underperform in specialized domains. ReFineG addresses these issues by combining small supervised models with frozen multimodal large language models (MLLMs). The framework comprises a domain-aware data synthesis strategy, an uncertainty-based refinement mechanism, and a multimodal context selection algorithm. Together, these components improve entity recognition accuracy while enabling effective visual grounding. The framework's effectiveness was validated in the CCKS2025 GMNER Shared Task, where it secured second place with an F1 score of 0.6461, demonstrating its potential for practical applications.
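The article does not detail how the uncertainty-based refinement mechanism is implemented; a common pattern for pairing a small supervised model with a frozen LLM is to keep the small model's high-confidence predictions and defer only the uncertain ones to the larger model. The sketch below illustrates that pattern under stated assumptions: the function and field names (`refine_predictions`, `confidence`, `mllm_query`) and the threshold value are illustrative, not taken from ReFineG.

```python
# Hypothetical sketch of an uncertainty-based refinement step. Assumes the
# small supervised model exposes a per-entity confidence score; all names
# and the 0.5 threshold are illustrative, not from the ReFineG paper.

def refine_predictions(predictions, mllm_query, threshold=0.5):
    """Keep confident small-model predictions; defer the rest to a frozen MLLM.

    predictions: list of dicts like {"span": str, "type": str, "confidence": float}
    mllm_query:  callable that re-labels one uncertain prediction via the MLLM
    """
    refined = []
    for pred in predictions:
        if pred["confidence"] >= threshold:
            # Trust the small supervised model when it is confident.
            refined.append(pred)
        else:
            # Route the uncertain case to the frozen MLLM for refinement.
            refined.append(mllm_query(pred))
    return refined
```

In this setup the MLLM stays frozen (no fine-tuning) and is only queried for the small fraction of low-confidence entities, which keeps inference cost manageable in low-resource settings.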
— via World Pulse Now AI Editorial System


Recommended Readings
Unifying Segment Anything in Microscopy with Vision-Language Knowledge
Positive · Artificial Intelligence
The paper 'Unifying Segment Anything in Microscopy with Vision-Language Knowledge' addresses accurate segmentation in biomedical images. It argues that existing models handle unseen domain data poorly because they lack vision-language knowledge, and proposes a new framework, uLLSAM, which leverages Multimodal Large Language Models (MLLMs) to enhance segmentation performance and improve generalization across cross-domain datasets, achieving notable performance improvements.