Evaluating Multimodal Large Language Models on Vertically Written Japanese Text

arXiv — cs.CV · Thursday, November 20, 2025 at 5:00:00 AM
  • The study assesses the capabilities of Multimodal Large Language Models (MLLMs) in reading vertically written Japanese text, addressing a gap in existing research.
  • This evaluation is crucial as it seeks to improve the processing of diverse document formats, enhancing the usability of MLLMs in multilingual contexts, particularly for Japanese.
  • The exploration of MLLMs in this context reflects broader trends in AI, where the focus is shifting towards improving model performance across various languages and writing styles, highlighting the need for more inclusive datasets.
— via World Pulse Now AI Editorial System


Recommended Readings
Wonder3D++: Cross-domain Diffusion for High-fidelity 3D Generation from a Single Image
Positive · Artificial Intelligence
Wonder3D++ is a new method designed to generate high-fidelity textured meshes from single-view images. It addresses limitations in existing techniques that either require extensive optimization or yield low-quality results. By employing a cross-domain diffusion model and a multi-view attention mechanism, Wonder3D++ enhances the quality and consistency of 3D reconstructions, making it a significant advancement in the field of 3D generation.
CompAgent: An Agentic Framework for Visual Compliance Verification
Positive · Artificial Intelligence
CompAgent is a newly proposed framework aimed at enhancing visual compliance verification in computer vision, particularly within media and advertising sectors. It addresses the limitations of existing methods that rely on deep learning models trained on manually labeled datasets. By integrating Multimodal Large Language Models (MLLMs) with various visual tools, CompAgent aims to improve the reasoning and application of compliance rules in visual content.
Learning from Mistakes: Loss-Aware Memory Enhanced Continual Learning for LiDAR Place Recognition
Positive · Artificial Intelligence
LiDAR place recognition is essential for SLAM, robot navigation, and autonomous driving. Current methods often face catastrophic forgetting when adapting to new environments. To combat this, a new framework called KDF+ has been proposed, which incorporates a loss-aware sampling strategy and a rehearsal enhancement mechanism to improve continual learning in LiDAR place recognition.
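The summary names two ingredients of KDF+ but not their exact formulation. As a minimal sketch of what a loss-aware rehearsal buffer could look like, the function below keeps the highest-loss ("mistake") samples plus a random slice of the remainder; the function name, the `hard_fraction` parameter, and the half-hard/half-random split are illustrative assumptions, not the paper's specification.

```python
import random

def select_rehearsal_memory(samples, losses, memory_size, hard_fraction=0.5):
    """Loss-aware memory selection sketch: prioritize samples the model
    previously got wrong (high loss), then pad the buffer with a random
    draw from the rest to preserve diversity."""
    n_hard = int(memory_size * hard_fraction)
    # Rank sample indices by descending loss.
    ranked = sorted(range(len(samples)), key=lambda i: losses[i], reverse=True)
    hard_ids = ranked[:n_hard]
    rest = ranked[n_hard:]
    # Fill the remaining slots with a uniform random sample for coverage.
    easy_ids = random.sample(rest, min(memory_size - n_hard, len(rest)))
    return [samples[i] for i in hard_ids + easy_ids]
```

In a continual-learning loop, this selection would run after each training session so the buffer rehearsed against the next environment reflects where the model currently fails.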
Verb Mirage: Unveiling and Assessing Verb Concept Hallucinations in Multimodal Large Language Models
Positive · Artificial Intelligence
Multimodal Large Language Models (MLLMs) have shown remarkable capabilities in tasks such as OCR and VQA, but hallucination remains a significant challenge. This paper is the first to explore verb hallucination in MLLMs, revealing that many state-of-the-art models exhibit severe issues with verb concepts. The study evaluates existing methods aimed at reducing hallucinations related to object concepts and assesses their effectiveness on verb hallucinations.
Controlling False Positives in Image Segmentation via Conformal Prediction
Positive · Artificial Intelligence
A new framework for controlling false positives in image segmentation has been introduced, enhancing the reliability of semantic segmentation in clinical decision-making. This model-agnostic approach utilizes conformal prediction to create confidence masks that maintain a user-defined tolerance for false positives, without requiring retraining. The method demonstrates high probability guarantees for new images, making it a significant advancement in medical imaging.
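The paper itself is not quoted here, but the general recipe behind split conformal calibration for mask size control can be sketched: pick the smallest score threshold such that, on a held-out calibration set, the average false-positive fraction of the predicted mask stays within the user's tolerance. The grid of candidate thresholds and the per-image false-positive rate used below are illustrative assumptions.

```python
import numpy as np

def calibrate_threshold(cal_scores, cal_masks, alpha=0.05):
    """Sketch of conformal-style threshold calibration for segmentation.
    cal_scores: list of (H, W) foreground probability maps
    cal_masks:  list of (H, W) boolean ground-truth masks
    Returns the lowest threshold whose mean false-positive fraction on the
    calibration set is at most alpha (lower threshold = larger mask)."""
    for lam in np.linspace(0.0, 1.0, 101):
        fp_fractions = []
        for scores, mask in zip(cal_scores, cal_masks):
            pred = scores >= lam
            false_pos = np.logical_and(pred, ~mask).sum()
            fp_fractions.append(false_pos / max(pred.sum(), 1))
        if np.mean(fp_fractions) <= alpha:
            return float(lam)
    return 1.0  # empty mask trivially satisfies the constraint
```

Because the threshold is chosen from held-out data rather than from the model's training signal, the procedure is model-agnostic and needs no retraining, matching the property highlighted in the summary.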
H-CNN-ViT: A Hierarchical Gated Attention Multi-Branch Model for Bladder Cancer Recurrence Prediction
Positive · Artificial Intelligence
Bladder cancer, with a recurrence rate of up to 78%, poses significant challenges for post-operative monitoring. Traditional multi-sequence contrast-enhanced MRI scans are often difficult to interpret due to changes from surgery. This study introduces H-CNN-ViT, a new AI model designed to enhance bladder cancer recurrence prediction by utilizing a curated multi-sequence MRI dataset, which aims to improve diagnostic accuracy and patient management.
Towards Efficient Medical Reasoning with Minimal Fine-Tuning Data
Positive · Artificial Intelligence
Supervised Fine-Tuning (SFT) is essential for adapting Large Language Models (LLMs) to specialized fields like medical reasoning. Current SFT methods often utilize unfiltered datasets, which can be redundant and of low quality, leading to high computational costs and poor performance. This study introduces a new data selection strategy called Difficulty-Influence Quadrant (DIQ), which aims to optimize sample selection based on both difficulty and optimization utility, enhancing the efficiency of medical reasoning applications.
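The summary describes quadrant-based selection over difficulty and utility without giving the mechanics. A hypothetical sketch of one such scheme: split the pool at the median difficulty and median influence, then draw the top-influence items from each quadrant so the fine-tuning subset covers easy/hard and high/low-utility regions rather than clustering in one corner. The function name, the median split, and the per-quadrant quota are all assumptions for illustration.

```python
import statistics

def diq_select(samples, difficulty, influence, per_quadrant=1):
    """Quadrant-based data selection sketch: partition samples by median
    difficulty and median influence, then take the most influential items
    from every quadrant to balance the selected subset."""
    d_med = statistics.median(difficulty)
    i_med = statistics.median(influence)
    quadrants = {}
    for idx in range(len(samples)):
        key = (difficulty[idx] > d_med, influence[idx] > i_med)
        quadrants.setdefault(key, []).append(idx)
    selected = []
    for ids in quadrants.values():
        ids.sort(key=lambda i: influence[i], reverse=True)
        selected.extend(ids[:per_quadrant])
    return sorted(selected)
```

The payoff of any scheme in this family is that a small, deliberately stratified subset can replace an unfiltered dataset, cutting the computational cost the summary attributes to current SFT practice.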
MoHoBench: Assessing Honesty of Multimodal Large Language Models via Unanswerable Visual Questions
Neutral · Artificial Intelligence
MoHoBench is a newly developed benchmark aimed at assessing the honesty of Multimodal Large Language Models (MLLMs) when confronted with unanswerable visual questions. Despite advancements in vision-language tasks, MLLMs often produce unreliable content. This study systematically evaluates the honesty of 28 popular MLLMs using a dataset of over 12,000 visual questions, revealing that many models struggle to provide honest responses. The findings highlight the need for improved trustworthiness in AI systems.