Evaluating Multimodal Large Language Models on Vertically Written Japanese Text

arXiv — cs.CV•Thursday, November 20, 2025 at 5:00:00 AM

NeutralArtificial Intelligence

The study assesses the capabilities of Multimodal Large Language Models (MLLMs) in reading vertically written Japanese text, addressing a gap in existing research.
This evaluation is crucial as it seeks to improve the processing of diverse document formats, enhancing the usability of MLLMs in multilingual contexts, particularly for Japanese.
The exploration of MLLMs in this context reflects broader trends in AI, where the focus is shifting towards improving model performance across various languages and writing styles, highlighting the need for more inclusive datasets.

— via World Pulse Now AI Editorial System

Read Original

Was this article worth reading? Share it

Recommended Readings

arXiv — cs.CV9 hours ago

Wonder3D++: Cross-domain Diffusion for High-fidelity 3D Generation from a Single Image

PositiveArtificial Intelligence

Wonder3D++ is a new method designed to generate high-fidelity textured meshes from single-view images. It addresses limitations in existing techniques that either require extensive optimization or yield low-quality results. By employing a cross-domain diffusion model and a multi-view attention mechanism, Wonder3D++ enhances the quality and consistency of 3D reconstructions, making it a significant advancement in the field of 3D generation.

Read full article

via arXiv — cs.CV

arXiv — cs.CV9 hours ago

CompAgent: An Agentic Framework for Visual Compliance Verification

PositiveArtificial Intelligence

CompAgent is a newly proposed framework aimed at enhancing visual compliance verification in computer vision, particularly within media and advertising sectors. It addresses the limitations of existing methods that rely on deep learning models trained on manually labeled datasets. By integrating Multimodal Large Language Models (MLLMs) with various visual tools, CompAgent aims to improve the reasoning and application of compliance rules in visual content.

Read full article

via arXiv — cs.CV

arXiv — cs.CV9 hours ago

Learning from Mistakes: Loss-Aware Memory Enhanced Continual Learning for LiDAR Place Recognition

PositiveArtificial Intelligence

LiDAR place recognition is essential for SLAM, robot navigation, and autonomous driving. Current methods often face catastrophic forgetting when adapting to new environments. To combat this, a new framework called KDF+ has been proposed, which incorporates a loss-aware sampling strategy and a rehearsal enhancement mechanism to improve continual learning in LiDAR place recognition.

Read full article

via arXiv — cs.CV

arXiv — cs.CV9 hours ago

Verb Mirage: Unveiling and Assessing Verb Concept Hallucinations in Multimodal Large Language Models

PositiveArtificial Intelligence

Multimodal Large Language Models (MLLMs) have shown remarkable capabilities in tasks such as OCR and VQA, but hallucination remains a significant challenge. This paper is the first to explore verb hallucination in MLLMs, revealing that many state-of-the-art models exhibit severe issues with verb concepts. The study evaluates existing methods aimed at reducing hallucinations related to object concepts and assesses their effectiveness on verb hallucinations.

Read full article

via arXiv — cs.CV

arXiv — cs.LG9 hours ago

Controlling False Positives in Image Segmentation via Conformal Prediction

PositiveArtificial Intelligence

A new framework for controlling false positives in image segmentation has been introduced, enhancing the reliability of semantic segmentation in clinical decision-making. This model-agnostic approach utilizes conformal prediction to create confidence masks that maintain a user-defined tolerance for false positives, without requiring retraining. The method demonstrates high probability guarantees for new images, making it a significant advancement in medical imaging.

Read full article

via arXiv — cs.LG

arXiv — cs.CV9 hours ago

H-CNN-ViT: A Hierarchical Gated Attention Multi-Branch Model for Bladder Cancer Recurrence Prediction

PositiveArtificial Intelligence

Bladder cancer, with a recurrence rate of up to 78%, poses significant challenges for post-operative monitoring. Traditional multi-sequence contrast-enhanced MRI scans are often difficult to interpret due to changes from surgery. This study introduces H-CNN-ViT, a new AI model designed to enhance bladder cancer recurrence prediction by utilizing a curated multi-sequence MRI dataset, which aims to improve diagnostic accuracy and patient management.

Read full article

via arXiv — cs.CV

arXiv — cs.CLa day ago

Towards Efficient Medical Reasoning with Minimal Fine-Tuning Data

PositiveArtificial Intelligence

Supervised Fine-Tuning (SFT) is essential for adapting Large Language Models (LLMs) to specialized fields like medical reasoning. Current SFT methods often utilize unfiltered datasets, which can be redundant and of low quality, leading to high computational costs and poor performance. This study introduces a new data selection strategy called Difficulty-Influence Quadrant (DIQ), which aims to optimize sample selection based on both difficulty and optimization utility, enhancing the efficiency of medical reasoning applications.

Read full article

via arXiv — cs.CL

arXiv — cs.CVa day ago

MoHoBench: Assessing Honesty of Multimodal Large Language Models via Unanswerable Visual Questions

NeutralArtificial Intelligence

MoHoBench is a newly developed benchmark aimed at assessing the honesty of Multimodal Large Language Models (MLLMs) when confronted with unanswerable visual questions. Despite advancements in vision-language tasks, MLLMs often produce unreliable content. This study systematically evaluates the honesty of 28 popular MLLMs using a dataset of over 12,000 visual questions, revealing that many models struggle to provide honest responses. The findings highlight the need for improved trustworthiness in AI systems.

Read full article

via arXiv — cs.CV