Generating Natural-Language Surgical Feedback: From Structured Representation to Domain-Grounded Evaluation

arXiv — cs.LG · Thursday, November 20, 2025 at 5:00:00 AM
  • A new pipeline has been developed to automate natural-language surgical feedback, moving from structured representations to domain-grounded evaluation.
  • This advancement is significant as it promises timely and consistent guidance, potentially transforming how surgical skills are taught and assessed.
  • The integration of AI in medical training reflects a broader trend towards utilizing technology to improve educational outcomes, paralleling efforts in other fields such as medical image segmentation and skill assessment.
— via World Pulse Now AI Editorial System


Recommended Readings
Investigating Hallucination in Conversations for Low Resource Languages
Neutral · Artificial Intelligence
Large Language Models (LLMs) have shown exceptional ability in text generation but often produce factually incorrect statements, known as 'hallucinations'. This study investigates hallucinations in conversational data across three low-resource languages: Hindi, Farsi, and Mandarin. The analysis of various LLMs, including GPT-3.5 and GPT-4o, reveals that while Mandarin has few hallucinated responses, Hindi and Farsi exhibit significantly higher rates of inaccuracies.
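The per-language comparison above amounts to tallying hallucination rates from labeled responses. A minimal sketch, assuming responses are labeled as (language, is_hallucinated) pairs; the labels below are illustrative, not data from the study:

```python
# Hypothetical sketch: per-language hallucination rates from labeled responses.
# The (language, flag) pairs are illustrative placeholders, not study data.
from collections import Counter

labeled = [("hi", True), ("hi", False), ("fa", True),
           ("fa", True), ("zh", False), ("zh", False)]

totals = Counter(lang for lang, _ in labeled)                 # responses per language
hallucinated = Counter(lang for lang, flag in labeled if flag)  # hallucinated per language
rates = {lang: hallucinated[lang] / totals[lang] for lang in totals}
print(rates)  # {'hi': 0.5, 'fa': 1.0, 'zh': 0.0}
```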
Evaluating Large Language Models for Diacritic Restoration in Romanian Texts: A Comparative Study
Positive · Artificial Intelligence
This study evaluates the effectiveness of various large language models (LLMs) in restoring diacritics in Romanian texts, a crucial task for text processing in languages with rich diacritical marks. The models tested include OpenAI's GPT-3.5, GPT-4, Google's Gemini 1.0 Pro, and Meta's Llama family, among others. Results indicate that GPT-4o achieves high accuracy in diacritic restoration, outperforming a neutral baseline, while other models show variability. The findings emphasize the importance of model architecture, training data, and prompt design in enhancing natural language processing to…
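Accuracy for diacritic restoration is typically measured by comparing the restored text against a gold reference. A minimal sketch of one such metric, character-level accuracy, assuming aligned equal-length strings (the exact metric used in the study may differ):

```python
# Hypothetical sketch: character-level accuracy for diacritic restoration,
# assuming prediction and reference are aligned strings of equal length.
def diacritic_accuracy(predicted: str, reference: str) -> float:
    """Fraction of characters that match the gold reference."""
    if len(predicted) != len(reference):
        raise ValueError("this sketch assumes aligned, equal-length strings")
    if not reference:
        return 1.0
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)

# Romanian example: "masina" is "mașină" with diacritics stripped.
print(diacritic_accuracy("mașină", "mașină"))  # 1.0
print(diacritic_accuracy("masina", "mașină"))  # 4 of 6 characters match
```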
LLM-as-a-Grader: Practical Insights from Large Language Model for Short-Answer and Report Evaluation
Positive · Artificial Intelligence
This study explores the use of Large Language Models (LLMs), specifically GPT-4o, for evaluating short-answer quizzes and project reports in an undergraduate Computational Linguistics course. The research involved approximately 50 students and 14 project teams, comparing LLM-generated scores with human evaluations from teaching assistants. Results indicated a strong correlation between LLM and human scores, achieving up to 0.98 correlation and exact score agreement in 55% of quiz cases, while showing variability in scoring open-ended responses.
Scene Graph-Guided Generative AI Framework for Synthesizing and Evaluating Industrial Hazard Scenarios
Positive · Artificial Intelligence
A new study introduces a scene graph-guided generative AI framework aimed at synthesizing realistic images of industrial hazard scenarios. This framework addresses the challenge of acquiring datasets for workplace hazards, which are difficult to capture in real-time. By analyzing historical Occupational Safety and Health Administration (OSHA) accident reports with GPT-4o, the study extracts structured hazard reasoning and creates object-level scene graphs. These graphs are utilized to guide a text-to-image diffusion model, generating accurate hazard scenes for evaluation.
CAR-Scenes: Semantic VLM Dataset for Safe Autonomous Driving
Positive · Artificial Intelligence
CAR-Scenes is a frame-level dataset designed for autonomous driving, facilitating the training and evaluation of vision-language models (VLMs) for scene-level understanding. The dataset comprises 5,192 annotated images from sources like Argoverse, Cityscapes, KITTI, and nuScenes, utilizing a comprehensive 28-key category/sub-category knowledge base. The annotations are generated through a GPT-4o-assisted pipeline with human verification, providing detailed attributes and supporting semantic retrieval and risk-aware scenario mining.
UniSER: A Foundation Model for Unified Soft Effects Removal
Positive · Artificial Intelligence
The paper introduces UniSER, a foundation model designed for the unified removal of soft effects in digital images, such as lens flare, haze, shadows, and reflections. These effects often degrade image aesthetics while leaving underlying pixels visible. Existing solutions typically focus on individual issues, leading to specialized models that lack scalability. In contrast, UniSER leverages the commonality of semi-transparent occlusions to effectively address various soft effect degradations, enhancing image restoration capabilities beyond current generalist models that require detailed prom…
Seeing is Believing: Rich-Context Hallucination Detection for MLLMs via Backward Visual Grounding
Positive · Artificial Intelligence
Multimodal Large Language Models (MLLMs) have demonstrated significant cross-modal capabilities but continue to struggle with hallucinations. To address this issue, VBackChecker has been introduced as a reference-free hallucination detection framework. This framework verifies the consistency of MLLM-generated responses with visual inputs using a pixel-level Grounding LLM that incorporates reasoning and segmentation capabilities. Additionally, a new pipeline for generating instruction-tuning data, R-Instruct, has been developed, enhancing interpretability and handling rich-context scenarios eff…