Parameter Aware Mamba Model for Multi-task Dense Prediction

arXiv — cs.CV · Wednesday, November 19, 2025 at 5:00:00 AM
  • The Parameter Aware Mamba Model (PAMM) has been developed to improve multi-task dense prediction.
  • The introduction of PAMM signifies a shift towards more sophisticated methods in multi-task dense prediction.
— via World Pulse Now AI Editorial System


Recommended Readings
Deep Equilibrium Models for Poisson Imaging Inverse Problems via Mirror Descent
PositiveArtificial Intelligence
Deep Equilibrium Models (DEQs) are implicit neural networks that have recently been applied to image regularization, particularly in Gaussian fidelity contexts. This study extends DEQs to Poisson inverse problems, utilizing the Kullback–Leibler divergence for data fidelity. A novel DEQ formulation based on Mirror Descent is introduced, adapting to the data term's structure. The research establishes sufficient conditions and convergence results using the Kurdyka–Łojasiewicz framework for subanalytic functions.
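To make the mirror-descent idea concrete: with a negative-entropy mirror map, mirror descent on the Poisson KL data fidelity yields multiplicative updates that keep the iterate positive. The sketch below is a plain (non-DEQ, non-regularized) illustration of that fixed-point iteration, not the paper's formulation; the operator `A`, step size, and iteration count are illustrative assumptions.

```python
import numpy as np

def mirror_descent_poisson(A, y, steps=500, lr=0.1, eps=1e-8):
    """Mirror descent on the Poisson KL data fidelity
    D(y, Ax) = sum((Ax) - y * log(Ax)), using the negative-entropy
    mirror map, whose multiplicative update keeps x > 0."""
    x = np.ones(A.shape[1])
    for _ in range(steps):
        Ax = A @ x + eps
        grad = A.T @ (1.0 - y / Ax)   # gradient of the KL fidelity
        x = x * np.exp(-lr * grad)    # entropic mirror step
    return x
```

A DEQ version would wrap a learned regularizer into this update and solve for its fixed point; the update above is only the data-fidelity part.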
What happens when nanochat meets DiLoCo?
NeutralArtificial Intelligence
The article discusses the integration of the DiLoCo algorithm with the nanochat project, a compact implementation similar to ChatGPT. This integration aims to enhance training efficiency in distributed environments where communication is constrained. By applying DiLoCo as a lightweight wrapper around nanochat's training loop, the researchers can significantly reduce communication overhead by allowing multiple local training steps before synchronization. This approach is compared to a standard data-parallel setup, highlighting the potential for improved model training in resource-limited scenarios.
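The core DiLoCo loop can be sketched in a few lines: each worker runs several local optimizer steps, and only the averaged parameter delta (the "outer gradient") is communicated and applied by an outer optimizer. This toy version uses a quadratic loss and plain SGD for both levels, which are simplifying assumptions; DiLoCo itself pairs inner AdamW with outer Nesterov momentum.

```python
import numpy as np

def diloco_round(global_w, worker_targets, local_steps=8,
                 inner_lr=0.1, outer_lr=0.7):
    """One DiLoCo-style communication round (illustrative sketch):
    each worker takes local SGD steps on a toy quadratic loss
    0.5 * ||w - target||^2, then only the averaged parameter delta
    is communicated and applied as an outer gradient."""
    deltas = []
    for target in worker_targets:
        w = global_w.copy()
        for _ in range(local_steps):
            w -= inner_lr * (w - target)   # local step, no communication
        deltas.append(global_w - w)        # outer pseudo-gradient
    outer_grad = np.mean(deltas, axis=0)
    return global_w - outer_lr * outer_grad
```

With `local_steps` inner updates per round, communication happens once every `local_steps` steps instead of every step, which is the source of the overhead reduction.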
Start Small, Think Big: Curriculum-based Relative Policy Optimization for Visual Grounding
PositiveArtificial Intelligence
The article presents a novel training strategy called Curriculum-based Relative Policy Optimization (CuRPO) aimed at improving Visual Grounding tasks. It highlights the limitations of Chain-of-Thought (CoT) prompting, particularly when outputs become lengthy or complex, which can degrade performance. The study reveals that simply increasing dataset size does not guarantee better results due to varying complexities. CuRPO utilizes CoT length and generalized Intersection over Union (gIoU) rewards to structure training data progressively from simpler to more challenging examples, demonstrating its effectiveness.
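The curriculum-ordering step described above can be sketched as a simple sort by a difficulty score. The score below, combining CoT length with `1 - gIoU`, and its equal weighting are hypothetical stand-ins for whatever scoring CuRPO actually uses; only the easy-to-hard ordering principle comes from the summary.

```python
def curriculum_order(examples):
    """Order training examples easy-to-hard by a hypothetical
    difficulty score mixing CoT length (longer = harder) and
    gIoU reward (lower = harder), as in a CuRPO-style curriculum."""
    def difficulty(ex):
        return 0.5 * ex["cot_len"] / 100.0 + 0.5 * (1.0 - ex["giou"])
    return sorted(examples, key=difficulty)
```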
MoHoBench: Assessing Honesty of Multimodal Large Language Models via Unanswerable Visual Questions
NeutralArtificial Intelligence
MoHoBench is a newly developed benchmark aimed at assessing the honesty of Multimodal Large Language Models (MLLMs) when confronted with unanswerable visual questions. Despite advancements in vision-language tasks, MLLMs often produce unreliable content. This study systematically evaluates the honesty of 28 popular MLLMs using a dataset of over 12,000 visual questions, revealing that many models struggle to provide honest responses. The findings highlight the need for improved trustworthiness in AI systems.
Towards Efficient Medical Reasoning with Minimal Fine-Tuning Data
PositiveArtificial Intelligence
Supervised Fine-Tuning (SFT) is essential for adapting Large Language Models (LLMs) to specialized fields like medical reasoning. Current SFT methods often utilize unfiltered datasets, which can be redundant and of low quality, leading to high computational costs and poor performance. This study introduces a new data selection strategy called Difficulty-Influence Quadrant (DIQ), which aims to optimize sample selection based on both difficulty and optimization utility, enhancing the efficiency of medical reasoning applications.
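A quadrant-style selection over two scores can be sketched as follows. Splitting at the medians and keeping one quadrant is an illustrative assumption, as are the field names; the summary only states that DIQ selects by both difficulty and optimization utility, not how the quadrants are defined or combined.

```python
import statistics

def quadrant_select(samples):
    """Hypothetical quadrant selection: split samples at the median
    of a difficulty score and an influence (utility) score, and keep
    those above both medians, i.e. hard AND high-utility examples."""
    d_med = statistics.median(s["difficulty"] for s in samples)
    i_med = statistics.median(s["influence"] for s in samples)
    return [s for s in samples
            if s["difficulty"] >= d_med and s["influence"] >= i_med]
```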
MVI-Bench: A Comprehensive Benchmark for Evaluating Robustness to Misleading Visual Inputs in LVLMs
PositiveArtificial Intelligence
MVI-Bench is introduced as a comprehensive benchmark aimed at evaluating the robustness of Large Vision-Language Models (LVLMs) against misleading visual inputs. Traditional benchmarks have primarily focused on textual inputs, neglecting the significant impact of visual misrepresentation. MVI-Bench categorizes misleading visual inputs into three hierarchical levels: Visual Concept, Visual Attribute, and Visual Relationship, and includes 1,248 annotated Visual Question Answering (VQA) instances to facilitate detailed robustness assessments.
2D Gaussians Spatial Transport for Point-supervised Density Regression
PositiveArtificial Intelligence
The paper presents Gaussian Spatial Transport (GST), a new framework that utilizes Gaussian splatting to transfer probability measures from image coordinates to annotation maps. It introduces a method for estimating pixel-annotation correspondence, which is used to create a transport plan based on Bayesian probability. A loss function is derived to integrate this transport plan into standard network optimization for computer vision tasks. Experiments in crowd counting and landmark detection demonstrate the approach's effectiveness, improving efficiency by eliminating iterative transport plan computation.
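The key efficiency claim is that the transport plan is obtained in closed form rather than by iterative optimization (as in Sinkhorn-style solvers). A minimal stand-in for such a plan, assuming isotropic Gaussians at the annotation points and per-pixel normalization (both illustrative choices, not GST's actual construction):

```python
import numpy as np

def transport_plan(pixels, annotations, sigma=1.0):
    """Closed-form pixel-to-annotation plan sketch: each pixel
    distributes its mass over annotation points with normalized
    Gaussian weights, so no iterative solver is needed."""
    # squared distances, shape (num_pixels, num_annotations)
    d2 = ((pixels[:, None, :] - annotations[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return w / w.sum(axis=1, keepdims=True)   # each row sums to 1
```

A loss can then weight per-pixel errors by this plan directly inside standard backpropagation, which is what makes the approach drop-in for network training.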
Higher-Order Transformers With Kronecker-Structured Attention
PositiveArtificial Intelligence
The paper introduces the Higher-Order Transformer (HOT), a novel attention framework designed to handle high-dimensional, multiway tensor data. Traditional Transformers struggle with such data due to computational inefficiencies and the need to flatten inputs, which disrupts tensor structures. HOT utilizes Kronecker products to represent multiway attention, efficiently capturing relationships across dimensions while maintaining tensor integrity. Experiments demonstrate HOT's competitive performance on 2D and 3D datasets, retaining the expressiveness of full high-order attention.
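The Kronecker idea can be illustrated on a 2D grid: instead of one softmax attention over all H*W positions (cost quadratic in H*W), attention is computed per mode and composed, which equals applying the Kronecker product A_h ⊗ A_w to the flattened input. The sketch below scores directly from raw features with no learned projections or heads, a deliberate simplification of HOT.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def kronecker_attention(X):
    """Toy factorized attention over a (H, W, C) grid: build one
    attention matrix per mode from mode summaries, then apply them
    along their respective axes. The composition equals multiplying
    the flattened input by the Kronecker product A_h (x) A_w."""
    H, W, C = X.shape
    rows = X.mean(axis=1)                       # (H, C) mode-1 summary
    cols = X.mean(axis=0)                       # (W, C) mode-2 summary
    A_h = softmax(rows @ rows.T / np.sqrt(C))   # (H, H) attention
    A_w = softmax(cols @ cols.T / np.sqrt(C))   # (W, W) attention
    Y = np.einsum('ij,jwc->iwc', A_h, X)        # attend along mode 1
    Y = np.einsum('kl,hlc->hkc', A_w, Y)        # attend along mode 2
    return Y, A_h, A_w
```

The factorization stores an H×H and a W×W matrix instead of an (H·W)×(H·W) one, which is where the efficiency over flattened full attention comes from.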