Parameter Aware Mamba Model for Multi-task Dense Prediction

arXiv — cs.CV · Wednesday, November 19, 2025 at 5:00:00 AM
  • The Parameter Aware Mamba Model (PAMM) has been developed to improve multi-task dense prediction.
  • The introduction of PAMM signifies a shift towards more sophisticated methods in multi-task dense prediction.
— via World Pulse Now AI Editorial System


Recommended Readings
Deep Equilibrium Models for Poisson Imaging Inverse Problems via Mirror Descent
PositiveArtificial Intelligence
Deep Equilibrium Models (DEQs) are implicit neural networks that have recently been applied to image regularization, particularly in Gaussian fidelity contexts. This study extends DEQs to Poisson inverse problems, utilizing the Kullback–Leibler divergence for data fidelity. A novel DEQ formulation based on Mirror Descent is introduced, adapting to the data term's structure. The research establishes sufficient conditions and convergence results using the Kurdyka–Łojasiewicz framework for subanalytic functions.
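To make the mirror-descent idea concrete: with a negative-entropy mirror map, mirror descent on the Poisson KL data fidelity yields multiplicative updates that keep the iterate positive. The sketch below is a plain (non-DEQ, non-regularized) illustration of that fixed-point iteration, not the paper's formulation; the operator `A`, step size, and iteration count are illustrative assumptions.

```python
import numpy as np

def mirror_descent_poisson(A, y, steps=500, lr=0.1, eps=1e-8):
    """Mirror descent on the Poisson KL data fidelity
    D(y, Ax) = sum((Ax) - y * log(Ax)), using the negative-entropy
    mirror map, whose multiplicative update keeps x > 0."""
    x = np.ones(A.shape[1])
    for _ in range(steps):
        Ax = A @ x + eps
        grad = A.T @ (1.0 - y / Ax)   # gradient of the KL fidelity
        x = x * np.exp(-lr * grad)    # entropic mirror step
    return x
```

A DEQ version would wrap a learned regularizer into this update and solve for its fixed point; the update above is only the data-fidelity part.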
What happens when nanochat meets DiLoCo?
NeutralArtificial Intelligence
The article discusses the integration of the DiLoCo algorithm with the nanochat project, a compact implementation similar to ChatGPT. This integration aims to enhance training efficiency in distributed environments where communication is constrained. By applying DiLoCo as a lightweight wrapper around nanochat's training loop, the researchers can significantly reduce communication overhead by allowing multiple local training steps before synchronization. This approach is compared to a standard data-parallel setup, highlighting the potential for improved model training in resource-limited scenarios.
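The core DiLoCo loop can be sketched in a few lines: each worker runs several local optimizer steps, and only the averaged parameter delta (the "outer gradient") is communicated and applied by an outer optimizer. This toy version uses a quadratic loss and plain SGD for both levels, which are simplifying assumptions; DiLoCo itself pairs inner AdamW with outer Nesterov momentum.

```python
import numpy as np

def diloco_round(global_w, worker_targets, local_steps=8,
                 inner_lr=0.1, outer_lr=0.7):
    """One DiLoCo-style communication round (illustrative sketch):
    each worker takes local SGD steps on a toy quadratic loss
    0.5 * ||w - target||^2, then only the averaged parameter delta
    is communicated and applied as an outer gradient."""
    deltas = []
    for target in worker_targets:
        w = global_w.copy()
        for _ in range(local_steps):
            w -= inner_lr * (w - target)   # local step, no communication
        deltas.append(global_w - w)        # outer pseudo-gradient
    outer_grad = np.mean(deltas, axis=0)
    return global_w - outer_lr * outer_grad
```

With `local_steps` inner updates per round, communication happens once every `local_steps` steps instead of every step, which is the source of the overhead reduction.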
Start Small, Think Big: Curriculum-based Relative Policy Optimization for Visual Grounding
PositiveArtificial Intelligence
The article presents a novel training strategy called Curriculum-based Relative Policy Optimization (CuRPO) aimed at improving Visual Grounding tasks. It highlights the limitations of Chain-of-Thought (CoT) prompting, particularly when outputs become lengthy or complex, which can degrade performance. The study reveals that simply increasing dataset size does not guarantee better results due to varying complexities. CuRPO utilizes CoT length and generalized Intersection over Union (gIoU) rewards to structure training data progressively from simpler to more challenging examples, demonstrating its effectiveness.
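The curriculum-ordering step described above can be sketched as a simple sort by a difficulty score. The score below, combining CoT length with `1 - gIoU`, and its equal weighting are hypothetical stand-ins for whatever scoring CuRPO actually uses; only the easy-to-hard ordering principle comes from the summary.

```python
def curriculum_order(examples):
    """Order training examples easy-to-hard by a hypothetical
    difficulty score mixing CoT length (longer = harder) and
    gIoU reward (lower = harder), as in a CuRPO-style curriculum."""
    def difficulty(ex):
        return 0.5 * ex["cot_len"] / 100.0 + 0.5 * (1.0 - ex["giou"])
    return sorted(examples, key=difficulty)
```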
MoHoBench: Assessing Honesty of Multimodal Large Language Models via Unanswerable Visual Questions
NeutralArtificial Intelligence
MoHoBench is a newly developed benchmark aimed at assessing the honesty of Multimodal Large Language Models (MLLMs) when confronted with unanswerable visual questions. Despite advancements in vision-language tasks, MLLMs often produce unreliable content. This study systematically evaluates the honesty of 28 popular MLLMs using a dataset of over 12,000 visual questions, revealing that many models struggle to provide honest responses. The findings highlight the need for improved trustworthiness in AI systems.
Towards Efficient Medical Reasoning with Minimal Fine-Tuning Data
PositiveArtificial Intelligence
Supervised Fine-Tuning (SFT) is essential for adapting Large Language Models (LLMs) to specialized fields like medical reasoning. Current SFT methods often utilize unfiltered datasets, which can be redundant and of low quality, leading to high computational costs and poor performance. This study introduces a new data selection strategy called Difficulty-Influence Quadrant (DIQ), which aims to optimize sample selection based on both difficulty and optimization utility, enhancing the efficiency of medical reasoning applications.
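A quadrant-style selection over two scores can be sketched as follows. Splitting at the medians and keeping one quadrant is an illustrative assumption, as are the field names; the summary only states that DIQ selects by both difficulty and optimization utility, not how the quadrants are defined or combined.

```python
import statistics

def quadrant_select(samples):
    """Hypothetical quadrant selection: split samples at the median
    of a difficulty score and an influence (utility) score, and keep
    those above both medians, i.e. hard AND high-utility examples."""
    d_med = statistics.median(s["difficulty"] for s in samples)
    i_med = statistics.median(s["influence"] for s in samples)
    return [s for s in samples
            if s["difficulty"] >= d_med and s["influence"] >= i_med]
```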
MVI-Bench: A Comprehensive Benchmark for Evaluating Robustness to Misleading Visual Inputs in LVLMs
PositiveArtificial Intelligence
MVI-Bench is introduced as a comprehensive benchmark aimed at evaluating the robustness of Large Vision-Language Models (LVLMs) against misleading visual inputs. Traditional benchmarks have primarily focused on textual inputs, neglecting the significant impact of visual misrepresentation. MVI-Bench categorizes misleading visual inputs into three hierarchical levels: Visual Concept, Visual Attribute, and Visual Relationship, and includes 1,248 annotated Visual Question Answering (VQA) instances to facilitate detailed robustness assessments.
2D Gaussians Spatial Transport for Point-supervised Density Regression
PositiveArtificial Intelligence
The paper presents Gaussian Spatial Transport (GST), a new framework that utilizes Gaussian splatting to transfer probability measures from image coordinates to annotation maps. It introduces a method for estimating pixel-annotation correspondence, which is used to create a transport plan based on Bayesian probability. A loss function is derived to integrate this transport plan into standard network optimization for computer vision tasks. Experiments in crowd counting and landmark detection demonstrate the approach's effectiveness, improving efficiency by eliminating iterative transport plan computation.
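The key efficiency claim is that the transport plan is obtained in closed form rather than by iterative optimization (as in Sinkhorn-style solvers). A minimal stand-in for such a plan, assuming isotropic Gaussians at the annotation points and per-pixel normalization (both illustrative choices, not GST's actual construction):

```python
import numpy as np

def transport_plan(pixels, annotations, sigma=1.0):
    """Closed-form pixel-to-annotation plan sketch: each pixel
    distributes its mass over annotation points with normalized
    Gaussian weights, so no iterative solver is needed."""
    # squared distances, shape (num_pixels, num_annotations)
    d2 = ((pixels[:, None, :] - annotations[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return w / w.sum(axis=1, keepdims=True)   # each row sums to 1
```

A loss can then weight per-pixel errors by this plan directly inside standard backpropagation, which is what makes the approach drop-in for network training.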
Higher-Order Transformers With Kronecker-Structured Attention
PositiveArtificial Intelligence
The paper introduces the Higher-Order Transformer (HOT), a novel attention framework designed to handle high-dimensional, multiway tensor data. Traditional Transformers struggle with such data due to computational inefficiencies and the need to flatten inputs, which disrupts tensor structures. HOT utilizes Kronecker products to represent multiway attention, efficiently capturing relationships across dimensions while maintaining tensor integrity. Experiments demonstrate HOT's competitive performance on 2D and 3D datasets, retaining the expressiveness of full high-order attention.
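The Kronecker idea can be illustrated on a 2D grid: instead of one softmax attention over all H*W positions (cost quadratic in H*W), attention is computed per mode and composed, which equals applying the Kronecker product A_h ⊗ A_w to the flattened input. The sketch below scores directly from raw features with no learned projections or heads, a deliberate simplification of HOT.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def kronecker_attention(X):
    """Toy factorized attention over a (H, W, C) grid: build one
    attention matrix per mode from mode summaries, then apply them
    along their respective axes. The composition equals multiplying
    the flattened input by the Kronecker product A_h (x) A_w."""
    H, W, C = X.shape
    rows = X.mean(axis=1)                       # (H, C) mode-1 summary
    cols = X.mean(axis=0)                       # (W, C) mode-2 summary
    A_h = softmax(rows @ rows.T / np.sqrt(C))   # (H, H) attention
    A_w = softmax(cols @ cols.T / np.sqrt(C))   # (W, W) attention
    Y = np.einsum('ij,jwc->iwc', A_h, X)        # attend along mode 1
    Y = np.einsum('kl,hlc->hkc', A_w, Y)        # attend along mode 2
    return Y, A_h, A_w
```

The factorization stores an H×H and a W×W matrix instead of an (H·W)×(H·W) one, which is where the efficiency over flattened full attention comes from.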