Visual Bridge: Universal Visual Perception Representations Generating

arXiv — cs.CV · Wednesday, November 12, 2025 at 5:00:00 AM
Recent advancements in diffusion models have led to significant successes in isolated computer vision tasks, such as text-to-image generation and depth estimation. However, these models are constrained by a 'single-task-single-model' paradigm, which limits their effectiveness in multi-task scenarios. To overcome this, a new universal visual perception framework based on flow matching has been proposed. This framework formulates the generation of visual representations as a universal flow-matching problem, allowing for efficient transfer across heterogeneous tasks. Extensive experiments demonstrate that the model achieves competitive performance in classification, detection, segmentation, depth estimation, and image-text retrieval, outperforming prior models in both zero-shot and fine-tuned settings. The introduction of a multi-scale, circular task embedding mechanism further enhances its robustness and scalability, marking a significant step forward in AI's ability to generalize across…
— via World Pulse Now AI Editorial System
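The summary does not spell out the paper's training objective, but flow matching generally trains a velocity field that transports noise toward target representations along a simple interpolation path, with task conditioning injected into the network. The sketch below shows a generic conditional flow-matching loss together with one plausible reading of a "circular" task embedding; names such as velocity_net, task_emb, and circular_task_embedding are illustrative assumptions, not the paper's actual API or mechanism.

```python
import torch

def flow_matching_loss(velocity_net, x0, x1, task_emb):
    """Generic conditional flow-matching objective (a sketch, not the paper's exact loss).

    x0: noise samples, x1: target visual representations, task_emb: task conditioning.
    Uses the common linear path x_t = (1 - t) * x0 + t * x1, whose velocity is x1 - x0.
    """
    batch = x0.shape[0]
    # Broadcast a per-sample time t over all feature dimensions.
    t = torch.rand(batch, device=x0.device).view(batch, *([1] * (x0.dim() - 1)))
    x_t = (1.0 - t) * x0 + t * x1          # point on the interpolation path
    v_target = x1 - x0                     # ground-truth velocity of the linear path
    v_pred = velocity_net(x_t, t.view(batch), task_emb)
    return ((v_pred - v_target) ** 2).mean()

def circular_task_embedding(task_id, num_tasks, dim):
    """One plausible reading of a circular task embedding: place each task on a circle
    and expand its angle with multi-frequency sin/cos features. Purely illustrative;
    the paper's multi-scale mechanism may differ."""
    angle = 2 * torch.pi * task_id / num_tasks
    freqs = torch.arange(1, dim // 2 + 1, dtype=torch.float32)
    return torch.cat([torch.sin(freqs * angle), torch.cos(freqs * angle)])
```

Under this reading, the circular layout keeps task codes bounded and evenly spaced, which is one way such an embedding could remain stable as new tasks are added.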


Recommended Readings
SCALEX: Scalable Concept and Latent Exploration for Diffusion Models
Positive · Artificial Intelligence
SCALEX is a newly introduced framework designed for scalable and automated exploration of latent spaces in diffusion models. It addresses the issue of social biases, such as gender and racial stereotypes, that are often encoded in image generation models. By utilizing natural language prompts, SCALEX enables zero-shot interpretation, allowing for systematic comparisons across various concepts and facilitating the discovery of internal model associations without the need for retraining or labeling.
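SCALEX's exact pipeline is not described in this summary. As a rough illustration of prompt-driven, zero-shot latent probing, one common pattern is to encode a pair of contrasting prompts with a text encoder, take the difference of their embeddings as a concept axis, and rank candidate latent directions by their alignment with that axis. The sketch below follows that pattern; text_encoder, concept_direction, and score_latent_directions are assumed names, not SCALEX's interface.

```python
import torch
import torch.nn.functional as F

def concept_direction(text_encoder, prompt_a, prompt_b):
    """Difference-of-embeddings concept axis (e.g. two contrasting prompts).
    Illustrative only; SCALEX's actual procedure may differ."""
    with torch.no_grad():
        e_a = F.normalize(text_encoder(prompt_a), dim=-1)
        e_b = F.normalize(text_encoder(prompt_b), dim=-1)
    return F.normalize(e_a - e_b, dim=-1)

def score_latent_directions(latent_dirs, concept_axis):
    """Cosine similarity between candidate latent directions (N x D) and the
    concept axis (D,), used to rank which internal directions track a named concept."""
    latent_dirs = F.normalize(latent_dirs, dim=-1)
    return latent_dirs @ concept_axis
```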
A Survey of Cross-domain Graph Learning: Progress and Future Directions
Neutral · Artificial Intelligence
Graph learning is essential for analyzing complex relationships in graph data, with applications in social, citation, and e-commerce networks. Despite the success of foundation models in computer vision (CV) and natural language processing (NLP), existing graph learning methods often lack generalization across domains. Cross-domain graph learning (CDGL) has emerged as a promising approach, aiming to create true graph foundation models. This survey reviews current CDGL research and proposes a taxonomy based on transferable knowledge types: structure-oriented, feature-oriented, and mixture-oriented…
Optimizing Federated Learning by Entropy-Based Client Selection
Positive · Artificial Intelligence
The article discusses a novel approach to optimizing federated learning through a method called FedEntOpt. This technique addresses privacy concerns associated with centralized datasets by allowing multiple clients to collaboratively train a global deep learning model without exposing their data. FedEntOpt enhances model performance by selecting clients based on the entropy of the aggregated label distribution, effectively mitigating issues related to label skew. Experiments demonstrate that this method improves classification accuracy by up to 6% compared to existing algorithms.
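The selection rule itself can be sketched simply: given per-client label histograms, greedily pick the clients whose addition most increases the Shannon entropy of the aggregated label distribution. The code below is a minimal sketch of that idea, assuming each client shares a per-class label count vector; it is not FedEntOpt's exact algorithm or privacy mechanism.

```python
import numpy as np

def label_entropy(counts):
    """Shannon entropy of an aggregated label-count vector."""
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def select_clients(label_counts, k):
    """Greedily choose k clients so the entropy of their combined label
    distribution is (approximately) maximized, countering label skew.
    `label_counts` maps client_id -> np.ndarray of per-class counts."""
    selected, aggregate = [], None
    remaining = dict(label_counts)
    for _ in range(min(k, len(remaining))):
        best_id, best_entropy = None, -np.inf
        for cid, counts in remaining.items():
            candidate = counts if aggregate is None else aggregate + counts
            h = label_entropy(candidate)
            if h > best_entropy:
                best_id, best_entropy = cid, h
        aggregate = remaining[best_id] if aggregate is None else aggregate + remaining[best_id]
        selected.append(best_id)
        del remaining[best_id]
    return selected
```

A greedy pass like this costs O(k · clients · classes) per round, which stays cheap relative to local training.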
X-VMamba: Explainable Vision Mamba
Positive · Artificial Intelligence
The X-VMamba model introduces a controllability-based interpretability framework for State Space Models (SSMs), particularly the Mamba architecture. This framework aims to clarify how Vision SSMs process spatial information, which has been a challenge due to the absence of transparent mechanisms. The proposed methods include a Jacobian-based approach for any SSM architecture and a Gramian-based method for diagonal SSMs, both designed to enhance understanding of internal state dynamics while maintaining computational efficiency.
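For context, the Gramian-based route has a textbook core: for a discrete-time state-space model x_{t+1} = A x_t + B u_t, the finite-horizon controllability Gramian W = Σ_k A^k B Bᵀ (A^k)ᵀ quantifies how strongly inputs can steer each state direction, and a diagonal A makes the powers cheap. The sketch below computes that standard Gramian under those assumptions; it is not X-VMamba's specific interpretability procedure.

```python
import numpy as np

def controllability_gramian(A_diag, B, horizon):
    """Finite-horizon controllability Gramian for a diagonal discrete-time SSM
    x_{t+1} = diag(A_diag) @ x_t + B @ u_t. Diagonal A keeps each step O(n * m).
    Textbook construction for illustration, not X-VMamba's exact method."""
    n = A_diag.shape[0]
    W = np.zeros((n, n))
    Ak = np.ones(n)                      # entries of diag(A)^k, starting at k = 0
    for _ in range(horizon):
        AkB = Ak[:, None] * B            # diag(A)^k @ B without forming the full matrix
        W += AkB @ AkB.T
        Ak = Ak * A_diag
    return W
```

Large entries (or eigenvalues) of W mark state dimensions the input sequence can influence strongly, which is the kind of signal a controllability-based explanation would surface.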
Optimizing Input of Denoising Score Matching is Biased Towards Higher Score Norm
Neutral · Artificial Intelligence
The paper titled 'Optimizing Input of Denoising Score Matching is Biased Towards Higher Score Norm' discusses the implications of using denoising score matching in optimizing diffusion models. It reveals that this optimization disrupts the equivalence between denoising score matching and exact score matching, resulting in a bias that favors higher score norms. The study also highlights similar biases in optimizing data distributions with pre-trained diffusion models, affecting various applications such as MAR, PerCo, and DreamFusion.
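As background for the claim, the standard denoising score-matching identity (Vincent, 2011) is reproduced below in generic notation; the constant term on the right is what the equivalence hinges on. This is a textbook statement for context, not the paper's own derivation.

```latex
% Denoising score matching at noise level \sigma, with \tilde{x} = x + \sigma\epsilon,
% \epsilon \sim \mathcal{N}(0, I):
\mathbb{E}_{x \sim p,\;\epsilon}\!\left[\left\| s_\theta(x + \sigma\epsilon) + \tfrac{\epsilon}{\sigma} \right\|^2\right]
\;=\;
\mathbb{E}_{\tilde{x} \sim p_\sigma}\!\left[\left\| s_\theta(\tilde{x}) - \nabla_{\tilde{x}} \log p_\sigma(\tilde{x}) \right\|^2\right]
\;+\; C(p)
% C(p) does not depend on \theta, so minimizing over \theta alone matches exact score
% matching. When the input/data side is also optimized, C(p) is no longer a constant,
% and the paper argues this mismatch biases solutions toward higher score norms.
```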