Visual Bridge: Universal Visual Perception Representations Generating
Recent advances in diffusion models have led to significant successes in isolated computer vision tasks such as text-to-image generation and depth estimation. However, these models follow a 'single-task-single-model' paradigm, which limits their effectiveness in multi-task scenarios. To overcome this, a new universal visual perception framework based on flow matching has been proposed. It formulates the generation of visual representations as a universal flow-matching problem, enabling efficient transfer across heterogeneous tasks. Extensive experiments demonstrate that the model achieves competitive performance in classification, detection, segmentation, depth estimation, and image-text retrieval, outperforming prior models in both zero-shot and fine-tuned settings. A multi-scale, circular task embedding mechanism further enhances robustness and scalability, marking a significant step toward AI systems that generalize across diverse visual tasks.
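
To make the two key ideas concrete, the sketch below illustrates (in PyTorch) how a flow-matching objective can be used to generate task-conditioned representations, together with a multi-scale circular (sin/cos) task embedding. This is a minimal illustration under stated assumptions, not the authors' implementation: the network architecture, dimensions, the `CircularTaskEmbedding` and `VelocityNet` names, and the linear-interpolation path are all illustrative choices.

```python
# Hedged sketch (not the paper's code): a minimal flow-matching training step for
# generating task-conditioned visual representations. Module names, shapes, and the
# sinusoidal "circular" task embedding below are illustrative assumptions.
import math
import torch
import torch.nn as nn

class CircularTaskEmbedding(nn.Module):
    """Illustrative multi-scale circular (sin/cos) embedding of a task index."""
    def __init__(self, num_tasks: int, dim: int = 64):
        super().__init__()
        self.num_tasks = num_tasks
        self.dim = dim

    def forward(self, task_id: torch.Tensor) -> torch.Tensor:
        # Map the task index to an angle on the unit circle, then expand it over
        # several frequencies (scales) with sin/cos features.
        angle = 2 * math.pi * task_id.float() / self.num_tasks              # (B,)
        freqs = torch.arange(1, self.dim // 2 + 1, device=task_id.device)   # (dim/2,)
        phases = angle[:, None] * freqs[None, :]                            # (B, dim/2)
        return torch.cat([torch.sin(phases), torch.cos(phases)], dim=-1)    # (B, dim)

class VelocityNet(nn.Module):
    """Toy velocity field v_theta(x_t, t, task): an MLP stand-in for the real model."""
    def __init__(self, rep_dim: int, task_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(rep_dim + 1 + task_dim, 512), nn.SiLU(),
            nn.Linear(512, 512), nn.SiLU(),
            nn.Linear(512, rep_dim),
        )

    def forward(self, x_t, t, task_emb):
        return self.net(torch.cat([x_t, t[:, None], task_emb], dim=-1))

def flow_matching_loss(model, task_embed, x1, task_id):
    """Conditional flow matching: regress the velocity along the straight path
    x_t = (1 - t) * x0 + t * x1 from noise x0 to a target representation x1."""
    x0 = torch.randn_like(x1)                      # noise sample
    t = torch.rand(x1.size(0), device=x1.device)   # uniform time in [0, 1]
    x_t = (1 - t[:, None]) * x0 + t[:, None] * x1  # point on the interpolation path
    target_v = x1 - x0                             # ground-truth velocity for this path
    pred_v = model(x_t, t, task_embed(task_id))
    return ((pred_v - target_v) ** 2).mean()

# Minimal usage example with random stand-in "target representations".
if __name__ == "__main__":
    rep_dim, num_tasks, batch = 256, 5, 32
    task_embed = CircularTaskEmbedding(num_tasks)
    model = VelocityNet(rep_dim)
    x1 = torch.randn(batch, rep_dim)               # placeholder target features
    task_id = torch.randint(0, num_tasks, (batch,))
    loss = flow_matching_loss(model, task_embed, x1, task_id)
    loss.backward()
    print(f"flow-matching loss: {loss.item():.4f}")
```

In this reading, a single velocity network is shared across tasks and the task identity enters only through the conditioning embedding, which is what would allow one model to serve classification, detection, segmentation, depth estimation, and retrieval heads; the actual conditioning and decoding details in the paper may differ.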
— via World Pulse Now AI Editorial System
