Shape and Texture Recognition in Large Vision-Language Models

arXiv — cs.CV · Friday, November 21, 2025 at 5:00:00 AM
  • The introduction of the Large Shape and Textures dataset (LAS&T) aims to enhance the evaluation of Large Vision-Language Models (VLMs).
  • This development highlights the ongoing challenges faced by VLMs, which struggle to match human performance in visual recognition tasks, particularly in complex scenarios involving variations in shape and texture.
  • The findings underscore a broader trend in AI research, where advancements in model architectures and datasets are crucial for improving visual understanding, yet significant gaps remain in achieving human-level performance.
— via World Pulse Now AI Editorial System


Continue Reading
Google's Nano Banana Pro model makes AI images sharper, cleaner, and far more real
Positive · Artificial Intelligence
Google has launched its latest image-generation model, Nano Banana Pro, which enhances the quality of AI-generated images. This model, powered by the Gemini 3 Pro, is designed to produce sharper, cleaner, and more realistic visuals. Users can access this advanced tool through the Gemini mobile app, marking a significant step forward in Google's generative AI capabilities.
PowerToys 0.96 upgrades Advanced Paste with local AI support
Positive · Artificial Intelligence
PowerToys 0.96 introduces an upgraded Advanced Paste feature with a redesigned user interface and support for various AI endpoints, including Azure, OpenAI, Gemini, Mistral, and local models like Foundry Local and Ollama. The update also enhances the Command Palette and PowerRename tools.
You Can Now Ask Google Gemini Whether an Image is AI-Generated or Not
Positive · Artificial Intelligence
Google has introduced a new feature in its Gemini platform that allows users to determine whether an image is AI-generated. This tool addresses the growing need for clarity in a landscape increasingly filled with AI-created content.
SURFing to the Fundamental Limit of Jet Tagging
Neutral · Artificial Intelligence
The article discusses the SURF method, a new approach to validating generative models in jet tagging. It highlights the importance of understanding the upper performance limits of jet tagging algorithms. By using generative surrogate models, the SURF method enables exact Neyman-Pearson tests, demonstrating that modern jet taggers may be operating near their statistical limits. The study specifically applies the EPiC-FM generative model as a valid surrogate reference for JetClass jets.
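The Neyman-Pearson idea behind this kind of validation can be illustrated with a toy example. The sketch below uses two hand-picked Gaussian surrogate densities standing in for generative models of two jet classes; it is purely illustrative and is not the EPiC-FM surrogate or the SURF method itself.

```python
import math

def gauss_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian, standing in for a surrogate model."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def np_test(x, threshold=1.0):
    """Neyman-Pearson optimal classifier: label x as 'signal' when the
    likelihood ratio p_sig(x) / p_bkg(x) exceeds the threshold."""
    ratio = gauss_pdf(x, 1.0, 1.0) / gauss_pdf(x, 0.0, 1.0)
    return ratio > threshold

# For these two unit-width Gaussians the ratio is exp(x - 0.5),
# so the test accepts exactly when x > 0.5.
```

With exact surrogate densities, this ratio test gives the statistical upper bound against which a trained jet tagger's performance can be compared.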
Segmenting Collision Sound Sources in Egocentric Videos
Positive · Artificial Intelligence
The proposed task of Collision Sound Source Segmentation (CS3) aims to identify and segment objects responsible for collision sounds in egocentric videos. This task addresses challenges such as cluttered visual scenes and brief interactions, utilizing a weakly-supervised method that leverages audio cues and foundation models like CLIP and SAM2. The focus on egocentric video allows for clearer sound identification despite visual complexity.
Dataset Distillation for Pre-Trained Self-Supervised Vision Models
Positive · Artificial Intelligence
The paper discusses dataset distillation, aiming to create a small set of synthetic images that can train a model to match the performance of one trained on a larger dataset. Unlike previous methods that focus on randomly initialized models, this research targets pre-trained self-supervised vision models. The proposed Linear Gradient Matching method optimizes synthetic images to produce similar gradients in a linear classifier as real data, enhancing the training process.
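The gradient-matching objective can be sketched in a toy setting. The example below is a hypothetical NumPy illustration, not the paper's code: distilled feature vectors `X_syn` are optimized (here by a crude numerical gradient) so that a linear classifier sees nearly the same gradient on them as on a larger set of "real" features from a frozen encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_grad(W, X, y):
    """Softmax cross-entropy gradient with respect to the linear head W."""
    logits = X @ W.T
    logits = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    p = p / p.sum(axis=1, keepdims=True)
    p[np.arange(len(y)), y] -= 1.0        # softmax minus one-hot labels
    return p.T @ X / len(y)

dim, n_cls = 6, 3
X_real = rng.normal(size=(30, dim))       # stand-in for frozen encoder features
y_real = rng.integers(0, n_cls, size=30)
X_syn = rng.normal(size=(3, dim))         # tiny distilled set, one per class
y_syn = np.arange(3)
W = rng.normal(size=(n_cls, dim)) * 0.1

g_real = linear_grad(W, X_real, y_real)   # target gradient from real data

def matching_loss(Xs):
    """Squared distance between synthetic and real classifier gradients."""
    return float(np.sum((linear_grad(W, Xs, y_syn) - g_real) ** 2))

loss0 = matching_loss(X_syn)
eps, lr = 1e-4, 2.0
for _ in range(150):                      # finite-difference descent on X_syn
    base = matching_loss(X_syn)
    g = np.zeros_like(X_syn)
    for i in range(X_syn.shape[0]):
        for j in range(dim):
            X_syn[i, j] += eps
            g[i, j] = (matching_loss(X_syn) - base) / eps
            X_syn[i, j] -= eps
    X_syn -= lr * g
```

After optimization the matching loss drops well below its starting value, meaning a step of classifier training on the three synthetic points now mimics a step on all thirty real points.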
How Noise Benefits AI-generated Image Detection
Positive · Artificial Intelligence
The rapid advancement of generative models has made it increasingly difficult to distinguish between real and AI-generated images. Researchers have identified that out-of-distribution generalization remains a challenge due to spurious shortcuts used during training. To combat this, they propose the Positive-Incentive Noise for CLIP (PiN-CLIP), which trains a noise generator alongside a detection network to enhance the detection of AI-generated images by mitigating shortcut dominance.
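The structure of this idea can be sketched in a few lines. The code below is a hypothetical illustration of the forward pass only (fixed noise scale, random stand-in features), not the actual PiN-CLIP implementation: a noise generator perturbs the embeddings before the detection head, so training cannot lean on fragile shortcut features.

```python
import numpy as np

rng = np.random.default_rng(1)

def noise_generator(feat, scale):
    """Hypothetical generator: per-dimension Gaussian noise scaled by a
    vector that would be learned jointly with the detector (fixed here)."""
    return feat + scale * rng.normal(size=feat.shape)

def detector(feat, w, b):
    """Linear real-vs-fake head on CLIP-like image embeddings."""
    return 1.0 / (1.0 + np.exp(-(feat @ w + b)))

feat = rng.normal(size=(4, 16))           # stand-in for 4 image embeddings
scale = np.full(16, 0.1)                  # learnable noise scale in the paper
w, b = rng.normal(size=16) * 0.1, 0.0
probs = detector(noise_generator(feat, scale), w, b)
```

In the paper's framing, the generator and detector are trained together so the injected noise is "positive-incentive": it suppresses shortcut dimensions while leaving the genuinely discriminative signal intact.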
Mixture of Ranks with Degradation-Aware Routing for One-Step Real-World Image Super-Resolution
Positive · Artificial Intelligence
The study presents a novel Mixture-of-Ranks (MoR) architecture for real-world image super-resolution (Real-ISR), integrating sparse Mixture-of-Experts (MoE) into existing frameworks. This approach aims to enhance the adaptability of models in capturing the diverse characteristics of degraded images while facilitating knowledge sharing among inputs. The proposed method utilizes a fine-grained expert partitioning strategy, treating each rank in Low-Rank Adaptation (LoRA) as an independent expert.
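The core routing idea, treating each LoRA rank as an expert and activating only a few per input, can be sketched as follows. All names and shapes here are illustrative assumptions, not the paper's code, and the "degradation-aware" gate is reduced to a plain linear router.

```python
import numpy as np

rng = np.random.default_rng(2)
d, r, k = 16, 8, 2                        # model dim, number of ranks, top-k

A = rng.normal(size=(r, d)) * 0.1         # LoRA down-projection, one row per rank
B = rng.normal(size=(d, r)) * 0.1         # LoRA up-projection, one column per rank
router = rng.normal(size=(d, r)) * 0.1    # stand-in for the degradation-aware gate

def mor_delta(x):
    """Sparse low-rank update: route the input to its top-k rank 'experts'."""
    logits = x @ router                   # one gate score per rank
    top = np.argsort(logits)[-k:]         # indices of the selected ranks
    gates = np.zeros(r)
    e = np.exp(logits[top] - logits[top].max())
    gates[top] = e / e.sum()              # softmax over the selected ranks only
    return (B * gates) @ (A @ x)          # gated sum of rank contributions

x = rng.normal(size=d)                    # stand-in for a degraded-image feature
delta = mor_delta(x)                      # sparsely routed LoRA update
```

Because unselected ranks receive zero gate weight, different degradation types can specialize different subsets of ranks while sharing the same base weights.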