Shape and Texture Recognition in Large Vision-Language Models
- The Large Shapes and Textures dataset (LAS&T), built by unsupervised extraction of shapes and textures from natural images, has been introduced to test how well Large Vision-Language Models (LVLMs) recognize and represent shapes and textures across varied contexts. It serves as a benchmark for evaluating leading models such as CLIP and DINO on shape recognition tasks.
- The benchmark matters because it exposes a concrete limitation of current LVLMs: they still fall short of human performance in shape recognition, particularly when shapes appear in unfamiliar orientations or contexts. LAS&T is intended to measure and help close this gap in visual understanding.
- The work reflects a broader push in vision-language research toward more robust and versatile models. Related efforts, including Graph-Regularized Sparse Autoencoders and multi-modal embeddings, target adjacent challenges such as class imbalance and scene understanding.
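To make the evaluation protocol above concrete, here is a minimal sketch of CLIP-style zero-shot shape classification: an image embedding is compared against text embeddings of candidate class names, and the most cosine-similar class wins. The embeddings below are random stand-ins (a real LAS&T evaluation would use CLIP's image and text encoders); the function and variable names are illustrative, not from the paper.

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs):
    """Return the index of the text embedding most cosine-similar to the image embedding."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return int(np.argmax(text_embs @ image_emb))

# Toy stand-in embeddings; a real run would encode "a photo of a {shape}" prompts.
rng = np.random.default_rng(0)
classes = ["circle", "square", "triangle"]
text_embs = rng.normal(size=(3, 8))
image_emb = text_embs[1] + 0.1 * rng.normal(size=8)  # simulates an image "of a square"
pred = zero_shot_classify(image_emb, text_embs)
print(classes[pred])  # prints "square"
```

Benchmark accuracy is then just the fraction of images whose predicted class matches the ground-truth shape label.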
— via World Pulse Now AI Editorial System



