Shape and Texture Recognition in Large Vision-Language Models

arXiv — cs.CV · Wednesday, December 10, 2025, 5:00 AM
  • The Large Shapes and Textures dataset (LAS&T) has been introduced to probe how well Large Vision-Language Models (LVLMs) recognize and represent shapes and textures across varied contexts. Built through unsupervised extraction of shapes and textures from natural images, it serves as a benchmark for evaluating leading models such as CLIP and DINO on shape-recognition tasks (a minimal zero-shot evaluation sketch follows this summary).
  • This development is significant as it highlights the current limitations of LVLMs, which still fall short of human performance in shape recognition, particularly under varying orientations and contexts. The introduction of LAS&T aims to bridge this gap and improve the models' visual understanding.
  • The ongoing advances in vision-language models reflect a broader trend in AI research toward more robust and versatile models. As researchers explore frameworks and techniques such as Graph-Regularized Sparse Autoencoders and multi-modal embeddings, work on visual recognition continues to address challenges such as class imbalance and scene understanding.
— via World Pulse Now AI Editorial System
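
The shape-recognition evaluation described above typically amounts to zero-shot classification: a model such as CLIP is shown an image of a shape rendered in some orientation or context and asked to match it against textual shape labels. Below is a minimal sketch of that kind of evaluation using CLIP via the Hugging Face transformers library; the label set, prompt template, and sample file name are illustrative assumptions, not part of the LAS&T benchmark itself.

```python
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

# Publicly available CLIP checkpoint; LAS&T may use other backbones as well.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

# Hypothetical label set and prompt template for illustration only.
shape_labels = ["circle", "triangle", "square", "star", "hexagon"]
prompts = [f"a photo of a {s} shape" for s in shape_labels]

# Hypothetical sample image; in practice this would come from the benchmark.
image = Image.open("las_t_sample.png")

# Encode the image and all candidate text prompts in one batch.
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; softmax gives label probabilities.
probs = outputs.logits_per_image.softmax(dim=-1).squeeze(0)
pred = shape_labels[probs.argmax().item()]
print(f"predicted shape: {pred} (p={probs.max().item():.3f})")
```

Benchmark accuracy would then be the fraction of images whose predicted label matches the ground-truth shape, averaged over orientations and contexts.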


Continue Reading
Google’s Trends Explore page gets new Gemini capabilities
Positive · Artificial Intelligence
Google has upgraded its Trends Explore page, integrating Gemini capabilities to enhance the analysis of search interest and allow users to identify and compare relevant trends more effectively. This significant update aims to improve user engagement and data insights.
Google taps its massive data advantage with new Gemini feature
Positive · Artificial Intelligence
Google has introduced a new feature called 'Personal Intelligence' for its Gemini AI, which integrates data from Gmail, Google Photos, and YouTube to enhance user interactions. This feature aims to make the AI assistant more responsive and personalized by leveraging Google's extensive data resources.
Gemini can now scan your photos, email, and more to provide better answers
Neutral · Artificial Intelligence
Google has introduced a new feature for its AI model, Gemini, allowing it to scan users' photos, emails, and other data to provide more accurate responses. This feature is currently available only to paid users and is disabled by default.
Gemini can now pull context from the rest of your Google apps, if you let it
Neutral · Artificial Intelligence
Google has announced that its AI model, Gemini, can now pull context from other Google applications, enhancing its functionality and user experience. This capability allows Gemini to provide more personalized and relevant responses by integrating data from services like Gmail and Calendar, contingent on user consent.
Google Gemini Can Proactively Analyze Users’ Gmail, Photos, Searches
Positive · Artificial Intelligence
Alphabet Inc.'s Google has announced that its Gemini artificial intelligence assistant can now proactively analyze users' data across various platforms, including Gmail, Search, Photos, and YouTube, enhancing personalization for its consumer-facing AI product.
Gemini's new Personal Intelligence will look through your emails and photos - if you let it
Neutral · Artificial Intelligence
Google has introduced a new feature for its AI model, Gemini, called 'Personal Intelligence,' which allows it to scan users' emails, photos, and other data to provide more personalized responses, contingent on user consent. This feature aims to enhance user interaction by leveraging data from various Google services, including Gmail and YouTube.
Gemini’s new beta feature provides proactive responses based on your photos, emails, and more
Neutral · Artificial Intelligence
Google has launched a new beta feature for its AI model, Gemini, called 'Personal Intelligence,' which allows the AI to proactively respond to users by analyzing their emails, photos, and other data, contingent on user consent. This feature is currently off by default, giving users control over their data integration with Gemini.
Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning
Positive · Artificial Intelligence
Franca, the first fully open-source vision foundation model, has been introduced, showcasing performance that matches or exceeds proprietary models like DINOv2 and CLIP. This model utilizes a transparent training pipeline and publicly available datasets, addressing limitations in current self-supervised learning clustering methods through a novel nested Matryoshka clustering approach.
