A Highly Efficient Diversity-based Input Selection for DNN Improvement Using VLMs

arXiv (cs.CV), Wednesday, January 14, 2026 at 5:00:00 AM
  • A recent study introduces Concept-Based Diversity (CBD), an efficient diversity metric for image inputs that uses Vision-Language Models (VLMs) to select the inputs used to improve Deep Neural Networks (DNNs). The approach targets the high computational cost and poor scalability of traditional diversity-based selection methods.
  • CBD matters because it helps identify informative subsets of data for labeling, which can reduce the time and cost of fine-tuning DNNs and make them more practical to deploy across domains.
  • The work fits a broader push in AI to make deep learning infrastructure and VLMs more efficient as the computational demands of real-world applications grow.
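To make the idea of diversity-based input selection concrete, here is a minimal sketch. It is not the CBD metric from the paper, whose details this summary does not give. It assumes a VLM has already tagged each unlabeled image with a set of concept names, and it uses a generic greedy coverage heuristic to pick a small, diverse subset for labeling. The function name, image ids, and concept tags are all hypothetical.

```python
from typing import Dict, List, Set


def greedy_concept_selection(
    concepts_by_image: Dict[str, Set[str]], budget: int
) -> List[str]:
    """Pick up to `budget` images, each time taking the image that adds
    the most concepts not yet covered by earlier picks.

    Generic greedy coverage heuristic (an assumption for illustration),
    not the paper's CBD computation.
    """
    covered: Set[str] = set()
    selected: List[str] = []
    remaining = dict(concepts_by_image)

    while remaining and len(selected) < budget:
        # Iterate ids in sorted order so ties go to the smallest id.
        best = max(sorted(remaining), key=lambda img: len(remaining[img] - covered))
        if not remaining[best] - covered:
            break  # no remaining image adds a new concept
        selected.append(best)
        covered |= remaining.pop(best)

    return selected


# Hypothetical concept tags, as if produced by a VLM for four images.
images = {
    "img_a": {"dog", "grass"},
    "img_b": {"dog", "grass", "ball"},
    "img_c": {"car", "road"},
    "img_d": {"dog"},
}

picked = greedy_concept_selection(images, budget=2)
print(picked)
```

With these tags, the first pick is the image that covers the most concepts, and the second is the one whose concepts do not overlap with it. Images that only repeat already-covered concepts are never chosen, which is the sense in which a diverse subset can cut labeling cost.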
— via World Pulse Now AI Editorial System

Continue Reading
Cross-Cultural Expert-Level Art Critique Evaluation with Vision-Language Models
Neutral · Artificial Intelligence
A new evaluation framework for assessing the cultural interpretation capabilities of Vision-Language Models (VLMs) has been introduced, focusing on cross-cultural art critique. This tri-tier framework includes automated metrics, rubric-based scoring, and calibration against human ratings, revealing a 5.2% reduction in mean absolute error in cultural understanding assessments.
NOVAK: Unified adaptive optimizer for deep neural networks
Positive · Artificial Intelligence
NOVAK, a recently introduced unified adaptive optimizer for deep neural networks, combines several techniques, including adaptive moment estimation and lookahead synchronization, to improve the performance and efficiency of neural network training.
When Models Know When They Do Not Know: Calibration, Cascading, and Cleaning
Positive · Artificial Intelligence
A recent study titled 'When Models Know When They Do Not Know: Calibration, Cascading, and Cleaning' proposes a universal training-free method for model calibration, cascading, and data cleaning, enhancing models' ability to recognize their limitations. The research highlights that higher confidence correlates with higher accuracy and that models calibrated on validation sets maintain their calibration on test sets.
Hierarchical Online-Scheduling for Energy-Efficient Split Inference with Progressive Transmission
Positive · Artificial Intelligence
A novel framework named ENACHI has been proposed for hierarchical online scheduling in energy-efficient split inference with Deep Neural Networks (DNNs), addressing the inefficiencies in current scheduling methods that fail to optimize both task-level decisions and packet-level dynamics. This framework integrates a two-tier Lyapunov-based approach and progressive transmission techniques to enhance adaptivity and resource utilization.
IGAN: A New Inception-based Model for Stable and High-Fidelity Image Synthesis Using Generative Adversarial Networks
Positive · Artificial Intelligence
A new model called Inception Generative Adversarial Network (IGAN) has been introduced to address high-quality image synthesis and training stability in Generative Adversarial Networks (GANs). IGAN uses deeper inception-inspired and dilated convolutions and reports Fréchet Inception Distance (FID) scores of 13.12 on CUB-200 and 15.08 on ImageNet.
Semantic Misalignment in Vision-Language Models under Perceptual Degradation
Neutral · Artificial Intelligence
Recent research has highlighted significant semantic misalignment in Vision-Language Models (VLMs) when subjected to perceptual degradation, particularly through controlled visual perception challenges using the Cityscapes dataset. This study reveals that while traditional segmentation metrics show only moderate declines, VLMs exhibit severe failures in downstream tasks, including hallucinations and inconsistent safety judgments.
CoMa: Contextual Massing Generation with Vision-Language Models
Positive · Artificial Intelligence
The CoMa project has introduced an innovative automated framework for generating building massing, addressing the complexities of architectural design by utilizing functional requirements and site context. This framework is supported by the newly developed CoMa-20K dataset, which includes detailed geometries and contextual data.
VULCA-Bench: A Multicultural Vision-Language Benchmark for Evaluating Cultural Understanding
Neutral · Artificial Intelligence
VULCA-Bench has been introduced as a multicultural benchmark aimed at evaluating the cultural understanding of Vision-Language Models (VLMs) through a comprehensive framework that spans various cultural traditions. This benchmark includes 7,410 matched image-critique pairs and emphasizes higher-order cultural interpretation rather than just basic visual perception.
