Mitigating Bias with Words: Inducing Demographic Ambiguity in Face Recognition Templates by Text Encoding

arXiv — cs.CV · Thursday, December 11, 2025, 5:00:00 AM
  • A novel strategy called Unified Text-Image Embedding (UTIE) has been proposed to mitigate demographic biases in face recognition systems by inducing demographic ambiguity in face embeddings. This approach enriches facial embeddings with information from various demographic groups, promoting fairer verification performance across different demographics.
  • The development of UTIE is significant as it addresses critical disparities in verification performance that can arise in multicultural urban environments, where biometrics are increasingly integrated into smart city infrastructures.
  • This advancement reflects a broader trend in artificial intelligence toward improving fairness and reducing bias in machine learning models, particularly vision-language systems, and underscores the importance of addressing biases that affect diverse populations.
— via World Pulse Now AI Editorial System
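The summary does not spell out UTIE's mechanics, but the core idea it describes (enriching a face template with text-derived information from several demographic groups so the template becomes demographically ambiguous) can be sketched with stand-in embeddings. The function name, mixing weight `alpha`, and random vectors below are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def induce_demographic_ambiguity(face_emb, demo_text_embs, alpha=0.2):
    """Blend a face embedding with the mean of text embeddings that
    describe several demographic groups, then re-normalize.

    face_emb:       (d,) unit-norm face template
    demo_text_embs: (k, d) unit-norm text embeddings, one per group
    alpha:          mixing weight (hypothetical hyperparameter)
    """
    mix = demo_text_embs.mean(axis=0)          # equal weight per group
    blended = (1 - alpha) * face_emb + alpha * mix
    return blended / np.linalg.norm(blended)   # back to the unit sphere

# Toy stand-ins; a real system would take these from the face and
# text encoders of a vision-language model.
rng = np.random.default_rng(0)
face = rng.normal(size=512)
face /= np.linalg.norm(face)
texts = rng.normal(size=(4, 512))
texts /= np.linalg.norm(texts, axis=1, keepdims=True)

ambiguous = induce_demographic_ambiguity(face, texts)
print(ambiguous.shape)
```

With a small `alpha` the blended template stays close to the original identity (high cosine similarity) while carrying a trace of every group's text embedding, which is the intuition behind "ambiguity" here.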


Continue Reading
Solving Semi-Supervised Few-Shot Learning from an Auto-Annotation Perspective
Positive · Artificial Intelligence
Recent research has highlighted the challenges in semi-supervised few-shot learning (SSFSL), particularly in the context of auto-annotation. The study reveals that while Vision-Language Models (VLMs) are powerful, they often underperform in SSFSL due to their inability to effectively utilize unlabeled data, leading to weak supervision signals.
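The blurb frames SSFSL as an auto-annotation problem, where a VLM pseudo-labels unlabeled data but yields weak supervision. A common baseline for that setup, confidence-thresholded zero-shot pseudo-labeling, can be sketched as follows; the threshold `tau`, temperature, and random embeddings are assumptions for illustration, not the study's actual pipeline.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def pseudo_label(image_embs, class_text_embs, tau=0.5):
    """Assign a pseudo-label per sample, or -1 where the VLM's top
    zero-shot probability falls below tau (too weak a signal)."""
    probs = softmax(image_embs @ class_text_embs.T / 0.07)
    labels = probs.argmax(axis=1)
    labels[probs.max(axis=1) < tau] = -1   # keep these samples unlabeled
    return labels

# Random unit vectors stand in for encoder outputs.
rng = np.random.default_rng(0)
classes = rng.normal(size=(5, 64))
classes /= np.linalg.norm(classes, axis=1, keepdims=True)
images = rng.normal(size=(10, 64))
images /= np.linalg.norm(images, axis=1, keepdims=True)

labels = pseudo_label(images, classes)
print(labels)
```

Samples the model is unsure about are left out of training rather than injected with noisy labels, which is one simple way to cope with the weak-supervision problem the study identifies.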
Evaluating Small Vision-Language Models on Distance-Dependent Traffic Perception
Neutral · Artificial Intelligence
A new benchmark called Distance-Annotated Traffic Perception Question Answering (DTPQA) has been introduced to evaluate Vision-Language Models (VLMs) specifically for distance-dependent traffic perception. This benchmark aims to enhance the reliability of automated driving systems by focusing on perception capabilities at both close and long ranges, addressing the need for robust models in safety-critical applications.
WeatherDiffusion: Controllable Weather Editing in Intrinsic Space
Positive · Artificial Intelligence
WeatherDiffusion has been introduced as a diffusion-based framework that enables controllable weather editing in intrinsic space, utilizing an inverse renderer to estimate material properties and scene geometry from input images. This framework enhances the editing process by generating images based on specific weather conditions described in text prompts.
Representation Calibration and Uncertainty Guidance for Class-Incremental Learning based on Vision Language Model
Positive · Artificial Intelligence
A novel framework for class-incremental learning based on Vision-Language Models (VLMs) has been introduced, which aims to enhance image classification by integrating task-specific adapters and a cross-task representation calibration strategy. This approach addresses the challenge of preserving previously learned knowledge while adapting to new classes, thereby reducing class confusion across tasks.
DynaIP: Dynamic Image Prompt Adapter for Scalable Zero-shot Personalized Text-to-Image Generation
Positive · Artificial Intelligence
The Dynamic Image Prompt Adapter (DynaIP) has been introduced as a novel tool aimed at enhancing Personalized Text-to-Image (PT2I) generation, addressing key challenges such as maintaining concept fidelity and scalability for multi-subject personalization. This advancement allows for zero-shot PT2I without the need for test-time fine-tuning, leveraging multimodal diffusion transformers (MM-DiT) to improve image generation quality.
VisualActBench: Can VLMs See and Act like a Human?
Neutral · Artificial Intelligence
Vision-Language Models (VLMs) have made significant strides in understanding and describing visual environments, yet their capacity to reason and act independently based on visual inputs remains largely unexamined. The introduction of VisualActBench, a benchmark featuring 1,074 videos and 3,733 human-annotated actions, aims to evaluate VLMs' proactive reasoning capabilities. Findings indicate that while advanced models like GPT-4o perform well, they still fall short of human-level reasoning, especially in generating proactive actions.
Dynamic Facial Expressions Analysis Based Parkinson's Disease Auxiliary Diagnosis
Positive · Artificial Intelligence
A novel method for auxiliary diagnosis of Parkinson's disease (PD) has been proposed, utilizing dynamic facial expression analysis to identify hypomimia, a key symptom of the disorder. This approach employs a multimodal facial expression analysis network that integrates visual and textual features while maintaining the temporal dynamics of facial expressions, ultimately processed through an LSTM-based classification network.
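The blurb's final stage, an LSTM-based classification network over per-frame expression features, can be illustrated with a minimal single-layer LSTM forward pass. The toy sizes, random weights, and frame features below are stand-ins; the paper's actual fused visual-textual features and trained weights are not shown in the abstract.

```python
import numpy as np

def lstm_forward(x_seq, Wx, Wh, b):
    """Single-layer LSTM forward pass; gates packed as [i, f, g, o].
    x_seq: (T, d_in), Wx: (d_in, 4h), Wh: (h, 4h), b: (4h,)."""
    h_dim = Wh.shape[0]
    h = np.zeros(h_dim)
    c = np.zeros(h_dim)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    for x in x_seq:                      # preserve temporal order of frames
        z = x @ Wx + h @ Wh + b
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)       # update cell state
        h = o * np.tanh(c)               # update hidden state
    return h  # final hidden state summarizes the expression dynamics

# Toy sizes; real per-frame features would come from the multimodal
# facial expression analysis network.
rng = np.random.default_rng(0)
T, d_in, h_dim = 16, 32, 8
frames = rng.normal(size=(T, d_in))
Wx = rng.normal(scale=0.1, size=(d_in, 4 * h_dim))
Wh = rng.normal(scale=0.1, size=(h_dim, 4 * h_dim))
b = np.zeros(4 * h_dim)

h_final = lstm_forward(frames, Wx, Wh, b)
print(h_final.shape)
```

A linear classifier on `h_final` would then output the diagnosis label; the recurrent state is what lets the model capture reduced expression dynamics (hypomimia) rather than single-frame appearance.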
Defect-aware Hybrid Prompt Optimization via Progressive Tuning for Zero-Shot Multi-type Anomaly Detection and Segmentation
Positive · Artificial Intelligence
A new study introduces a defect-aware hybrid prompt optimization method, termed DAPO, aimed at enhancing zero-shot multi-type anomaly detection and segmentation. This approach leverages high-level semantic information from vision-language models like CLIP, addressing the challenge of recognizing fine-grained anomaly types such as 'hole', 'cut', and 'scratch'.
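The general mechanism DAPO builds on, scoring an image against CLIP-style text prompts that each name a defect type, can be sketched with toy embeddings. The prompt list, temperature, and random vectors below are illustrative assumptions standing in for CLIP's real image and text encoders, not DAPO's optimized prompts.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def classify_anomaly(image_emb, prompt_embs, temperature=0.07):
    """Zero-shot anomaly typing: cosine similarity of a unit-norm image
    embedding against one unit-norm text prompt per defect type,
    converted to a probability per type."""
    sims = prompt_embs @ image_emb
    return softmax(sims / temperature)

defect_types = ["hole", "cut", "scratch", "no defect"]

# Toy stand-ins for CLIP encoder outputs.
rng = np.random.default_rng(1)
prompts = rng.normal(size=(4, 128))
prompts /= np.linalg.norm(prompts, axis=1, keepdims=True)
image = prompts[2] + 0.05 * rng.normal(size=128)  # near the "scratch" prompt
image /= np.linalg.norm(image)

probs = classify_anomaly(image, prompts)
print(defect_types[int(np.argmax(probs))])
```

Prompt optimization methods like DAPO tune the text side of this matching so that fine-grained types ("hole" vs. "cut" vs. "scratch") separate cleanly, rather than relying on hand-written prompts.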
