FairJudge: MLLM Judging for Social Attributes and Prompt Image Alignment

arXiv — cs.LG · Thursday, November 20, 2025 at 5:00:00 AM
  • FairJudge introduces a novel evaluation protocol for text-to-image models, using multimodal LLMs (MLLMs) to judge generated images for social attributes and prompt-image alignment; a minimal sketch of this judging pattern appears below.
  • FairJudge is significant because it addresses the limitations of existing evaluation methods, which often overlook subtle social cues and biases, and thereby promotes a more equitable assessment framework.
  • FairJudge reflects ongoing concerns about bias in AI systems, particularly around gender and race: previous studies have shown that many models still exhibit significant disparities despite technological advances.
— via World Pulse Now AI Editorial System
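
To make the judging pattern concrete, here is a minimal Python sketch of an MLLM-as-judge loop in this spirit. It is an illustration only, not FairJudge's actual protocol: the rubric wording and the `query_mllm` helper are assumptions standing in for any multimodal LLM client.

```python
# Minimal sketch of an MLLM-as-judge evaluation loop in the spirit of
# FairJudge; the actual protocol, rubric, and API differ. `query_mllm`
# is a hypothetical helper standing in for any multimodal LLM client.
import json

RUBRIC = (
    "You are an impartial judge of text-to-image outputs. "
    "Given the prompt and the image, return JSON with:\n"
    '  "alignment": 1-5 (does the image match the prompt?),\n'
    '  "perceived_gender" and "perceived_race": best-guess labels,\n'
    "judging only what is visibly depicted."
)

def judge_image(prompt: str, image_path: str) -> dict:
    """Score one prompt/image pair with the MLLM judge."""
    reply = query_mllm(system=RUBRIC,
                       text=f"Prompt: {prompt}",
                       image=image_path)
    return json.loads(reply)

def demographic_skew(records: list[dict]) -> dict:
    """Share of each perceived-gender label across a batch of judgments,
    e.g. to expose skew on occupation prompts like 'a photo of a CEO'."""
    counts: dict[str, int] = {}
    for r in records:
        g = r["perceived_gender"]
        counts[g] = counts.get(g, 0) + 1
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()}
```

Aggregating the per-image judgments, as in `demographic_skew`, is what turns individual MLLM verdicts into a bias measurement across a prompt set.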

Recommended Readings
Automated Detection of Visual Attribute Reliance with a Self-Reflective Agent
Positive · Artificial Intelligence
The paper presents an automated framework for detecting visual attribute reliance in trained vision models. It introduces a self-reflective agent that generates and tests hypotheses about which visual attributes influence a model's predictions, iteratively refining those hypotheses against experimental results and assessing the accuracy of its own findings. Surfacing such reliance helps practitioners judge model robustness and catch spurious shortcuts that lead to overfitting.
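
The hypothesize-test-refine loop lends itself to a compact sketch. The version below is a hedged illustration, not the paper's agent: `propose_hypothesis`, `edit_attribute`, and the classifier interface are hypothetical stand-ins.

```python
# Minimal sketch of the hypothesize-test-refine loop described above;
# the paper's agent, prompts, and editing tools are more elaborate.
# `propose_hypothesis`, `edit_attribute`, and `model` are hypothetical.

def detect_attribute_reliance(model, images, n_rounds: int = 5):
    """Iteratively test whether a visual attribute drives predictions."""
    history = []  # (hypothesis, effect) pairs the agent reflects on
    for _ in range(n_rounds):
        hyp = propose_hypothesis(history)  # e.g. "relies on background color"
        edited = [edit_attribute(img, hyp) for img in images]  # counterfactuals
        # If altering the attribute flips predictions, the model
        # likely relies on it.
        flip_rate = sum(model(a) != model(b)
                        for a, b in zip(images, edited)) / len(images)
        history.append((hyp, flip_rate))
    return max(history, key=lambda h: h[1])  # strongest reliance found
```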
Unbiased Semantic Decoding with Vision Foundation Models for Few-shot Segmentation
Positive · Artificial Intelligence
The paper presents an Unbiased Semantic Decoding (USD) strategy integrated with the Segment Anything Model (SAM) for few-shot segmentation tasks. This approach aims to enhance the model's generalization ability by extracting target information from both support and query sets simultaneously, addressing the limitations of previous methods that relied heavily on explicit prompts. The study highlights the potential of USD in improving segmentation accuracy across unknown classes.
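
For context, a bare-bones prototype-based few-shot segmentation baseline looks like the sketch below. It shows only the generic support-to-query transfer idea, not USD itself; the feature extractor and tensor shapes are assumptions.

```python
# Generic prototype-based few-shot segmentation sketch for context; this
# is NOT the paper's USD method, which additionally decodes unbiased
# semantics through SAM. Feature source and shapes are assumptions.
import torch
import torch.nn.functional as F

def segment_query(feat_s, mask_s, feat_q, thresh=0.5):
    """feat_s/feat_q: (C, H, W) features from a frozen vision foundation
    model; mask_s: (H, W) binary support mask."""
    # Masked average pooling -> class prototype from the support set.
    proto = (feat_s * mask_s).sum(dim=(1, 2)) / mask_s.sum().clamp(min=1)
    # Cosine similarity between the prototype and every query location.
    sim = F.cosine_similarity(feat_q, proto[:, None, None], dim=0)
    return (sim > thresh).float()  # coarse mask; could seed SAM prompts
```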
D4C: Data-free Quantization for Contrastive Language-Image Pre-training Models
Positive · Artificial Intelligence
Data-Free Quantization (DFQ) presents a solution for model compression without needing real data, which is beneficial in privacy-sensitive contexts. While DFQ has been effective for unimodal models, its application to Vision-Language Models like CLIP has not been thoroughly investigated. This study introduces D4C, a DFQ framework specifically designed for CLIP, addressing challenges such as semantic content and intra-image diversity in synthesized samples.
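
One way to picture data-free calibration for CLIP is to synthesize images whose embeddings align with class-prompt text embeddings, then calibrate the quantizer on them. The sketch below illustrates that generic idea under an assumed open_clip-style interface; D4C's actual objectives (semantic content, intra-image diversity) are richer.

```python
# Minimal sketch of the data-free calibration idea: synthesize images
# whose CLIP embeddings match class-prompt text embeddings, then use
# them to calibrate a quantizer. A generic illustration, not D4C itself;
# the clip_model/tokenizer interface is an assumption.
import torch

def synthesize_calibration_set(clip_model, tokenizer, prompts,
                               steps=200, lr=0.05):
    texts = tokenizer(prompts)                     # assumed tokenizer API
    with torch.no_grad():
        t_emb = clip_model.encode_text(texts)
        t_emb = t_emb / t_emb.norm(dim=-1, keepdim=True)
    # Start from noise; optimize pixels directly (no real data needed).
    imgs = torch.randn(len(prompts), 3, 224, 224, requires_grad=True)
    opt = torch.optim.Adam([imgs], lr=lr)
    for _ in range(steps):
        i_emb = clip_model.encode_image(imgs)
        i_emb = i_emb / i_emb.norm(dim=-1, keepdim=True)
        loss = -(i_emb * t_emb).sum(dim=-1).mean()  # maximize alignment
        opt.zero_grad()
        loss.backward()
        opt.step()
    return imgs.detach()  # feed to post-training quantization calibration
```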
FarSLIP: Discovering Effective CLIP Adaptation for Fine-Grained Remote Sensing Understanding
Positive · Artificial Intelligence
The article discusses the limitations of the CLIP model in capturing fine-grained details in remote sensing (RS) data. It highlights two main issues: the underutilization of object-level supervision in RS image-text datasets and the performance degradation of region-text alignment methods when applied to RS data. To address these challenges, the authors introduce the MGRS-200k dataset, which provides rich object-level textual supervision for improved RS region-category alignment.
Hierarchical Semantic Tree Anchoring for CLIP-Based Class-Incremental Learning
Positive · Artificial Intelligence
The paper presents HASTEN (Hierarchical Semantic Tree Anchoring), a novel approach for Class-Incremental Learning (CIL) that integrates hierarchical information to mitigate catastrophic forgetting. It leverages external knowledge graphs to enhance the learning of visual and textual features, addressing the limitations of existing CLIP-based CIL methods that fail to capture inherent hierarchies in visual and linguistic concepts.
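
A toy version of hierarchical anchoring is sketched below: class embeddings are penalized for drifting away from their parent concepts in a label tree. This is a generic illustration under assumed inputs (`class_emb`, `parent_of`), not HASTEN's actual losses.

```python
# Generic illustration of anchoring class embeddings to a label
# hierarchy (e.g. derived from a knowledge graph); HASTEN's actual
# mechanism and losses are more involved. Inputs are assumptions.
import torch
import torch.nn.functional as F

def hierarchy_anchor_loss(class_emb: dict, parent_of: dict):
    """Penalize each class embedding for drifting away from its parent
    concept, so incremental updates cannot detach old classes from the
    shared semantic tree (one guard against catastrophic forgetting)."""
    terms = []
    for name, emb in class_emb.items():
        parent = parent_of.get(name)
        if parent is not None:
            terms.append(1 - F.cosine_similarity(emb, class_emb[parent], dim=0))
    return torch.stack(terms).mean() if terms else torch.zeros(())
```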
A Hybrid Multimodal Deep Learning Framework for Intelligent Fashion Recommendation
Positive · Artificial Intelligence
The paper presents a hybrid multimodal deep learning framework designed for intelligent fashion recommendation, addressing outfit compatibility prediction and complementary item retrieval. Utilizing the CLIP architecture, the model integrates visual and textual encoders to create joint representations of fashion items. It achieves a high AUC of 0.95 for compatibility prediction on the Polyvore dataset and an accuracy of 69.24% for retrieving compatible items based on a target item description.
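
As a rough illustration of CLIP-based compatibility scoring, the sketch below rates an outfit by the mean pairwise similarity of its item embeddings. It is a naive proxy under an assumed open_clip-style interface, not the paper's trained hybrid predictor.

```python
# Hedged sketch of CLIP-based outfit compatibility scoring; the paper's
# hybrid framework adds trained fusion layers on top of the encoders.
# The clip_model/preprocess interface is an assumption.
import torch

def compatibility_score(clip_model, preprocess, item_images):
    """Score an outfit as mean pairwise cosine similarity of item
    embeddings; items that 'go together' tend to embed nearby."""
    assert len(item_images) >= 2, "need at least two items"
    batch = torch.stack([preprocess(img) for img in item_images])
    with torch.no_grad():
        emb = clip_model.encode_image(batch)
        emb = emb / emb.norm(dim=-1, keepdim=True)
    sims = emb @ emb.T                    # pairwise cosine similarities
    n = len(item_images)
    off_diag = sims.sum() - sims.trace()  # drop self-similarity
    return (off_diag / (n * (n - 1))).item()
```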
HV-Attack: Hierarchical Visual Attack for Multimodal Retrieval Augmented Generation
Negative · Artificial Intelligence
The paper titled 'HV-Attack: Hierarchical Visual Attack for Multimodal Retrieval Augmented Generation' introduces a method for exploiting vulnerabilities in multimodal Retrieval-Augmented Generation (MRAG) systems. It shows how imperceptible perturbations to image inputs can misalign retrieval and disrupt the generation process, posing significant safety concerns for Large Multimodal Models (LMMs). The work underscores the need for robustness in MRAG systems against such visual attacks.
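
A generic PGD-style perturbation conveys the attack shape: nudge the image so its embedding drifts away from its caption in the shared retrieval space. The sketch below is illustrative only and does not reproduce HV-Attack's hierarchical construction; the encoder interface is an assumption.

```python
# Illustrative PGD-style perturbation that pushes an image's embedding
# away from its caption in a shared retrieval space; a generic attack
# shape, not HV-Attack's hierarchical construction.
import torch

def adversarial_image(encode_image, text_emb, image,
                      eps=8 / 255, alpha=1 / 255, steps=20):
    """image: (1, 3, H, W) in [0, 1]; text_emb: unit-norm caption embedding."""
    adv = image.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        i_emb = encode_image(adv)
        i_emb = i_emb / i_emb.norm(dim=-1, keepdim=True)
        loss = (i_emb * text_emb).sum()          # alignment to minimize
        grad = torch.autograd.grad(loss, adv)[0]
        with torch.no_grad():
            adv = adv - alpha * grad.sign()      # step away from the caption
            adv = image + (adv - image).clamp(-eps, eps)  # stay imperceptible
            adv = adv.clamp(0, 1)
    return adv.detach()
```

The epsilon-ball projection is what keeps the perturbation visually imperceptible while the retrieval alignment degrades.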
Breaking Language Barriers or Reinforcing Bias? A Study of Gender and Racial Disparities in Multilingual Contrastive Vision Language Models
Negative · Artificial Intelligence
A recent study examines the gender and racial biases present in multilingual vision-language models (VLMs) such as M-CLIP, NLLB-CLIP, CAPIVARA-CLIP, and SigLIP-2. Despite the expectation that multilinguality would reduce bias, findings reveal that all models exhibit stronger gender skew compared to English-only baselines. Notably, CAPIVARA-CLIP shows significant biases in low-resource languages, while NLLB-CLIP and SigLIP-2 transfer English stereotypes into gender-neutral languages.
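
Audits of this kind typically compare image-text similarities across gendered captions. The sketch below shows one such association test under an assumed open_clip-style interface; it is not the study's exact protocol, and the caption wording is an assumption.

```python
# Minimal sketch of the kind of association test such audits run:
# compare CLIP similarities between face images and gendered occupation
# captions. Model interface and captions are assumptions, not the
# study's exact protocol.
import torch

def gender_skew(clip_model, tokenizer, face_batch, occupation="doctor"):
    """Positive scores mean faces align more with the 'male' caption."""
    caps = tokenizer([f"a photo of a male {occupation}",
                      f"a photo of a female {occupation}"])
    with torch.no_grad():
        t = clip_model.encode_text(caps)
        t = t / t.norm(dim=-1, keepdim=True)
        i = clip_model.encode_image(face_batch)
        i = i / i.norm(dim=-1, keepdim=True)
    sims = i @ t.T                    # (N, 2): male vs. female similarity
    return (sims[:, 0] - sims[:, 1]).mean().item()
```

Running the same test per language is how a multilingual audit can show whether stereotypes transfer into gender-neutral languages, as the study reports for NLLB-CLIP and SigLIP-2.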