CountSteer: Steering Attention for Object Counting in Diffusion Models

arXiv (cs.CV) · Monday, November 17, 2025 at 5:00:00 AM
The article presents CountSteer, a new method for helping text-to-image diffusion models generate the number of objects a prompt specifies. These models typically struggle with numerical instructions, yet the research indicates that they carry an implicit awareness of whether their count is correct. CountSteer exploits this signal by steering the model's cross-attention hidden states during inference. This raises object-count accuracy by 4% without sacrificing visual quality.
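The summary does not say how the steering signal is computed. As a rough illustration of the general idea of activation steering, and not CountSteer's published algorithm, the sketch below assumes a steering direction built as the difference between the mean hidden states of correctly and incorrectly counted generations. It then shifts a hidden state along that direction with a strength `alpha`. The function names, the difference-of-means construction, and `alpha` are all hypothetical.

```python
# Hypothetical sketch of generic activation steering (difference of means).
# This is NOT CountSteer's actual algorithm; names and parameters are illustrative.
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def steering_direction(correct: Sequence[Sequence[float]],
                       incorrect: Sequence[Sequence[float]]) -> Vector:
    """Assumed direction pointing from 'wrong count' states toward 'right count' states."""
    mu_correct = mean_vector(correct)
    mu_incorrect = mean_vector(incorrect)
    return [c - i for c, i in zip(mu_correct, mu_incorrect)]


def steer(hidden: Sequence[float], direction: Sequence[float], alpha: float) -> Vector:
    """Shift one hidden state along the steering direction with strength alpha."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy 3-dimensional "hidden states" standing in for cross-attention outputs.
correct_states = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
incorrect_states = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]

direction = steering_direction(correct_states, incorrect_states)
print(direction)  # [2.0, -2.0, 0.0]
print(steer([0.5, 0.5, 1.0], direction, alpha=0.25))  # [1.0, 0.0, 1.0]
```

In a PyTorch model, a shift like this would typically be applied through a forward hook (`torch.nn.Module.register_forward_hook`) on the chosen attention modules. The summary does not say which layers, timesteps, or strengths CountSteer actually uses.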
— via World Pulse Now AI Editorial System


Recommended Readings
The Persistence of Cultural Memory: Investigating Multimodal Iconicity in Diffusion Models
Neutral · Artificial Intelligence
The article examines the balance between generalization and memorization in text-to-image diffusion models, focusing on "multimodal iconicity": the way images and texts evoke shared cultural associations. The authors introduce an evaluation framework that separates a model's recognition of a cultural reference from its realization of that reference in an image. Evaluating five diffusion models against 767 cultural references drawn from Wikidata, they show that the framework can distinguish replication from transformation.