arXiv:2512.09095v1 Announce Type: new 
Abstract: Generating realistic food images for categories with multiple nouns is surprisingly challenging. For instance, the prompt "egg noodle" may result in images that incorrectly contain both eggs and noodles as separate entities. Multi-noun food categories are common in real-world datasets and account for a large portion of entries in benchmarks such as UEC-256. These compound names often cause generative models to misinterpret the semantics, producing unintended ingredients or objects. This stems from two causes: the text encoder lacks knowledge of multi-noun food categories, and the model misinterprets the relationships between the nouns, which leads to incorrect spatial layouts. To overcome these challenges, we propose FoCULR (Food Category Understanding and Layout Refinement), which incorporates food domain knowledge and introduces core concepts early in the generation process. Experimental results demonstrate that integrating these techniques improves image generation performance in the food domain.
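The abstract does not say how FoCULR introduces core concepts early in generation. Purely as an illustration of the general idea, one generic approach is a per-step prompt schedule for a diffusion sampler: the early denoising steps, which fix the coarse layout, see only the dish's head noun, and later steps see the full category name. Everything below is a hypothetical sketch, not the paper's method. The helper names `head_noun` and `prompt_schedule`, the "last token is the head noun" heuristic, the prompt template, and the 30% split are all assumptions.

```python
from typing import List


def head_noun(category: str) -> str:
    """Return the last token of a compound food name (hypothetical heuristic).

    Assumption: English noun compounds are usually right-headed, so the final
    noun ("noodle" in "egg noodle") names the dish and earlier nouns modify it.
    The heuristic fails for prepositional names such as "noodles with egg".
    """
    tokens = category.strip().split()
    if not tokens:
        raise ValueError("empty category name")
    return tokens[-1]


def prompt_schedule(category: str, num_steps: int, core_fraction: float = 0.3) -> List[str]:
    """Build a hypothetical per-step prompt list for a diffusion sampler.

    The first `core_fraction` of denoising steps use only the head noun, so the
    coarse layout forms around a single dish; the remaining steps use the full
    category name to add the modifier's detail.
    """
    if num_steps <= 0:
        raise ValueError("num_steps must be positive")
    if not 0.0 <= core_fraction <= 1.0:
        raise ValueError("core_fraction must be in [0, 1]")
    core = f"a photo of {head_noun(category)}"
    full = f"a photo of {category}"
    n_core = round(num_steps * core_fraction)
    return [core] * n_core + [full] * (num_steps - n_core)


if __name__ == "__main__":
    # Print which prompt each of 10 hypothetical denoising steps would use.
    for step, prompt in enumerate(prompt_schedule("egg noodle", num_steps=10)):
        print(step, prompt)
```

With these settings, steps 0 to 2 condition on "a photo of noodle" and steps 3 to 9 on "a photo of egg noodle". Any real sampler would still need to encode each prompt and swap the conditioning between steps; that wiring is not shown here.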

A new study has introduced FoCULR (Food Category Understanding and Layout Refinement), which aims to improve the generation of realistic food images for multi-noun categories, whose names often lead generative models to misinterpret them. The research highlights what goes wrong when a prompt such as 'egg noodle' yields images that depict eggs and noodles as separate entities instead of a single cohesive dish.

Food Image Generation on Multi-Noun Categories
