History-Aware Reasoning for GUI Agents

arXiv · cs.CV · Thursday, November 13, 2025 at 5:00:00 AM
Multimodal Large Language Models (MLLMs) have significantly advanced GUI automation, but existing GUI agents still have weak short-term memory in their reasoning. This makes it hard for them to connect historical interactions, and long-horizon tasks depend on exactly that ability. To address this, the researchers propose a History-Aware Reasoning (HAR) framework that strengthens episodic reasoning in GUI agents. It encourages agents to reflect on their errors and learn from them, with the goal of improving decision-making during task execution. Using this framework, the authors develop the HAR-GUI-3B model. They present it as a step toward overcoming these shortcomings of current GUI agents and toward more seamless interaction between users and technology.
— via World Pulse Now AI Editorial System
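To make "short-term memory that connects historical interactions" concrete, here is a minimal Python sketch of an agent's episodic memory. It records each action, whether it succeeded, and an optional reflection on a failure, then renders the most recent steps into text for the next decision. This is not the HAR method, whose details the summary does not give. `EpisodicMemory`, `Step`, `max_steps`, and the prompt format are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    """One past interaction: the action taken and whether it worked."""
    action: str
    succeeded: bool
    reflection: Optional[str] = None  # hypothetical lesson noted after a failed step


@dataclass
class EpisodicMemory:
    """Hypothetical short-term memory for a single GUI task episode (not HAR itself)."""
    max_steps: int = 5  # how many recent steps to show the model
    steps: List[Step] = field(default_factory=list)

    def record(self, action: str, succeeded: bool,
               reflection: Optional[str] = None) -> None:
        """Append the outcome of one interaction to the episode history."""
        self.steps.append(Step(action, succeeded, reflection))

    def context(self, task: str) -> str:
        """Render the most recent steps (with any reflections) as prompt text."""
        # Guard max_steps <= 0, since steps[-0:] would return the whole list.
        recent = self.steps[-self.max_steps:] if self.max_steps > 0 else []
        first = len(self.steps) - len(recent) + 1  # keep absolute step numbers
        lines = [f"Task: {task}"]
        for number, step in enumerate(recent, start=first):
            status = "ok" if step.succeeded else "failed"
            lines.append(f"{number}. {step.action} -> {status}")
            if step.reflection:
                lines.append(f"   lesson: {step.reflection}")
        lines.append("Next action:")
        return "\n".join(lines)


if __name__ == "__main__":
    memory = EpisodicMemory(max_steps=3)
    memory.record("tap 'Settings'", True)
    memory.record("tap 'Wi-Fi' toggle", False, "toggle was off-screen; scroll first")
    memory.record("scroll down", True)
    print(memory.context("turn on Wi-Fi"))
```

With `max_steps=3` all three steps appear, including the lesson from the failed tap. In a setup like the one the summary describes, text of this kind would presumably accompany the current screen when the MLLM chooses its next action. Capping the window keeps the prompt short at the cost of dropping older history.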

Recommended Readings
Unifying Segment Anything in Microscopy with Vision-Language Knowledge
Positive · Artificial Intelligence
The paper stresses the importance of accurate segmentation in biomedical images and points out that existing models struggle with data from unseen domains because they lack vision-language knowledge. The authors propose a new framework, uLLSAM, which uses Multimodal Large Language Models (MLLMs) to enhance segmentation performance. The approach aims to improve generalization across cross-domain datasets, and the authors report notable performance improvements.