Unseen from Seen: Rewriting Observation-Instruction Using Foundation Models for Augmenting Vision-Language Navigation
The article addresses data scarcity in Vision-Language Navigation (VLN), where existing methods try to improve generalization by drawing on additional simulator data or web-collected images. It highlights the limitations of both approaches: simulator environments lack diversity, and web-collected data requires labor-intensive cleaning.
— Curated by the World Pulse Now AI Editorial System




