GROW: Aligning GRPO with State-Action Modeling for Open-World VLM Agents

arXiv:2605.20246v1

Abstract: Recently, vision-language model (VLM) agents have shown promising progress in open-world tasks, where successful task completion often requires multiple turns of visual perception and action execution. However, existing methods still rely primarily on Supervised Fine-Tuning (SFT) with expert demonstrations, while the advanced reinforcement learning (RL) algorithm, specifically Group Relative Policy Optimization (GRPO), has not been effectively empl
