Researchers from Northwestern University, Stanford University, and other institutions have identified key challenges in training large language model (LLM) agents with reinforcement learning (RL) for multi-turn interactive tasks. Their recent study, carried out in the RAGEN system, shows that multi-turn training often becomes unstable and collapses, with agents falling into repetitive behavior or producing hallucinated reasoning. To address these issues, the team developed the StarPO framework, which optimizes entire interaction trajectories rather than individual turns, using improved reward shaping and trajectory control. A variant, StarPO-S, further stabilizes training by refining the optimization process. The research also suggests that RL-finetuned reasoning language models can serve as better alternatives to regression-based critics during parallel trajectory search at test time, improving the robustness and reliability of LLM agents in complex, multi-step scenarios.
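The summary above describes trajectory-level optimization only at a high level. Below is a minimal, illustrative sketch (assuming PyTorch and a generic policy model that exposes per-token log-probabilities; the function name and data layout are hypothetical, not the official StarPO implementation) of what it means to optimize an entire multi-turn rollout: compute one return for the whole interaction and apply a policy-gradient update across every generated token in it, rather than treating each turn as an isolated episode.

```python
import torch

def trajectory_policy_loss(logprobs_per_turn, rewards_per_turn, gamma=1.0):
    """Illustrative trajectory-level policy-gradient loss (a sketch, not the
    paper's algorithm): one return per rollout, applied to every action
    token across all turns of the interaction.

    logprobs_per_turn: list of 1-D tensors, log-probs of the tokens the
        agent generated at each turn.
    rewards_per_turn: list of floats, reward received after each turn.
    """
    # Discounted return over the *entire* multi-turn trajectory.
    ret = 0.0
    for r in reversed(rewards_per_turn):
        ret = r + gamma * ret

    # Sum log-probs of all generated tokens, across all turns.
    total_logprob = torch.cat(logprobs_per_turn).sum()

    # REINFORCE-style objective: push the whole trajectory up if its
    # return is high, down otherwise.
    return -(ret * total_logprob)


# Toy usage with fake data: a 3-turn rollout with a sparse final reward.
fake_logprobs = [torch.randn(12, requires_grad=True) for _ in range(3)]
fake_rewards = [0.0, 0.0, 1.0]
loss = trajectory_policy_loss(fake_logprobs, fake_rewards)
loss.backward()
```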
Training LLM Agents Just Got More Stable: Researchers Introduce StarPO-S and RAGEN to Tackle Multi-Turn Reasoning and Collapse in Reinforcement Learning. Researchers have approached agent learning through StarPO (State-Thinking-Actions-Reward Policy Optimisation), a unified https://t.co/CVQ7182sTO
Why does multi-step learning of agents fail, and how can it be fixed? Here are the issues that researchers from @NorthwesternU, @Stanford and others found in their recent RAGEN study: ▪️ Training stability problem: In multi-turn scenarios like games, models often get stuck repeating https://t.co/6xRSK8tDsw
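The tweet is cut off, but the failure mode it points to (an agent collapsing into repeating the same move turn after turn) can be made concrete with a small check. A hedged sketch, not taken from the paper: score each rollout by how much of it is occupied by its single most frequent action, so collapsed rollouts can be flagged or down-weighted.

```python
from collections import Counter

def repetition_score(actions):
    """Fraction of the rollout occupied by the single most frequent action.
    A score near 1.0 suggests the agent has collapsed into repeating itself.
    `actions` is the list of actions the agent emitted, one per turn.
    """
    if not actions:
        return 0.0
    most_common_count = Counter(actions).most_common(1)[0][1]
    return most_common_count / len(actions)


# Toy usage: a varied rollout vs. a collapsed one.
print(repetition_score(["left", "up", "grab", "right", "up"]))    # 0.4
print(repetition_score(["left", "left", "left", "left", "left"]))  # 1.0
```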
Training LLM agents using Reinforcement Learning (RL) for multi-turn, interactive tasks often causes instability and performance collapse. This paper introduces the StarPO framework within the RAGEN system, optimizing entire interaction trajectories. It proposes StarPO-S with https://t.co/8ws9sjmGHu
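The exact stabilization mechanisms behind StarPO-S sit behind the truncated link, so the following is only an assumption of what stabilizing rollout selection could look like in practice (not necessarily what StarPO-S does): for each prompt, keep only groups of sampled rollouts whose rewards actually vary, since groups where every rollout scores the same carry little gradient signal and can reinforce collapse.

```python
import statistics

def filter_rollout_groups(groups, min_reward_std=1e-3):
    """Illustrative stability filter (an assumption, not the paper's stated
    method): drop groups of rollouts whose rewards barely vary, since they
    provide little learning signal and can amplify repetitive behavior.

    `groups` maps a prompt id to the list of rewards of its sampled rollouts.
    """
    kept = {}
    for prompt_id, rewards in groups.items():
        if len(rewards) > 1 and statistics.pstdev(rewards) >= min_reward_std:
            kept[prompt_id] = rewards
    return kept


# Toy usage: the second prompt's rollouts all scored the same and are dropped.
groups = {"p0": [1.0, 0.0, 0.5, 0.0], "p1": [0.0, 0.0, 0.0, 0.0]}
print(filter_rollout_groups(groups))  # {'p0': [1.0, 0.0, 0.5, 0.0]}
```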