verl-agent: `verl-agent` extends veRL to train LLM agents using reinforcement learning, featuring a novel step-independent multi-turn rollout mechanism. This design ensures high scalability for long-horizon tasks by allowing customizable per-step input structures and memory management.
on-policy: This repository implements MAPPO, a multi-agent variant of PPO, widely used in cooperative multi-agent games and research. It provides robust implementations for various multi-agent environments like StarCraft II, Hanabi, and Google Research Football, along with detailed training scripts and hyperparameter guidance.
Training large language model agents for complex multi-turn, long-horizon tasks.
Research and experimentation in cooperative multi-agent reinforcement learning.