on-policy: This repository implements MAPPO, a multi-agent variant of PPO, widely used in cooperative multi-agent games and research. It provides robust implementations for various multi-agent environments like StarCraft II, Hanabi, and Google Research Football, along with detailed training scripts and hyperparameter guidance.
claude-code-source-all-in-one: This repository extracts the source code of Anthropic's Claude Code CLI for educational study. It includes 18 deep-dive articles analyzing the architecture, covering core agent loop, tool orchestration, context compression, and more. The source can be run locally for learning purposes.
Research and experimentation in cooperative multi-agent reinforcement learning
Studying production-level AI agent architecture and design decisions