OpenEnv-WolfeClick

Watch how our fine-tuned model plays Pokemon.

Start the replay to watch the full battle automatically, or switch to frame-by-frame mode to inspect the exact state, legal actions, and decisions on each turn.

Environment Design

OpenEnv-WolfeClick wraps competitive Pokemon Showdown as an OpenEnv-compatible environment.

The model receives:

  • the active field
  • the model's own full roster
  • revealed opponent history
  • the exact legal actions for the turn
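As a rough illustration of how these four inputs might be flattened into a text prompt, here is a minimal sketch. The `Observation` class, its field names, and the `render_prompt` helper are hypothetical and are not taken from the OpenEnv-WolfeClick code; the real observation schema may differ.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical container for the four inputs listed above."""
    active_field: str          # e.g. "Pikachu vs Garchomp"
    own_roster: list           # names of the model's own Pokemon
    opponent_revealed: list    # opponent information revealed so far
    legal_actions: dict        # {"move": [...], "switch": [...]}


def render_prompt(obs):
    """Flatten an observation into a plain-text prompt for the model."""
    lines = [
        "Active field: " + obs.active_field,
        "Your roster: " + ", ".join(obs.own_roster),
        "Revealed opponent history: "
        + (", ".join(obs.opponent_revealed) or "none"),
        "Legal actions:",
    ]
    # List every legal action with its kind so the model can copy the exact name.
    for kind in ("move", "switch"):
        for name in obs.legal_actions.get(kind, []):
            lines.append(f"  {kind}: {name}")
    return "\n".join(lines)
```

Listing the legal actions verbatim in the prompt makes it possible for the model to copy an exact name into its reply.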

It must emit exactly one JSON action:

{"action": "move" | "switch", "choice": "Exact Name of Move or Pokemon"}
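One way to check a model reply against this format is sketched below. The `parse_action` helper and the shape of `legal_actions` (a dict mapping "move" and "switch" to lists of names) are assumptions made for illustration; the environment's own validation may work differently.

```python
import json

ALLOWED_ACTIONS = {"move", "switch"}


def parse_action(raw, legal_actions):
    """Return (action, choice) if raw is a legal JSON action, else None.

    legal_actions is assumed to look like
    {"move": ["Thunderbolt"], "switch": ["Snorlax"]}.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # not valid JSON at all
    # Require exactly the two documented keys.
    if not isinstance(data, dict) or set(data) != {"action", "choice"}:
        return None
    action, choice = data["action"], data["choice"]
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        return None
    if not isinstance(choice, str):
        return None
    # The choice must match a legal name exactly, including case.
    if choice not in legal_actions.get(action, []):
        return None
    return action, choice
```

For example, with `legal = {"move": ["Thunderbolt"], "switch": ["Snorlax"]}`, the reply `{"action": "move", "choice": "Thunderbolt"}` parses to `("move", "Thunderbolt")`, while a misspelled or illegal choice yields `None`.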

The training loop then collects real trajectories from live battles and applies GRPO (Group Relative Policy Optimization) to improve the action policy on those rollouts.
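The core idea behind GRPO is to score each sampled response relative to the other responses in its group, rather than against a learned value baseline. The sketch below shows only that normalization step, assuming one scalar reward per sampled action; the function name, the `eps` term, and the use of the population standard deviation are illustrative choices, not details taken from this project's trainer.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize a group's rewards to zero mean and roughly unit scale.

    rewards: scalar rewards for several responses sampled from the same state.
    eps: small constant (an assumption here) that avoids division by zero
         when every response in the group received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Responses that beat the group average get a positive advantage and are reinforced; those below it get a negative one. If all rewards in a group are equal, every advantage is zero and the group contributes no gradient signal.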

Get Started

Record a real battle

python record_battle.py --revision grpo-qwen3-4b-run3 --output battle_logs/raw_battle.json

Convert it for replay

python convert_battle_log.py --input battle_logs/raw_battle.json --output battle_logs/replay_battle.json

Launch locally

uv venv
source .venv/bin/activate
uv pip install -r requirements_space.txt
uvicorn space_app:app --reload --host 0.0.0.0 --port 7860

Included notebooks

  • trainer.ipynb
  • watch_battle.ipynb
  • benchmarks/benchmark.ipynb