Watch how our fine-tuned model plays Pokemon.
Start the replay to watch the full battle automatically, or switch to frame-by-frame mode to inspect the exact state, legal actions, and decisions on each turn.
OpenEnv-WolfeClick wraps competitive Pokemon Showdown as an OpenEnv-compatible environment.
The model receives the current battle state and the list of legal actions for the turn.
It must emit exactly one JSON action:
{"action": "move" | "switch", "choice": "Exact Name of Move or Pokemon"}
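A minimal sketch of how such an action could be checked before it is sent to the battle, assuming the legal moves and switches are available as plain lists of names. The function name `parse_action` and its parameters are hypothetical and not taken from the repository.

```python
import json
from typing import Optional

VALID_ACTIONS = {"move", "switch"}


def parse_action(
    raw: str, legal_moves: list[str], legal_switches: list[str]
) -> Optional[dict]:
    """Return the validated action dict, or None if the output is unusable.

    Hypothetical helper: the real environment may validate differently.
    """
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None  # not valid JSON at all

    # Require exactly the two keys from the schema, nothing more or less.
    if not isinstance(obj, dict) or set(obj) != {"action", "choice"}:
        return None

    action, choice = obj["action"], obj["choice"]
    if not isinstance(action, str) or not isinstance(choice, str):
        return None
    if action not in VALID_ACTIONS:
        return None

    # The choice must match a legal name exactly, as the schema demands.
    legal = legal_moves if action == "move" else legal_switches
    return obj if choice in legal else None
```

Returning None rather than raising keeps the caller free to decide what an invalid output costs, for example a fallback action or a reward penalty.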
The training loop then collects real trajectories from live battles and applies GRPO to improve the action policy on those rollouts.
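The core of GRPO is a group-relative advantage: each rollout's reward is normalized against the other rollouts in the same group. The sketch below shows only that step. How this project groups rollouts and defines rewards is not stated here, so the function name and the `eps` parameter are illustrative assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(
    rewards: list[float], eps: float = 1e-6
) -> list[float]:
    """Normalize rewards within one group of rollouts.

    Illustrative only: advantage_i = (r_i - mean) / (std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four rollouts from one group, two wins and two losses.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Rollouts that beat the group average get a positive advantage and the rest get a negative one. When every rollout in a group earns the same reward, all advantages are zero and the group contributes no gradient signal.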
python record_battle.py --revision grpo-qwen3-4b-run3 --output battle_logs/raw_battle.json
python convert_battle_log.py --input battle_logs/raw_battle.json --output battle_logs/replay_battle.json
uv venv
source .venv/bin/activate
uv pip install -r requirements_space.txt
uvicorn space_app:app --reload --host 0.0.0.0 --port 7860
trainer.ipynb
watch_battle.ipynb
benchmarks/benchmark.ipynb