An Open Model Built a Mario-Style Platformer — Under Contract, on watsonx
Every other game in this arcade — Pong, Tetris, Match-3 — was written by Claude Opus 4.8. This one wasn’t. I wanted to prove the part of the pitch that’s easy to claim and hard to show: that the governance is provider-agnostic. So I swapped the brain entirely and had an open-weight model — openai/gpt-oss-120b, running on IBM watsonx.ai — build the most advanced game of the set: a Mario-style neon platformer, across 6 governed batches.
Same mb contracts. Same gitpilot driver. Same mb check gate. Different model. Here’s the whole thing — and a one-command script to reproduce it.
🎮 Play it first
▶ Play Neon Climber — one self-contained HTML file, mobile + desktop (tilt, touch, or arrows). Bounce up neon platforms, stomp enemies, grab coins + stars, ride springs, use power-ups, climb as high as you can.
The real game running (headless capture): moving + spring platforms, neon enemies, a star, coins, a high-score (BEST), and a mute toggle.

Source, the contract bundle, and the reproducible build script: github.com/ruslanmv/doodle-jump-climber-under-contract.
The setup: GitPilot, pointed at watsonx
GitPilot is provider-agnostic. To use IBM watsonx with the open gpt-oss-120b model, you set a handful of env vars — nothing else in the loop changes:
pip install agent-generator gitcopilot crewai
export GITPILOT_PROVIDER=watsonx
export WATSONX_API_KEY=<your IBM Cloud API key>
export WATSONX_PROJECT_ID=<your watsonx project id>
export WATSONX_URL=https://us-south.ml.cloud.ibm.com
export GITPILOT_WATSONX_MODEL=openai/gpt-oss-120b
export GITPILOT_MAX_TOKENS=18000
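Because the whole provider switch rests on these variables, a small preflight check can catch a missing one before a batch starts. The sketch below is plain POSIX shell; require_env is a hypothetical helper of my own, not something GitPilot ships, and the variable names are just the ones listed above.

```shell
#!/bin/sh
# Hypothetical preflight helper (NOT part of GitPilot): prints the name of
# every listed variable that is unset or empty, and returns non-zero if any is.
require_env() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

A typical call before the first batch would be `require_env GITPILOT_PROVIDER WATSONX_API_KEY WATSONX_PROJECT_ID WATSONX_URL GITPILOT_WATSONX_MODEL || exit 1`.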
The 6 batches
Each batch ran the same four commands — plan a scoped batch, render the contract-bound prompt, let the model extend the single allowed file, validate fail-closed:
mb next "<the batch goal>"
mb prompt --coder gitpilot
gitpilot generate -m "$(cat coder-prompts/gitpilot.md)" -o .
mb check frontend/index.html
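Chained together, the four commands can be wrapped so that a batch stops at the first failing step. This is only a sketch: run_step and run_batch are hypothetical names, the repo's build.sh may be organized differently, and it assumes that each command (mb check in particular) exits non-zero when it does not succeed, which this post does not confirm.

```shell
#!/bin/sh
# Sketch: run one governed batch using the four commands shown above.
# ASSUMPTION: every command exits non-zero on failure, including `mb check`
# for a needs-repair or rejected verdict.
# Set DRY_RUN=1 to print the commands instead of running them.
run_step() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

run_batch() {
    goal=$1
    prompt_file=coder-prompts/gitpilot.md
    run_step mb next "$goal" || return 1
    run_step mb prompt --coder gitpilot || return 1
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # The prompt file only exists after `mb prompt` has really run.
        echo "+ gitpilot generate -m <contents of $prompt_file> -o ."
    else
        gitpilot generate -m "$(cat "$prompt_file")" -o . || return 1
    fi
    run_step mb check frontend/index.html
}
```

With DRY_RUN=1 set, `run_batch "add coins and stars"` prints the four commands in order without touching the repo, which is a cheap way to review a batch before spending tokens on it.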
| # | Batch | What gpt-oss-120b added | Size | Matrix Commit |
|---|---|---|---|---|
| 1 | Foundation | neon hero, jump physics, vertical camera, procedural platforms | 15 KB | mc-f0101634a883 |
| 2 | Controls | left/right, screen-wrap, tilt + touch + keyboard | 22 KB | mc-2cde9611b371 |
| 3 | Platforms + items | moving/breaking/spring platforms, coins + stars | 36 KB | mc-511e55820933 |
| 4 | Enemies + power-ups | stompable neon enemies, shield / jetpack / magnet | 35 KB | mc-cf84ff722c6e |
| 5 | Juice | particles, trail, parallax, WebAudio, screen-shake | 41 KB | mc-b133e1b1d466 |
| 6 | Meta | start / game-over, high score, difficulty ramp, boss milestone | 47 KB | mc-8c08092f78c3 |
Every batch returned MATRIX_STATUS: approved score=100, the model wrote only to frontend/index.html, and a headless smoke test found zero runtime errors. The full transcript is in EVIDENCE.md.
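The "only frontend/index.html" claim is easy to double-check independently of the tool. The helper below is a hypothetical sketch, not how mb check is implemented: it reads changed paths on standard input and fails if any of them falls outside the allow-list passed as arguments.

```shell
#!/bin/sh
# Hypothetical independent scope check (NOT the mb check implementation):
# reads changed paths on stdin, one per line, and returns non-zero if any
# path is missing from the allow-list given as arguments.
only_allowed() {
    status=0
    while IFS= read -r path; do
        [ -n "$path" ] || continue
        allowed=0
        for ok in "$@"; do
            if [ "$path" = "$ok" ]; then
                allowed=1
            fi
        done
        if [ "$allowed" -eq 0 ]; then
            echo "out of scope: $path"
            status=1
        fi
    done
    return "$status"
}
```

Inside a git checkout it could be fed with `git diff --name-only HEAD | only_allowed frontend/index.html`. Note that this form only sees tracked files, so brand-new untracked files would need a separate listing.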
An honest note on open models.
gpt-oss-120b is strong, but below Opus 4.8 for huge single-file rewrites: it wrote more compact code (batch 4 even shrank the file), so I verified after every batch that earlier features survived (they did, with no regressions). The discipline that makes that safe is exactly the contract: an allow-list the model can’t exceed and a validator that runs before anything lands.
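The post does not say how the survival check was done, so treat the following as one cheap way to automate it rather than the method actually used: after each batch, confirm that a marker string for every earlier feature still occurs in the file. features_survive is a hypothetical helper, and the marker strings are illustrative; the identifiers the model really chose may differ.

```shell
#!/bin/sh
# Hypothetical regression guard: every marker string passed after the file
# name must still occur in that file. Markers are illustrative only.
features_survive() {
    file=$1
    shift
    status=0
    for marker in "$@"; do
        # -F: fixed string, -q: quiet, --: end of options.
        if ! grep -qF -- "$marker" "$file"; then
            echo "lost feature marker: $marker"
            status=1
        fi
    done
    return "$status"
}
```

For example, after batch 4 a call such as `features_survive frontend/index.html spring coin star` would flag a rewrite that silently dropped batch 3's items. A string match is a weak signal, so it complements the headless smoke test rather than replacing it.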
Reproduce it with one command
The repo ships a build.sh that rebuilds the entire game from scratch — all 6 batches — with watsonx:
git clone https://github.com/ruslanmv/doodle-jump-climber-under-contract
cd doodle-jump-climber-under-contract
pip install agent-generator gitcopilot crewai
export WATSONX_API_KEY=... WATSONX_PROJECT_ID=...
./build.sh # → frontend/index.html, validated batch by batch
Want Claude or local Ollama instead? Change GITPILOT_PROVIDER and the model env var at the top of build.sh. The contracts don’t change.
How it works — and why Matrix Builder
- A contract, not a prompt: a locked blueprint + pinned standards.
- Allow-list scope: the model edits only the files you permit.
- Fail-closed validation: mb check returns approved / needs-repair / rejected.
- Immutable Matrix Commits: every change pins the prompt, diff, and verdict.
- Provider-agnostic: Claude, OpenAI, watsonx, or local Ollama; the governance never changes. This game is the proof.
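To make "fail-closed" concrete, here is a minimal sketch of a gate that lands a change only on an explicit approved verdict and blocks on anything it does not recognize. verdict_of and gate are hypothetical names. The parser assumes the status line always has the form quoted earlier in this post (MATRIX_STATUS: approved score=100) and that the other two verdicts are printed the same way, which is an assumption on my part.

```shell
#!/bin/sh
# Extract the verdict word from a line such as
# "MATRIX_STATUS: approved score=100" read on stdin.
# ASSUMPTION: needs-repair and rejected are printed in the same format.
verdict_of() {
    sed -n 's/^MATRIX_STATUS: \([a-z-]*\).*/\1/p' | head -n 1
}

# Fail-closed gate: only an explicit "approved" succeeds. A missing or
# unknown verdict is treated the same as a rejection.
gate() {
    verdict=$(verdict_of)
    case $verdict in
        approved)     echo "land";   return 0 ;;
        needs-repair) echo "repair"; return 1 ;;
        *)            echo "block";  return 1 ;;
    esac
}
```

The design point is the final catch-all branch: if the validator crashes, prints nothing, or changes its output format, the gate blocks instead of letting the change through.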
Take it for a spin
- Play / fork: github.com/ruslanmv/doodle-jump-climber-under-contract
- Matrix Builder: agent-matrix/matrix-builder
- GitPilot: gitpilot.ruslanmv.com
- The rest of the arcade: Pong · Tetris · Match-3
An open model built this game, and it could prove it stayed within scope at every step. That ability to verify exactly what an AI was allowed to change is the part worth taking away.