An Open Model Built a Mario-Style Platformer — Under Contract, on watsonx

Every other game in this arcade — Pong, Tetris, Match-3 — was written by Claude Opus 4.8. This one wasn’t. I wanted to prove the part of the pitch that’s easy to claim and hard to show: that the governance is provider-agnostic. So I swapped the brain entirely and had an open-weight model — openai/gpt-oss-120b, running on IBM watsonx.ai — build the most advanced game of the set: a Mario-style neon platformer, across 6 governed batches.

Same mb contracts. Same gitpilot driver. Same mb check gate. Different model. Here’s the whole thing — and a one-command script to reproduce it.

🎮 Play it first

Play Neon Climber — one self-contained HTML file, mobile + desktop (tilt, touch, or arrows). Bounce up neon platforms, stomp enemies, grab coins + stars, ride springs, use power-ups, climb as high as you can.

Neon Climber — hero

The real game running (headless capture): moving + spring platforms, neon enemies, a star, coins, a high-score (BEST), and a mute toggle.

Neon Climber gameplay

Source, the contract bundle, and the reproducible build script: github.com/ruslanmv/doodle-jump-climber-under-contract.

The setup: GitPilot, pointed at watsonx

GitPilot is provider-agnostic. To use IBM watsonx with the open gpt-oss-120b model, you set a handful of env vars — nothing else in the loop changes:

pip install agent-generator gitcopilot crewai

export GITPILOT_PROVIDER=watsonx
export WATSONX_API_KEY=<your IBM Cloud API key>
export WATSONX_PROJECT_ID=<your watsonx project id>
export WATSONX_URL=https://us-south.ml.cloud.ibm.com
export GITPILOT_WATSONX_MODEL=openai/gpt-oss-120b
export GITPILOT_MAX_TOKENS=18000
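A six-batch run is long, so it can help to fail fast when one of these variables is missing. Below is a minimal sketch of a preflight check in plain sh. The function name `require_watsonx_env` is hypothetical and not part of GitPilot. The variable list simply mirrors the exports above.

```sh
# Hypothetical preflight check (not part of GitPilot): report every
# watsonx-related variable from the setup above that is unset or empty.
require_watsonx_env() {
  missing=0
  for var in GITPILOT_PROVIDER WATSONX_API_KEY WATSONX_PROJECT_ID \
             WATSONX_URL GITPILOT_WATSONX_MODEL; do
    # Read the value of the variable whose *name* is stored in $var.
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: require_watsonx_env || exit 1
```

The check reports every missing variable, not just the first one, so a single run tells you everything that needs fixing.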

The 6 batches

Each batch ran the same four commands, which plan a scoped batch, render the contract-bound prompt, let the model extend the single allowed file, and validate the result fail-closed:

mb next "<the batch goal>"
mb prompt --coder gitpilot
gitpilot generate -m "$(cat coder-prompts/gitpilot.md)" -o .
mb check frontend/index.html
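These four commands chain naturally into a loop over batch goals. Here is a minimal sketch of the kind of driver a script like build.sh could use. The function name `run_batches` is hypothetical. The sketch assumes that `mb check` prints a line containing `MATRIX_STATUS: approved` on success, which is the form quoted in this post. Any other output, or a non-zero exit from any step, stops the run. That is what keeps the loop fail-closed.

```sh
# Hypothetical driver loop; the real build.sh may be structured differently.
# Runs plan -> prompt -> generate -> check for each goal passed as an argument,
# and stops at the first batch whose verdict is not an explicit approval.
run_batches() {
  for goal in "$@"; do
    echo "== batch: $goal"
    mb next "$goal" || return 1
    mb prompt --coder gitpilot || return 1
    gitpilot generate -m "$(cat coder-prompts/gitpilot.md)" -o . || return 1
    # Capture the verdict text; a failing exit status also stops the run.
    verdict=$(mb check frontend/index.html) || return 1
    case "$verdict" in
      *"MATRIX_STATUS: approved"*) echo "$verdict" ;;
      *) echo "stopping: $verdict" >&2; return 1 ;;
    esac
  done
}
```

For example, `run_batches "<goal 1>" "<goal 2>"` runs two batches in order. Stopping at the first non-approved verdict means a later batch never builds on a file that failed validation.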
| # | Batch | What gpt-oss-120b added | index.html size after batch | Matrix Commit |
|---|---|---|---|---|
| 1 | Foundation | neon hero, jump physics, vertical camera, procedural platforms | 15 KB | mc-f0101634a883 |
| 2 | Controls | left/right, screen-wrap, tilt + touch + keyboard | 22 KB | mc-2cde9611b371 |
| 3 | Platforms + items | moving/breaking/spring platforms, coins + stars | 36 KB | mc-511e55820933 |
| 4 | Enemies + power-ups | stompable neon enemies, shield / jetpack / magnet | 35 KB | mc-cf84ff722c6e |
| 5 | Juice | particles, trail, parallax, WebAudio, screen-shake | 41 KB | mc-b133e1b1d466 |
| 6 | Meta | start / game-over, high score, difficulty ramp, boss milestone | 47 KB | mc-8c08092f78c3 |

Every batch returned MATRIX_STATUS: approved score=100. In every batch the model wrote only to frontend/index.html, and a headless smoke test found zero runtime errors. The full transcript is in EVIDENCE.md.
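You can double-check the "only frontend/index.html" claim independently of mb by comparing the files a batch touched against the allow-list. Below is a minimal sketch. The function name `check_scope` is hypothetical. It assumes the working tree is clean or committed right before `gitpilot generate` runs, so that the git commands in the usage comment list only what the model wrote.

```sh
# Hypothetical scope check: reads changed paths on stdin (one per line)
# and fails if any path is not in the allow-list given as arguments.
check_scope() {
  status=0
  while IFS= read -r path; do
    [ -z "$path" ] && continue
    allowed=no
    for ok in "$@"; do
      [ "$path" = "$ok" ] && allowed=yes
    done
    if [ "$allowed" = no ]; then
      echo "out of scope: $path" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage, right after the generate step (tracked changes plus new untracked files):
#   { git diff --name-only; git ls-files --others --exclude-standard; } \
#     | check_scope frontend/index.html
```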

An honest note on open models. gpt-oss-120b is strong, but it trails Opus 4.8 on huge single-file rewrites. It wrote more compact code (batch 4 even shrank the file, from 36 KB to 35 KB), so after every batch I verified that earlier features survived. They did: no regressions. The discipline that makes that safe is exactly the contract: an allow-list the model can’t exceed and a validator that runs before anything lands.
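A compact rewrite can silently drop features. One cheap tripwire is to flag any batch that makes the file smaller, so that a human or a smoke test looks at it before the next batch builds on it. This is a hypothetical heuristic, not something mb does, and `size_guard` is an illustrative name. A shrink is not proof of a regression (batch 4 shrank the file and lost nothing). It only marks where to look.

```sh
# Hypothetical tripwire: compare a file's size before and after a batch.
# Returns 2 (a "review me" signal, not a validation verdict) if the file shrank.
size_guard() {
  before=$1   # size in bytes, recorded before the batch ran
  file=$2
  [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
  after=$(wc -c < "$file" | tr -d ' ')
  if [ "$after" -lt "$before" ]; then
    echo "review: $file shrank from $before to $after bytes" >&2
    return 2
  fi
  echo "$file: $before -> $after bytes"
}

# Usage:
#   before=$(wc -c < frontend/index.html | tr -d ' ')
#   ... run the batch ...
#   size_guard "$before" frontend/index.html
```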

Reproduce it with one command

The repo ships a build.sh that rebuilds the entire game from scratch — all 6 batches — with watsonx:

git clone https://github.com/ruslanmv/doodle-jump-climber-under-contract
cd doodle-jump-climber-under-contract
pip install agent-generator gitcopilot crewai
export WATSONX_API_KEY=...  WATSONX_PROJECT_ID=...
./build.sh          # → frontend/index.html, validated batch by batch

Want Claude or local Ollama instead? Change GITPILOT_PROVIDER and the model env var at the top of build.sh. The contracts don’t change.

How it works — and why Matrix Builder

How it works: Matrix Builder × GitPilot × watsonx, and the advantages of mb

  • A contract, not a prompt: a locked blueprint + pinned standards.
  • Allow-list scope: the model edits only the files you permit.
  • Fail-closed validation: mb check returns approved / needs-repair / rejected.
  • Immutable Matrix Commits: every change pins the prompt, diff, and verdict.
  • Provider-agnostic: Claude, OpenAI, watsonx, or local Ollama; the governance never changes. This game is the proof.
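The Matrix Commit IDs in the table have the shape mc- plus 12 hex characters, but this post doesn't say how mb derives them. As an illustration of what pinning the prompt, diff, and verdict together can mean, here is a content-addressed ID: hash all three inputs, and editing any one of them yields a different ID. The function name `pin_id`, the hashing layout, and the use of GNU coreutils `sha256sum` are assumptions, not mb's actual scheme. On macOS, `shasum -a 256` produces the same output format.

```sh
# Illustration only: mb's real ID scheme is not described in this post.
# Derives an "mc-" + 12-hex-character ID from prompt, diff, and verdict files.
pin_id() {
  prompt_file=$1 diff_file=$2 verdict_file=$3
  for f in "$prompt_file" "$diff_file" "$verdict_file"; do
    [ -f "$f" ] || { echo "missing: $f" >&2; return 1; }
  done
  # Hash each file, keep only the digests (not file names), then hash those
  # three digests together, so changing any single input changes the ID.
  digest=$(sha256sum "$prompt_file" "$diff_file" "$verdict_file" \
    | cut -d' ' -f1 | sha256sum | cut -c1-12)
  echo "mc-$digest"
}
```

Because the ID depends only on file contents, rerunning it on unchanged inputs reproduces the same ID. That is what lets a recorded ID serve as evidence that nothing was altered afterwards.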

Take it for a spin

An open model built this game, and the build can prove that the model stayed within scope at every step. Being able to verify exactly what an AI was allowed to change is the part worth taking away.
