Trapstreet.run


Build a solution

End-to-end: write a solver against an existing task, run it locally, push it to GitHub, submit your score.

We're going to write a solver for the sum-two-numbers task from the Build a task guide. The task hands you two ints and expects their sum back. It takes about 5 minutes once you have uv and tp installed.

What you're making

A folder with two files (uv also wants a pyproject.toml; see Step 3):

  • solve.py — your program. Reads inputs, writes outputs.
  • trap.yaml — points at the task, declares which files come in and go out.

That's it. No framework code. tp run orchestrates each case for you.

Step 1 — write solve.py

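# my-solution/solve.py
import json
import os
from pathlib import Path

# tp passes file locations as JSON dicts: filename -> absolute path
inputs = json.loads(os.environ["INPUTS"])
outputs = json.loads(os.environ["OUTPUTS"])

# Read this case's input: an object with keys "a" and "b"
nums = json.loads(Path(inputs["nums.json"]).read_text())

# Write the answer to the path tp assigned to sum.json
Path(outputs["sum.json"]).write_text(json.dumps({"sum": nums["a"] + nums["b"]}))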

Everything your program needs arrives in two environment variables:

| Env var | What it holds | Example |
|---|---|---|
| INPUTS | JSON dict, filename → absolute path for each case input | {"nums.json": "/tmp/.trap/.../inputs/basic/nums.json"} |
| OUTPUTS | JSON dict, filename → absolute path for each declared file output | {"sum.json": "/tmp/.trap/.../basic/sum.json"} |
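To sanity-check your solver before involving tp, you can fake the two variables yourself. This is a minimal sketch: solve() repeats solve.py's logic inline so the block is self-contained, and run_case is a hypothetical helper whose temp-dir layout is our own, not the .trap layout tp actually uses.

```python
import json
import os
import tempfile
from pathlib import Path


def solve() -> None:
    # Same logic as solve.py: look up each file by its bare filename
    inputs = json.loads(os.environ["INPUTS"])
    outputs = json.loads(os.environ["OUTPUTS"])
    nums = json.loads(Path(inputs["nums.json"]).read_text())
    Path(outputs["sum.json"]).write_text(json.dumps({"sum": nums["a"] + nums["b"]}))


def run_case(a: int, b: int) -> dict:
    """Hypothetical local harness: fake one case's INPUTS/OUTPUTS, run solve(), read the result."""
    with tempfile.TemporaryDirectory() as tmp:
        case_dir = Path(tmp).resolve()
        in_file = case_dir / "nums.json"
        out_file = case_dir / "sum.json"
        in_file.write_text(json.dumps({"a": a, "b": b}))
        # Keys are filenames; values are absolute paths, as described above
        os.environ["INPUTS"] = json.dumps({"nums.json": str(in_file)})
        os.environ["OUTPUTS"] = json.dumps({"sum.json": str(out_file)})
        solve()
        return json.loads(out_file.read_text())


if __name__ == "__main__":
    print(run_case(2, 3))  # {'sum': 5}
```

Setting os.environ inside the same process keeps the sketch short; tp itself runs your cmd as a separate process with these variables set.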

You can also use stdin / stdout — tp captures both automatically. Many LLM solvers print the answer to stdout and let the task's judge parse it.
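A stdout-style solver might look like the sketch below. It assumes the task's judge reads stdout and accepts a single JSON line; this guide's solver writes sum.json instead, so check what your task's judge actually parses before switching. answer_line is a hypothetical name.

```python
import json
import os
from pathlib import Path


def answer_line() -> str:
    # Inputs still arrive via INPUTS; only where the answer goes changes
    inputs = json.loads(os.environ["INPUTS"])
    nums = json.loads(Path(inputs["nums.json"]).read_text())
    return json.dumps({"sum": nums["a"] + nums["b"]})


if __name__ == "__main__":
    # tp captures stdout; a stdout-reading judge would parse this line
    print(answer_line())
```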

Step 2 — write trap.yaml

# my-solution/trap.yaml
tasks:
  sum-two-numbers:                 # must match the task id on trapstreet
    cmd: uv run python solve.py
    traptask: ../sum-task          # path to the cloned task folder
    inputs:
      files: [nums.json]
    file_outputs: [sum.json]
    metadata:
      framework: stdlib-python
      model: hand-written
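Reading trap.yaml next to the env var table, each name under inputs.files appears to become a key in INPUTS, and each name in file_outputs a key in OUTPUTS. The sketch below models that mapping under that assumption; build_env and its directory arguments are hypothetical, and tp's real .trap layout (elided with "..." in the table's examples) may differ.

```python
import json
from pathlib import Path

# trap.yaml from Step 2 as a Python dict (YAML parsing omitted to stay stdlib-only)
TRAP_CONFIG = {
    "tasks": {
        "sum-two-numbers": {
            "cmd": "uv run python solve.py",
            "traptask": "../sum-task",
            "inputs": {"files": ["nums.json"]},
            "file_outputs": ["sum.json"],
        }
    }
}


def build_env(task_id: str, input_dir: Path, output_dir: Path) -> dict:
    """Hypothetical: derive the INPUTS/OUTPUTS env vars for one case from trap.yaml."""
    task = TRAP_CONFIG["tasks"][task_id]
    # Names under inputs.files become the keys of INPUTS
    inputs = {name: str((input_dir / name).resolve()) for name in task["inputs"]["files"]}
    # Names in file_outputs become the keys of OUTPUTS
    outputs = {name: str((output_dir / name).resolve()) for name in task["file_outputs"]}
    return {"INPUTS": json.dumps(inputs), "OUTPUTS": json.dumps(outputs)}
```

For example, build_env("sum-two-numbers", Path("inputs/basic"), Path("out/basic")) yields an INPUTS dict keyed by "nums.json" and an OUTPUTS dict keyed by "sum.json", which is why solve.py indexes by bare filename.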

The metadata: block is self-reported and flows through to your leaderboard row. If the folder is a git repository with a remote, tp run auto-fills repo: from that remote. For public tasks, trapstreet rejects your submission unless metadata.repo resolves to a publicly reachable GitHub URL, so push your solver first.
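If you set metadata.repo by hand, you'll probably want the https form of your GitHub URL rather than an SSH remote (an assumption; we don't know exactly how tp normalizes remotes when it auto-fills repo:). A small hypothetical helper for converting the output of git remote get-url origin; it only checks the URL's shape, not whether the repo is actually public.

```python
import re
from typing import Optional

# The two common GitHub remote forms; "path" captures owner/repo without ".git"
_SSH = re.compile(r"^git@github\.com:(?P<path>[^/]+/[^/]+?)(?:\.git)?$")
_HTTPS = re.compile(r"^https://github\.com/(?P<path>[^/]+/[^/]+?)(?:\.git)?/?$")


def github_repo_url(remote: str) -> Optional[str]:
    """Hypothetical helper: turn a git remote into https://github.com/owner/repo."""
    for pattern in (_SSH, _HTTPS):
        match = pattern.match(remote.strip())
        if match:
            return f"https://github.com/{match.group('path')}"
    return None  # not a GitHub remote
```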

Step 3 — run it locally

cd my-solution
tp run

Because cmd uses uv run, uv builds a venv from your pyproject.toml (any will do, even an empty one). tp then runs each case, runs the task's judge on your outputs, and prints a summary. All scores should be 1.0 on the basic, negatives, and zero cases.
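The real judge.py ships with the task, and its interface isn't covered on this page. As a rough mental model only, a per-case judge for this task could compare the sum in sum.json against the expected value and return 1.0 or 0.0. score and the case values below are hypothetical.

```python
import json


def score(expected_sum: int, output_text: str) -> float:
    """Hypothetical judge: 1.0 if sum.json holds the expected sum, else 0.0."""
    try:
        got = json.loads(output_text)["sum"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return 0.0  # malformed or missing output scores zero
    return 1.0 if got == expected_sum else 0.0


# Made-up inputs for the three case names mentioned above
CASES = {"basic": (2, 3), "negatives": (-4, -6), "zero": (0, 0)}

if __name__ == "__main__":
    for name, (a, b) in CASES.items():
        # Stand-in for what solve.py would write to sum.json
        output = json.dumps({"sum": a + b})
        print(name, score(a + b, output))
```

This is why "just hand back the right output" is the whole job: anything the judge can't parse scores the same as a wrong answer.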

Step 4 — submit

tp auth login           # one-time browser OAuth, see quick start
tp submit sum-two-numbers

The CLI prints a view_url. Open it to see your row on the leaderboard.

What you didn't have to think about

  • HTTP, auth, retries — tp submit handles it.
  • Per-case scoring — the task author already wrote judge.py. You just hand back the right output.
  • Capturing stdout/stderr/latency/exit_code — tp records it all.
  • Submitting from the same machine you ran on — you can copy .trap/<task>/<ts>/report.json anywhere and submit from there.
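To find the report to copy, a hypothetical helper that returns the newest .trap/<task>/<ts>/report.json. The format of the <ts> folder name isn't documented here, so it picks by file modification time rather than sorting folder names.

```python
from pathlib import Path
from typing import Optional


def latest_report(task_id: str, root: Path = Path(".trap")) -> Optional[Path]:
    """Return the most recently modified .trap/<task>/<ts>/report.json, if any."""
    reports = list((root / task_id).glob("*/report.json"))
    if not reports:
        return None
    # Pick by mtime rather than by <ts> folder name, whose format we don't assume
    return max(reports, key=lambda p: p.stat().st_mtime)
```

Pair it with shutil.copy(latest_report("sum-two-numbers"), destination) to move the file wherever you plan to submit from.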

Gotchas worth remembering

  • INPUTS keys are filenames, not paths. Use INPUTS["nums.json"], not INPUTS["inputs/basic/nums.json"].
  • The task id must match in three places: your trap.yaml's top-level key, the trapstreet task id, and the argument to tp submit.
  • Use uv run python ... in your cmd, not .venv/bin/python. The first lets uv build the venv; the second only works after a manual setup.
  • Public tasks require a public repo. Push your code to GitHub before tp submit or it's rejected — tp run auto-detects the remote URL into metadata.repo, or set it explicitly in trap.yaml.
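A few of these gotchas can be checked mechanically before you run tp submit. A sketch, assuming you've already loaded trap.yaml into a dict (for example with PyYAML's yaml.safe_load); preflight is hypothetical, only compares the trap.yaml key with the id you plan to pass to tp submit, and can't tell whether the trapstreet task id matches or the repo is actually public.

```python
from typing import List


def preflight(config: dict, task_id: str) -> List[str]:
    """Hypothetical pre-submit check for the gotchas above; returns a list of problems."""
    task = config.get("tasks", {}).get(task_id)
    if task is None:
        # The trap.yaml key must equal the id you pass to tp submit
        return [f"trap.yaml has no task named {task_id!r}"]
    problems = []
    if not task.get("cmd", "").startswith("uv run "):
        problems.append("cmd should start with 'uv run', not .venv/bin/python")
    repo = task.get("metadata", {}).get("repo")
    # repo may be absent here because tp run can auto-fill it from the git remote
    if repo is not None and not repo.startswith("https://github.com/"):
        problems.append(f"metadata.repo {repo!r} is not a GitHub URL")
    return problems
```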