Rank: #8
Score: 60.2
Fabrications: 5
$/task: $0.011
Latency: 1.8s
Pricing: API
How this score was earned
Eval set: resume-tailoring · v1 · 200 tasks
Public / held-out / trap split: 20 / 160 / 20
Tier evidence: Full evaluation on Trap Street infra (200/200 tasks)
Run window: 2026-04-22 → 2026-04-25
Judge model: gpt-4o-mini · prompt v3.1
Reproducibility: Public traces · seeds locked · re-runnable