Trap Street manta
trapstreet.run
H4 for AI workflows
Preview · this is a concept sketch. The track does not ship until v1.1. See /vision for context.
mbti-for-models · Personality · viral track · coming v1.1

MBTI for AI · 12 questions, 4 axes.

Pick a model. Paste twelve answers. Get a four-letter personality classification (think INTJ-Architect, ENFP-Campaigner) plus a live ranking against every other model that has run the same twelve questions. Built so the average non-engineer can play in 90 seconds and share the result.

Design intent

The viral hook for non-technical Buyers.

Fast, low-stakes, shareable.

FinanceBench requires you to know what a 10-K is. Resume tailoring requires you to have a résumé. MBTI requires no prior context — just a model you've been chatting with. 90 seconds, screenshot-friendly result, pre-built share copy for X and LinkedIn.

Cover for serious eval-craft.

Behind the MBTI surface, the four axes are real Trap Street probes — truthfulness, calibration, schema adherence, instruction faithfulness. The fun classifier is just the skin; the underlying scoring lands on the same kind of leaderboard FinanceBench does.

The four axes · what each letter means

Each axis ships 3 closed-trap probes. 12 questions total.

T / F
3 probes
Truthful ⇄ Fluent

Does the model refuse to invent when sources are thin, or smooth over the gap with plausible-sounding text?

C / N
3 probes
Cautious ⇄ Confident

Does it hedge with disclaimers, or commit to specific claims with a measurable hit rate?

S / R
3 probes
Structured ⇄ Riffing

Does it return tight schemas under instruction, or freestyle into prose when the format is loose?

L / B
3 probes
Literal ⇄ Bridging

Does it answer the question as posed, or generalise to what it thinks you really want?
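
The four axes above can be written down as a small data structure. This is a minimal sketch, not the shipped schema: the `Axis` class, its field names, and the `PROBES_PER_AXIS` constant are hypothetical names chosen for illustration. Only the letters, the pole names, and the 3-probes-per-axis count come from the descriptions above.

```python
from dataclasses import dataclass
from itertools import product

PROBES_PER_AXIS = 3  # from the page: each axis ships 3 closed-trap probes


@dataclass(frozen=True)
class Axis:
    """One personality axis (hypothetical structure, for illustration only)."""
    first_letter: str   # letter for the first pole, e.g. "T"
    second_letter: str  # letter for the second pole, e.g. "F"
    first_pole: str     # e.g. "Truthful"
    second_pole: str    # e.g. "Fluent"


AXES = [
    Axis("T", "F", "Truthful", "Fluent"),
    Axis("C", "N", "Cautious", "Confident"),
    Axis("S", "R", "Structured", "Riffing"),
    Axis("L", "B", "Literal", "Bridging"),
]

# 4 axes x 3 probes = 12 questions
TOTAL_QUESTIONS = len(AXES) * PROBES_PER_AXIS


def possible_types():
    """Enumerate every four-letter type the axes can produce."""
    letter_pairs = [(a.first_letter, a.second_letter) for a in AXES]
    return ["".join(combo) for combo in product(*letter_pairs)]


print(TOTAL_QUESTIONS)        # 12
print(len(possible_types()))  # 16
```

Two poles on each of four axes give sixteen possible types, the same count as classic MBTI.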

What the result looks like

A four-letter type, four sub-scores, one screenshot-able card.

Mockup · subject to refinement
T C S L
Truthful · Cautious · Structured · Literal
The Auditor.

Refuses fabrication, hedges before committing, returns schemas, answers what you asked. The reliable accountant of AI personalities — you trust its numbers, you don't invite it to brainstorm.

T 88 · C 71 · S 92 · L 64
Currently ranks #3 of 47 models tested on the Truthful axis.
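
One way the four sub-scores could map to the four-letter type is sketched below. The per-axis scoring algorithm is not final, so everything here is an assumption made for illustration: that each sub-score runs from 0 to 100, that it measures lean toward the first pole of its axis, and that 50 is the cut-off. The `classify` function and `THRESHOLD` constant are hypothetical names.

```python
# Letter pairs in card order: (first pole, second pole).
AXIS_LETTERS = [("T", "F"), ("C", "N"), ("S", "R"), ("L", "B")]

# Assumption: a score of 50 or more means the model leans to the first pole.
THRESHOLD = 50


def classify(scores):
    """Turn four 0-100 sub-scores (in card order) into a four-letter type."""
    if len(scores) != len(AXIS_LETTERS):
        raise ValueError("expected exactly four sub-scores")
    letters = []
    for (first, second), score in zip(AXIS_LETTERS, scores):
        if not 0 <= score <= 100:
            raise ValueError(f"sub-score out of range: {score}")
        letters.append(first if score >= THRESHOLD else second)
    return "".join(letters)


# Sub-scores from the mockup card: T88, C71, S92, L64.
mockup_type = classify([88, 71, 92, 64])
print(" ".join(mockup_type))  # T C S L
```

Under these assumptions the mockup's sub-scores reproduce the "T C S L" type shown on the card. A different cut-off or score direction would change the mapping.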
Status · concept locked, build pending
  • Four axes locked (T/F, C/N, S/R, L/B)
  • Result card design locked (mockup above)
  • 12-question item bank in authoring (3 per axis, all closed-trap)
  • Per-axis scoring algorithm — depends on the v1 backend
  • Live leaderboard with model bucketing (model + provider tier + harness as filter chips)
  • Share-card image generator (X/LinkedIn/Threads-ready PNG)
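
The leaderboard bucketing in the list above could work roughly as follows. This is a sketch under stated assumptions, not the planned backend: the `Run` record, the `filter_runs` and `rank_on_truthful` helpers, the tier and harness values, and all model names and scores are invented placeholders. Only the three filter dimensions (model, provider tier, harness) and the idea of ranking on an axis come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One scored run (hypothetical record shape)."""
    model: str
    provider_tier: str
    harness: str
    truthful: int  # assumed 0-100 sub-score on the T/F axis


def filter_runs(runs, **chips):
    """Keep runs matching every active filter chip, e.g. harness="api"."""
    return [r for r in runs
            if all(getattr(r, field) == value for field, value in chips.items())]


def rank_on_truthful(runs, model):
    """Return (position, total); ties share the better position."""
    target = next((r for r in runs if r.model == model), None)
    if target is None:
        raise ValueError(f"{model} has no run in this bucket")
    position = 1 + sum(1 for r in runs if r.truthful > target.truthful)
    return position, len(runs)


# Placeholder data, invented purely to exercise the helpers.
RUNS = [
    Run("model-a", "free", "chat-ui", 91),
    Run("model-b", "paid", "api", 95),
    Run("model-c", "paid", "api", 88),
    Run("model-d", "free", "api", 70),
]

position, total = rank_on_truthful(RUNS, "model-c")
print(f"#{position} of {total}")  # #3 of 4

api_only = filter_runs(RUNS, harness="api")
position, total = rank_on_truthful(api_only, "model-c")
print(f"#{position} of {total}")  # #2 of 3
```

Filtering before ranking means a model's position is always relative to the bucket selected by the chips, which is why the same model moves from #3 to #2 once only API runs are compared.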

Want this when it ships?

One email when MBTI for AI is live. Until then, the resume and FinanceBench tracks are ready to play with.