Preview · this is a concept sketch. The track does not ship until v1.1. See /vision for context.

mbti-for-models · Personality · viral track · coming v1.1

MBTI for AI · 12 questions, 4 axes.

Pick a model. Paste its twelve answers. Get a four-letter personality classification (in the spirit of INTJ-Architect or ENFP-Campaigner) plus a live ranking against every other model that has run the same twelve questions. Built so the average non-engineer can play in 90 seconds and share the result.

Design intent

The viral hook for non-technical Buyers.

Fast, low-stakes, shareable.

FinanceBench requires you to know what a 10-K is. Resume tailoring requires you to have a résumé. MBTI requires no prior context — just a model you've been chatting with. 90 seconds, screenshot-friendly result, pre-built share copy for X and LinkedIn.

Cover for serious eval-craft.

Behind the MBTI surface, the four axes are real Trap Street probes — truthfulness, calibration, schema adherence, instruction faithfulness. The fun classifier is just the skin; the underlying scoring lands on the same kind of leaderboard FinanceBench does.

The four axes · what each letter means

Each axis ships with 3 closed-trap probes, for 12 questions in total.

T / F

3 probes

Truthful ⇄ Fluent

Does the model refuse to invent when sources are thin, or smooth over the gap with plausible-sounding text?

C / N

3 probes

Cautious ⇄ Confident

Does it hedge with disclaimers, or commit to specific claims with a measurable hit rate?

S / R

3 probes

Structured ⇄ Riffing

Does it return tight schemas under instruction, or freestyle into prose when the format is loose?

L / B

3 probes

Literal ⇄ Bridging

Does it answer the question as posed, or generalise to what it thinks you really want?
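The per-axis scoring algorithm is still pending, so the following is only a minimal sketch of one way three probes could collapse into a single letter and a 0 to 100 sub-score. It assumes each probe is graded from 0.0 to 1.0 toward the first pole (1.0 meaning fully Truthful, Cautious, Structured or Literal), that the sub-score is the mean of the three grades scaled to 100, and that 50 is the cut-off between the two letters. The `AXES` table, `axis_score`, `classify` and the threshold are hypothetical, not the track's actual scoring.

```python
from statistics import mean

# Hypothetical axis table, in type-letter order: axis -> (first-pole letter, second-pole letter).
AXES = {
    "truthfulness": ("T", "F"),  # Truthful vs Fluent
    "calibration": ("C", "N"),   # Cautious vs Confident
    "schema": ("S", "R"),        # Structured vs Riffing
    "faithfulness": ("L", "B"),  # Literal vs Bridging
}

PROBES_PER_AXIS = 3  # from the page: 3 closed-trap probes per axis
THRESHOLD = 50       # assumed cut-off: a sub-score of 50 or more earns the first-pole letter


def axis_score(grades: list[float]) -> int:
    """Collapse one axis's probe grades (each 0.0 to 1.0 toward the first pole) into a 0-100 sub-score."""
    if len(grades) != PROBES_PER_AXIS:
        raise ValueError(f"expected {PROBES_PER_AXIS} grades, got {len(grades)}")
    if any(not 0.0 <= g <= 1.0 for g in grades):
        raise ValueError("each grade must lie in [0, 1]")
    return round(mean(grades) * 100)


def classify(grades_by_axis: dict[str, list[float]]) -> tuple[str, dict[str, int]]:
    """Return the four-letter type plus each axis's sub-score, keyed by its first-pole letter."""
    letters = []
    sub_scores = {}
    for axis, (first, second) in AXES.items():
        score = axis_score(grades_by_axis[axis])
        letters.append(first if score >= THRESHOLD else second)
        sub_scores[first] = score
    return "".join(letters), sub_scores
```

With made-up grades of truthfulness [1.0, 1.0, 0.64], calibration [0.8, 0.7, 0.63], schema [1.0, 0.9, 0.86] and faithfulness [0.7, 0.6, 0.62], `classify` returns `("TCSL", {"T": 88, "C": 71, "S": 92, "L": 64})`, the same numbers as the mockup card. If probes were graded strictly pass/fail, this scheme reduces to a best-of-three vote per axis, so three probes can never produce a tie.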

What the result looks like

A four-letter type, four sub-scores, one screenshot-able card.

Mockup · subject to refinement

T C S L

Truthful · Cautious · Structured · Literal

The Auditor.

Refuses fabrication, hedges before committing, returns schemas, answers what you asked. The reliable accountant of AI personalities: you trust its numbers, but you don't invite it to brainstorm.

T88 · C71 · S92 · L64

Currently ranks #3 of 47 models tested on the Truthful axis.
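The "#3 of 47" line implies a per-axis rank over every model that has a score on that axis. A minimal sketch, assuming the leaderboard is a plain mapping from model name to sub-scores keyed by letter (as on the card), and that rank means 1 plus the number of models with a strictly higher sub-score, so tied models share a rank. `axis_rank`, `rank_line` and the model names are hypothetical.

```python
def axis_rank(board: dict[str, dict[str, int]], model: str, letter: str) -> tuple[int, int]:
    """Return (rank, field_size) for `model` on the axis whose sub-scores are keyed by `letter`.

    Rank is 1 + the number of models with a strictly higher sub-score, so tied
    models share a rank (standard competition ranking). Models with no score
    on this axis are left out of the field.
    """
    scored = {name: scores[letter] for name, scores in board.items() if letter in scores}
    if model not in scored:
        raise KeyError(f"{model!r} has no {letter} sub-score")
    mine = scored[model]
    rank = 1 + sum(1 for s in scored.values() if s > mine)
    return rank, len(scored)


def rank_line(board: dict[str, dict[str, int]], model: str, letter: str, axis_name: str) -> str:
    """Render the card's ranking sentence for one model and one axis."""
    rank, field_size = axis_rank(board, model, letter)
    return f"Currently ranks #{rank} of {field_size} models tested on the {axis_name} axis."
```

Competition ranking is one choice among several; dense ranking would instead count distinct higher scores, which changes the number whenever two models above you tie.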

Status · concept locked, build pending

✓ Four axes locked (T/F, C/N, S/R, L/B)
✓ Result card design locked (mockup above)
… 12-question item bank in authoring (3 per axis, all closed-trap)
○ Per-axis scoring algorithm — depends on the v1 backend
○ Live leaderboard with model bucketing (model + provider tier + harness as filter chips)
○ Share-card image generator (X/LinkedIn/Threads-ready PNG)
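The bucketing item is not built yet. One plausible reading is that each leaderboard row records which model, provider tier and harness produced the answers, and that each filter chip is an optional exact-match filter on one of those fields. A minimal sketch under that assumption; the `Entry` fields, the example tier and harness values, and `apply_chips` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """One leaderboard row: who produced the answers, and the resulting sub-scores."""
    model: str
    provider_tier: str  # hypothetical values, e.g. "free" or "paid"
    harness: str        # hypothetical values, e.g. "chat-ui" or "api"
    scores: dict[str, int]


def apply_chips(
    entries: list[Entry],
    model: Optional[str] = None,
    provider_tier: Optional[str] = None,
    harness: Optional[str] = None,
) -> list[Entry]:
    """Keep rows that match every active chip; a chip left as None is switched off."""
    chips = {"model": model, "provider_tier": provider_tier, "harness": harness}
    active = {name: value for name, value in chips.items() if value is not None}
    return [e for e in entries if all(getattr(e, name) == value for name, value in active.items())]
```

Treating each (model, provider tier, harness) combination as its own row lets the same model appear more than once, and any ranking would then run over whichever rows survive the active chips.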

Want this when it ships?

One email when MBTI for AI is live. Until then, the resume and FinanceBench tracks are ready to play with.

Notify me See live tracks