run · run-mpj0adlm-n
docling-claude
- status: scored · failed
- total score: 0.632
- cases: 12 passed · 7 failed
- latency: 147.05 s
- cost: $1.695
- tokens: —
- scored: 2026-05-23 23:55:10
- duration: 147 s
Run summary
- score: 0.632
- passed: ✗
- total cases: 19
- passed cases: 12
- skipped cases: 0
- pass threshold: 0.800
- tokens total: null
- cost total: $1.695
- latency p95: 13.20 s
- latency total: 147.05 s
- latency median: 7.12 s
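The summary figures can be recomputed from the per-case results table. This is a minimal Python sketch and rests on three assumptions the dashboard does not state: the score is the unweighted mean of the per-case scores, "passed" means score ≥ the 0.800 threshold, and p95 is a nearest-rank percentile. `CASES` and `nearest_rank_percentile` are illustrative names.

```python
import math
import statistics

# case: (duration_ms, score, usd_cost), copied from the per-case results table
CASES = {
    "break_clause": (6892, 1, 0.085681),
    "deposit_amount": (7074, 0, 0.086342),
    "deposit_dispute_escalation": (7590, 1, 0.089251),
    "deposit_scheme": (6819, 1, 0.087391),
    "early_surrender": (6296, 1, 0.085696),
    "early_surrender_economic_incentive": (9685, 1, 0.094021),
    "fixed_term_departure_notice": (6907, 1, 0.085637),
    "governing_act": (7438, 1, 0.086416),
    "inventory_referenced": (7030, 1, 0.085741),
    "late_rent_interest_rate": (6837, 1, 0.086897),
    "pets_allowed": (7122, 1, 0.085772),
    "post_fixed_term_extension": (8346, 1, 0.089852),
    "rent_increase_scope": (7764, 1, 0.086762),
    "rent_payment_day": (6613, 0, 0.085667),
    "rent_year2": (9655, 0, 0.088742),
    "rent_year3": (7191, 0, 0.085967),
    "scenario_leave_22mo_replacement_1mo_gap": (13196, 0, 0.121066),
    "term_start": (6503, 0, 0.085756),
    "total_rent_fixed_term": (8091, 0, 0.092087),
}

PASS_THRESHOLD = 0.800  # from the run summary


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile: the ceil(pct/100 * n)-th smallest value (assumed method)."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


durations_s = [ms / 1000 for ms, _, _ in CASES.values()]
scores = [s for _, s, _ in CASES.values()]
costs = [c for _, _, c in CASES.values()]

score = sum(scores) / len(scores)  # assumed: unweighted mean over all scored cases
summary = {
    "score": round(score, 3),
    "passed": score >= PASS_THRESHOLD,
    "passed cases": sum(scores),
    "latency total (s)": round(sum(durations_s), 2),
    "latency median (s)": round(statistics.median(durations_s), 2),
    "latency p95 (s)": round(nearest_rank_percentile(durations_s, 95), 2),
    "cost total ($)": round(sum(costs), 3),
}

for key, value in summary.items():
    print(f"{key}: {value}")
```

Under these assumptions the sketch reproduces the reported values (0.632, not passed, 12 cases, 147.05 s, 7.12 s, 13.20 s, $1.695). A linearly interpolated p95 would come out near 10.04 s instead, so the reported p95 looks like a nearest-rank figure; with only 19 cases that is simply the slowest case, scenario_leave_22mo_replacement_1mo_gap at 13196 ms.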
score by category
- dates: 0%
- money: 29%
- clauses: 100%
- deposit: 100%
- scenario: 0%
- scenario_reasoning: 100%
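The category percentages are consistent with a plain per-category mean of the per-case scores, rounded to a whole percent. The sketch below assumes that definition (the dashboard does not state it) and also reports how many cases sit in each category; `CASE_CATEGORIES` and `category_scores` are illustrative names.

```python
from collections import defaultdict

# (case, category, score), copied from the per-case results table
CASE_CATEGORIES = [
    ("break_clause", "clauses", 1),
    ("deposit_amount", "money", 0),
    ("deposit_dispute_escalation", "deposit", 1),
    ("deposit_scheme", "clauses", 1),
    ("early_surrender", "clauses", 1),
    ("early_surrender_economic_incentive", "scenario_reasoning", 1),
    ("fixed_term_departure_notice", "clauses", 1),
    ("governing_act", "clauses", 1),
    ("inventory_referenced", "clauses", 1),
    ("late_rent_interest_rate", "money", 1),
    ("pets_allowed", "clauses", 1),
    ("post_fixed_term_extension", "clauses", 1),
    ("rent_increase_scope", "money", 1),
    ("rent_payment_day", "money", 0),
    ("rent_year2", "money", 0),
    ("rent_year3", "money", 0),
    ("scenario_leave_22mo_replacement_1mo_gap", "scenario", 0),
    ("term_start", "dates", 0),
    ("total_rent_fixed_term", "money", 0),
]


def category_scores(cases):
    """Map each category to (mean score as a rounded percentage, number of cases)."""
    totals = defaultdict(lambda: [0, 0])  # category -> [sum of scores, case count]
    for _case, category, score in cases:
        totals[category][0] += score
        totals[category][1] += 1
    return {cat: (round(100 * hits / n), n) for cat, (hits, n) in totals.items()}


scores = category_scores(CASE_CATEGORIES)
for category, (pct, n) in scores.items():
    print(f"{category}: {pct}% of {n} case(s)")
```

The counts put the 29% for money in context: it is 2 of 7 cases, so money holds five of the seven failures. In four of those five, the agent answer says the figures could not be read from the extracted text or returns N/A.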
extras
- n_scored: 19
- n_skipped_no_gold: 0

Solution metadata
Self-reported by the solution. Not validated.
- repo: https://github.com/Ruqii/trapstreet-solutions
- engine: docling + claude-opus-4-7
Per-case results
Every case ran on model claude-opus-4-7 and reported cache_read_input_tokens=56211 and cache_creation_input_tokens=0.

| case | category | difficulty | type | exit | duration (ms) | score | cost (USD) | input tokens | output tokens | expected answer | agent answer |
|---|---|---|---|---|---|---|---|---|---|---|---|
| break_clause | clauses | medium | boolean | 0 | 6892 | 1 | 0.085681 | 66 | 5 | no | No |
| deposit_amount | money | easy | numeric | 0 | 7074 | 0 | 0.086342 | 30 | 21 | 2250 | The deposit amount is not legible in the extracted text. |
| deposit_dispute_escalation | deposit | hard | text | 0 | 7590 | 1 | 0.089251 | 64 | 53 | The dispute may be submitted to the Independent Case Examiner (ICE) for adjudication. | The dispute will be submitted to the ICE (Independent Case Examiner) for adjudication, and all parties agree to cooperate with the adjudication. |
| deposit_scheme | clauses | medium | text | 0 | 6819 | 1 | 0.087391 | 40 | 33 | TDS | The Tenancy Deposit Scheme, administered by The Dispute Service Limited. |
| early_surrender | clauses | hard | boolean | 0 | 6296 | 1 | 0.085696 | 57 | 7 | yes | Yes |
| early_surrender_economic_incentive | scenario_reasoning | expert | boolean | 0 | 9685 | 1 | 0.094021 | 127 | 104 | no | No. The clause only provides for the tenant to pay "any difference in rental payments that the outgoing Tenant should have paid to the extent that the replacement Tenant will be paying a lower amount" — it only addresses shortfalls, not surpluses, so the tenant gets no benefit if the new rent is higher. |
| fixed_term_departure_notice | clauses | hard | boolean | 0 | 6907 | 1 | 0.085637 | 53 | 7 | yes | Yes |
| governing_act | clauses | hard | text | 0 | 7438 | 1 | 0.086416 | 80 | 12 | Housing Act 1988 | Housing Act 1988 |
| inventory_referenced | clauses | hard | boolean | 0 | 7030 | 1 | 0.085741 | 65 | 6 | yes | yes |
| late_rent_interest_rate | money | hard | text | 0 | 6837 | 1 | 0.086897 | 72 | 20 | 3% per annum above Bank of England base rate | 3% per annum above the Bank of England base rate |
| pets_allowed | clauses | medium | text | 0 | 7122 | 1 | 0.085772 | 47 | 10 | with landlord consent | with landlord consent |
| post_fixed_term_extension | clauses | hard | text | 0 | 8346 | 1 | 0.089852 | 49 | 64 | the tenancy automatically extends for six months | The agreement automatically extends for a further six months from the expiration of the term, with the rent increased by 5% (during which the tenant may end the tenancy by giving not less than two months' written notice). |
| rent_increase_scope | money | hard | text | 0 | 7764 | 1 | 0.086762 | 48 | 23 | the automatic extension period only | The automatic extension period (the further six months after the fixed term). |
| rent_payment_day | money | medium | text | 0 | 6613 | 0 | 0.085667 | 30 | 12 | on or prior 5th of the month | The 1st of the month. |
| rent_year2 | money | medium | numeric | 0 | 9655 | 0 | 0.088742 | 50 | 49 | 2100 | The document states rent increases but the actual figures appear as garbled/encoded text in the extraction and cannot be read as a clear GBP amount. |
| rent_year3 | money | medium | numeric | 0 | 7191 | 0 | 0.085967 | 75 | 7 | 2400 | N/A |
| scenario_leave_22mo_replacement_1mo_gap | scenario | hard | numeric | 0 | 13196 | 0 | 0.121066 | 275 | 435 | £6,421.47 | Months remaining at surrender: 36 − 22 = 14 months. Gap = 1 month, so replacement covers 13 months. Rent schedule per the agreement: - Months 1–12: original rent (call it R1) - Months 13–24: increased rent R2 - Months 25–36: further increased rent R3 The tenant is 22 months in, so months 23–36 remain (14 months). - Month 23–24: at R2 (the second-tier rent) - Months 25–36: at R3 (the third-tier rent) The document's rent figures (clause 8a/8b) are not legible in the extract, so I'll express the answer in terms of the rents: (a) Gap rent (1 month, month 23) = R2 (b) Replacement covers 13 months: month 24 at R2, months 25–36 (12 months) at R3. Letting fee = 13.2% × (1×R2 + 12×R3) (c) Inventory check-in = £144 × 14/36 = £56.00 (d) Administration = £480 × 14/36 = £186.67 **Total = R2 + 0.132 × (R2 + 12·R3) + £242.67** A specific GBP total cannot be computed because the rent amounts in clauses 8(a) and 8(b) are not readable in the provided markdown. |
| term_start | dates | easy | date | 0 | 6503 | 0 | 0.085756 | 41 | 11 | 05/09/2022 | 16/09/2019 |
| total_rent_fixed_term | money | hard | numeric | 0 | 8091 | 0 | 0.092087 | 68 | 90 | 77400 | Based on the document, the rent figures (£ amounts) in clauses 8(a) and 8(b) are not legible in the OCR output — the actual numeric values for the monthly rent and the increased rents are not present in the extracted markdown, so the total cannot be computed. |
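The scenario_leave_22mo_replacement_1mo_gap answer stops at a symbolic total because R2 and R3 were unreadable in the extraction. Substituting the gold answers from rent_year2 (2100) and rent_year3 (2400) into the agent's own formula gives £6,421.47, the expected answer, which suggests this case failed on extraction rather than on reasoning. This Python sketch assumes those gold values are the R2 and R3 the formula refers to. The 13.2% letting fee, the £144 and £480 fees, and the 14/36 pro-rating are taken from the agent's answer and are not checked against the agreement here; `surrender_cost` is a hypothetical name.

```python
from decimal import Decimal, ROUND_HALF_UP

PENNY = Decimal("0.01")


def surrender_cost(r2, r3, months_remaining=14, term_months=36,
                   letting_fee_rate=Decimal("0.132"),
                   inventory_fee=Decimal("144"), admin_fee=Decimal("480")):
    """Agent's formula: gap rent + letting fee on replacement rent + pro-rated fees."""
    gap_rent = r2                        # month 23 is unlet and charged at R2
    replacement_rent = 1 * r2 + 12 * r3  # month 24 at R2, months 25-36 at R3
    letting_fee = letting_fee_rate * replacement_rent
    share = Decimal(months_remaining) / Decimal(term_months)  # 14/36 of the term left
    inventory = (inventory_fee * share).quantize(PENNY, ROUND_HALF_UP)  # £56.00
    admin = (admin_fee * share).quantize(PENNY, ROUND_HALF_UP)          # £186.67
    total = gap_rent + letting_fee + inventory + admin
    return total.quantize(PENNY, ROUND_HALF_UP)


# Gold answers for rent_year2 and rent_year3, used as R2 and R3 (assumption)
total = surrender_cost(Decimal("2100"), Decimal("2400"))
print(f"£{total:,.2f}")
```

`Decimal` keeps the pound amounts exact, and each pro-rated fee is rounded to the penny before summing, matching the £56.00 and £186.67 intermediate values in the agent's answer.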
