Agent Docs Architecture Benchmark
Strong run (R1): 120 attempts in total (60 per model), n=20 per architecture.
Data source:
benchmark-memory/results/summary_*_repo_policy_v2_strong_r1/*.csv
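The glob above can be expanded and the matching CSV files read in one pass. The following is a minimal sketch, not the benchmark's own tooling: it assumes the files are UTF-8 CSVs with a header row and that the script runs from the directory containing benchmark-memory/. The names SUMMARY_GLOB, load_summary_rows, and the added "source_file" key are hypothetical, and no column names are assumed.

```python
import csv
import glob

# Glob pattern copied from the "Data source" line; it is resolved
# relative to the current working directory (an assumption).
SUMMARY_GLOB = "benchmark-memory/results/summary_*_repo_policy_v2_strong_r1/*.csv"


def load_summary_rows(pattern=SUMMARY_GLOB):
    """Read every CSV matching `pattern` and return its rows as dicts.

    Each row gains a hypothetical "source_file" key recording which
    file it came from, so rows can be traced back after merging.
    """
    rows = []
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                row["source_file"] = path
                rows.append(row)
    return rows


if __name__ == "__main__":
    rows = load_summary_rows()
    print(f"loaded {len(rows)} rows")
```

If the pattern matches no files, the function returns an empty list rather than raising, so a row count of 0 most likely means the working directory is wrong.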