Option | Probability
uses some non-AdamW optimizer | 75%
>= 50% on TerminalBench (https://www.tbench.ai/leaderboard) | 72%
Some variation of NSA (Native Sparse Attention) | 67%
>= 73% on SWE-Bench Verified (according to epoch.ai) | 64%
>= 2 shared experts | 63%
>= 25T pretraining tokens | 61%
>= 52B active parameters | 58%
>= 1.5T parameters | 55%
>= 16 active experts | 52%
1M+ Context | 47%
Some image input (multimodality) | 43%
>= 512 experts | 42%
intra-expert communication | 41%
DS-MoE with adaptative expert count | 35%
>= 60% on BrowseComp (https://www.kaggle.com/benchmarks/openai/browsecomp) | 34%
Gemini 2.5 Pro tier or higher on FictionBench (90.6%+ at 192k) | 22%
>= 44% on Humanity's Last Exam (text only) at scale.com leaderboard | 21%
DeepSeek reports some results with a full-blown deep research agent, and emphasizes that this is the intended use-mode | 20%
Releases before November | 0%
OptionVotes
NO
YES
1154
872
1281
950