Option Probability
HCAST - METR
CAIS Remote Labor Index
Epoch Capabilities Index (ECI)
ARC-AGI (any version)
GDPval https://openai.com/index/gdpval
PaperBench https://openai.com/index/paperbench
Opinion poll of Manifold userbase
Wozniak Coffee Test (requires controlling a robot)
Build, debug, and test a complex piece of software, such as a mobile app including a backend service, until it is of sufficient quality
Predict the output of an arbitrary set of NAND gates and inputs
Beating Pokemon games
Manually rearrange and overlap 100 random images in an image editor (with no other kinds of edits) to create a recognizable portrait
85
75
69
61
57
50
29
25
17
16
14
154
Other
155
153
156
152
157
151
158
150
159
15
12
11
10
9
8
6
5
33
7
4
3
Option Votes
YES
NO
1277
783
Deepmind
OpenAI
Anthropic
XAI
35