Benchmark News
Top stories
Prediction markets for Benchmark
Will an AI achieve >85% performance on the FrontierMath benchmark before 2028?
Nov 8, 8:25 PM · Jan 2, 7:59 AM
63% chance
239238329
Option Votes
5784
4452
Will an AI score over 80% on the FrontierMath benchmark in 2025?
Nov 9, 1:06 AM · Dec 31, 10:59 PM
13% chance
112115251
Option Votes
1303
961
Will an AI achieve >85% performance on the FrontierMath benchmark before 2027?
Nov 8, 11:52 PM · Jan 2, 7:59 AM
53.19% chance
8336738
Option Votes
1066
938
By 2029, will any AI be able to watch a movie and accurately tell you what is going on? (Gary Marcus benchmark #1)
Sep 16, 8:33 PM · Jan 1, 8:00 AM
94.31% chance
9025606
Option Votes
3919
917
In 2029, will any AI be able to construct "reasonably" bug-free code of >= 10k LOC from a natural language specification? (Gary Marcus benchmark #4)
Sep 16, 8:43 PM · Jan 1, 8:00 AM
80.91% chance
11225182
Option Votes
1618
741
Which AI companies will release a top-scoring LLM on the Scale AI Coding benchmark in 2025?
Dec 28, 7:34 AM · Jan 1, 7:59 AM
2418075
Option Probability
35
34
33
32
30
29
27
In 2029, will any AI be able to work as a competent cook in an arbitrary kitchen? (Gary Marcus benchmark #3)
Sep 16, 8:37 PM · Jan 1, 8:00 AM
65.07% chance
11314902
Option Votes
2220
880
In 2029, will any AI be able to take an arbitrary proof in the mathematical literature and translate it into a form suitable for symbolic verification? (Gary Marcus benchmark #5)
Sep 16, 8:47 PM · Jan 1, 8:00 AM
80.05% chance
9814256
Option Votes
2083
697
On January 1, 2027, a Transformer-like model will continue to hold the state-of-the-art position in most benchmarks
Jun 13, 9:21 AM · Dec 31, 9:59 PM
84.23% chance
415711
Option Votes
2299
317
Will an AI achieve >80% performance on the FrontierMath benchmark before 2027?
Feb 7, 7:34 PM · Jan 1, 7:59 AM
71.06% chance
323177
Option Votes
1567
638
Will at least 3 of the Gary Marcus benchmark questions resolve YES?
Sep 16, 9:19 PM · Jan 1, 8:00 AM
89.48% chance
413065
Option Votes
1830
824
Benchmark Gap #3: Once a model achieves superhuman performance on a competitive programming benchmark, will it be less than 2 years before there are "entry level" AI programmers in industry use?
Apr 8, 6:06 PM · Jan 1, 4:59 AM
73.23% chance
14560
Option Votes
1156
931