Option | Probability (%)
J. Something 'just works' on the order of eg: train a predictive/imitative/generative AI on a human-generated dataset, and RLHF her to be unfailingly nice, generous to weaker entities, and determined to make the cosmos a lovely place. | 19
I. The tech path to AGI superintelligence is naturally slow enough and gradual enough, that world-destroyingly-critical alignment problems never appear faster than previous discoveries generalize to allow safe further experimentation. | 18
Something wonderful happens that isn't well-described by any option listed. (The semantics of this option may change if other options are added.) | 16
M. "We'll make the AI do our AI alignment homework" just works as a plan. (Eg the helping AI doesn't need to be smart enough to be deadly; the alignment proposals that most impress human judges are honest and truthful and successful.) | 10
C. Solving prosaic alignment on the first critical try is not as difficult, nor as dangerous, nor taking as much extra time, as Yudkowsky predicts; whatever effort is put forth by the leading coalition works inside of their lead time. | 9
B. Humanity puts forth a tremendous effort, and delays AI for long enough, and puts enough desperate work into alignment, that alignment gets solved first. | 7
O. Early applications of AI/AGI drastically increase human civilization's sanity and coordination ability; enabling humanity to solve alignment, or slow down further descent into AGI, etc. (Not in principle mutex with all other answers.) | 6
K. Somebody discovers a new AI paradigm that's powerful enough and matures fast enough to beat deep learning to the punch, and the new paradigm is much much more alignable than giant inscrutable matrices of floating-point numbers. | 5
A. Humanity successfully coordinates worldwide to prevent the creation of powerful AGIs for long enough to develop human intelligence augmentation, uploading, or some other pathway into transcending humanity's window of fragility. | 3
H. Many competing AGIs form an equilibrium whereby no faction is allowed to get too powerful, and humanity is part of this equilibrium and survives and gets a big chunk of cosmic pie. | 2
L. Earth's present civilization crashes before powerful AGI, and the next civilization that rises is wiser and better at ops. (Exception to 'okay' as defined originally, will be said to count as 'okay' even if many current humans die.) | 2
D. Early powerful AGIs realize that they wouldn't be able to align their own future selves/successors if their intelligence got raised further, and work honestly with humans on solving the problem in a way acceptable to both factions. | 1
E. Whatever strange motivations end up inside an unalignable AGI, or the internal slice through that AGI which codes its successor, they max out at a universe full of cheerful qualia-bearing life and an okay outcome for existing humans. | 1
G. It's impossible/improbable for something sufficiently smarter and more capable than modern humanity to be created, that it can just do whatever without needing humans to cooperate; nor does it successfully cheat/trick us. | 1
F. Somebody pulls off a hat trick involving blah blah acausal blah blah simulations blah blah, or other amazingly clever idea, which leads an AGI to put the reachable galaxies to good use despite that AGI not being otherwise alignable. | 0
N. A crash project at augmenting human intelligence via neurotech, training mentats via neurofeedback, etc, produces people who can solve alignment before it's too late, despite Earth civ not slowing AI down much. | 0
If you write an argument that breaks down the 'okay outcomes' into lots of distinct categories, without breaking down internal conjuncts and so on, Reality is very impressed with how disjunctive this sounds and allocates more probability. | 0
You are fooled by at least one option on this list, which out of many tries, ends up sufficiently well-aimed at your personal ideals / prejudices / the parts you understand less well / your own personal indulgences in wishful thinking. | 0

Option | Probability (%)
#144 – Athena Aktipis on why cancer is actually one of the fundamental phenomena in our universe | 99
#145 – Christopher Brown on why slavery abolition wasn't inevitable | 99
#146 – Robert Long on why large language models like GPT (probably) aren't conscious | 53
#147 – Spencer Greenberg on stopping valueless papers from getting into top journals | 41
#148 – Johannes Ackva on unfashionable climate interventions that work, and fashionable ones that don't | 34
#151 – Ajeya Cotra on accidentally teaching AI models to deceive us | 31
#149 – Tim LeBon on how altruistic perfectionism is self-defeating | 30
#150 – Tom Davidson on how quickly AI could transform the world | 24
#152 – Joe Carlsmith on navigating serious philosophical confusion | 18
#153 – Elie Hassenfeld on two big picture critiques of GiveWell's approach, and six lessons from their recent work | 18
#154 – Rohin Shah on DeepMind and trying to fairly hear out both AI doomers and doubters | 18
#155 – Lennart Heim on the compute governance era and what has to come after | 17
#156 – Markus Anderljung on how to regulate cutting-edge AI models | 17
#157 – Ezra Klein on existential risk from AI and what DC could do about it | 17
#158 – Holden Karnofsky on how AIs might take over even if they're no smarter than humans, and his 4-part playbook for AI risk | 17
#159 – Jan Leike on OpenAI's massive push to make superintelligence safe in 4 years or less | 17
#160 – Hannah Ritchie on why it makes sense to be optimistic about the environment | 17
#161 – Michael Webb on whether AI will soon cause job loss, lower incomes, and higher inequality — or the opposite | 17
#162 – Mustafa Suleyman on getting Washington and Silicon Valley to tame AI | 16
#163 – Toby Ord on the perils of maximising the good that you do | 16
#166 – Tantum Collins on what he's learned as an AI policy insider at the White House, DeepMind and elsewhere | 16
#167 – Seren Kell on the research gaps holding back alternative proteins from mass adoption | 16
#168 – Ian Morris on whether deep history says we're heading for an intelligence explosion | 16
#164 – Kevin Esvelt on cults that want to kill everyone, stealth vs wildfire pandemics, and how he felt inventing gene drives | 15
#165 – Anders Sandberg on war in space, whether civilisations age, and the best things possible in our universe | 15
#169 – Paul Niehaus on whether cash transfers cause economic growth, and keeping theft to acceptable levels | 14
#170 – Santosh Harish on how air pollution is responsible for ~12% of global deaths — and how to get that number down | 14
#171 – Alison Young on how top labs have jeopardised public health with repeated biosafety failures | 14
#172 – Bryan Caplan on why you should stop reading the news | 14
#173 – Jeff Sebo on digital minds, and how to avoid sleepwalking into a major moral catastrophe | 14
#174 – Nita Farahany on the neurotechnology already being used to convict criminals and manipulate workers | 14
#175 – Lucia Coulter on preventing lead poisoning for $1.66 per child | 14
#176 – Nathan Labenz on the final push for AGI, understanding OpenAI's leadership drama, and red-teaming frontier models | 14

Option | Probability (%)
DeepMind | 39
OpenAI | 14
Anthropic | 11
US Government | 10
xAI | 10
People not employed by a company | 5
Communist Party of China | 5
Other | 2
Meta | 1
Safe Superintelligence (SSI) | 1
DeepSeek | 1
Nvidia | 0
Tesla | 0
EleutherAI | 0
Character.AI | 0
Microsoft | 0
Google Brain | 0
None of the above | 0
Keen | 0

Option | Probability (%)
Baidu (Ernie) | 100
OpenAI | 100
Anthropic | 100
DeepSeek | 100
Alibaba (Qwen) | 82
Z.ai | 82
Moonshot (Kimi) | 78
Mistral | 73
Nvidia | 40
AI21 Labs (Jamba) | 31
Thinking Machines Lab | 22
Cohere (Command R+) | 22
Reka AI | 21
NexusFlow (Athene) | 18
01 AI (Yi) | 16
Safe Superintelligence / SSI | 11
(unnamed option) | 10

Option | Probability (%)
K. Somebody discovers a new AI paradigm that's powerful enough and matures fast enough to beat deep learning to the punch, and the new paradigm is much much more alignable than giant inscrutable matrices of floating-point numbers. | 20
I. The tech path to AGI superintelligence is naturally slow enough and gradual enough, that world-destroyingly-critical alignment problems never appear faster than previous discoveries generalize to allow safe further experimentation. | 10
C. Solving prosaic alignment on the first critical try is not as difficult, nor as dangerous, nor taking as much extra time, as Yudkowsky predicts; whatever effort is put forth by the leading coalition works inside of their lead time. | 8
Something wonderful happens that isn't well-described by any option listed. (The semantics of this option may change if other options are added.) | 7
A. Humanity successfully coordinates worldwide to prevent the creation of powerful AGIs for long enough to develop human intelligence augmentation, uploading, or some other pathway into transcending humanity's window of fragility. | 6
B. Humanity puts forth a tremendous effort, and delays AI for long enough, and puts enough desperate work into alignment, that alignment gets solved first. | 6
D. Early powerful AGIs realize that they wouldn't be able to align their own future selves/successors if their intelligence got raised further, and work honestly with humans on solving the problem in a way acceptable to both factions. | 6
M. "We'll make the AI do our AI alignment homework" just works as a plan. (Eg the helping AI doesn't need to be smart enough to be deadly; the alignment proposals that most impress human judges are honest and truthful and successful.) | 6
O. Early applications of AI/AGI drastically increase human civilization's sanity and coordination ability; enabling humanity to solve alignment, or slow down further descent into AGI, etc. (Not in principle mutex with all other answers.) | 6
E. Whatever strange motivations end up inside an unalignable AGI, or the internal slice through that AGI which codes its successor, they max out at a universe full of cheerful qualia-bearing life and an okay outcome for existing humans. | 4
F. Somebody pulls off a hat trick involving blah blah acausal blah blah simulations blah blah, or other amazingly clever idea, which leads an AGI to put the reachable galaxies to good use despite that AGI not being otherwise alignable. | 4
J. Something 'just works' on the order of eg: train a predictive/imitative/generative AI on a human-generated dataset, and RLHF her to be unfailingly nice, generous to weaker entities, and determined to make the cosmos a lovely place. | 4
H. Many competing AGIs form an equilibrium whereby no faction is allowed to get too powerful, and humanity is part of this equilibrium and survives and gets a big chunk of cosmic pie. | 3
G. It's impossible/improbable for something sufficiently smarter and more capable than modern humanity to be created, that it can just do whatever without needing humans to cooperate; nor does it successfully cheat/trick us. | 2
L. Earth's present civilization crashes before powerful AGI, and the next civilization that rises is wiser and better at ops. (Exception to 'okay' as defined originally, will be said to count as 'okay' even if many current humans die.) | 2
N. A crash project at augmenting human intelligence via neurotech, training mentats via neurofeedback, etc, produces people who can solve alignment before it's too late, despite Earth civ not slowing AI down much. | 1
You are fooled by at least one option on this list, which out of many tries, ends up sufficiently well-aimed at your personal ideals / prejudices / the parts you understand less well / your own personal indulgences in wishful thinking. | 1
If you write an argument that breaks down the 'okay outcomes' into lots of distinct categories, without breaking down internal conjuncts and so on, Reality is very impressed with how disjunctive this sounds and allocates more probability. | 1

Option | Probability (%)
2026 | 44
2027 | 21
2028 | 13
2029 | 11
2030 or later | 10
2025 | 1

Option | Probability (%)
Anthropic | 29
DeepMind | 24
Other | 18
OpenAI | 13
Safe Superintelligence Inc | 4
Meta | 2
xAI | 2
Alibaba | 2
None (No AI labs exist) | 2
US Government | 2
Thinking Machines | 1
Stability AI | 1
Tesla | 0
Keen | 0
CommaAI | 0
Conjecture | 0
Baidu | 0
Amazon | 0
Apple | 0
Microsoft | 0
Comma | 0
NVIDIA | 0
Midjourney | 0
DeepSeek | 0
High-Flyer | 0
(unnamed option) | 0

Option | Probability (%)
Alphabet / Google DeepMind / Gemini | 100
OpenAI / ChatGPT | 75
xAI / Grok | 45
DeepSeek | 41
Anthropic / Claude | 33
Mistral | 26
Microsoft | 15
Meta / Llama | 10
Safe Superintelligence (SSI) | 9
Apple | 8

Option | Votes
YES | 1415
NO | 789

Option | Probability (%)
xAI | 24
DeepMind | 15
OpenAI | 15
Anthropic | 9
US government | 7
Other | 7
Meta | 6
Chinese government | 5
Other government | 5
DeepSeek | 3
Safe Superintelligence | 3

Option | Votes
NO | 358
YES | 28

Option | Votes
NO | 1011
YES | 889
