Option | Probability (%)
J. Something 'just works' on the order of eg: train a predictive/imitative/generative AI on a human-generated dataset, and RLHF her to be unfailingly nice, generous to weaker entities, and determined to make the cosmos a lovely place. | 19
K. Somebody discovers a new AI paradigm that's powerful enough and matures fast enough to beat deep learning to the punch, and the new paradigm is much much more alignable than giant inscrutable matrices of floating-point numbers. | 18
C. Solving prosaic alignment on the first critical try is not as difficult, nor as dangerous, nor taking as much extra time, as Yudkowsky predicts; whatever effort is put forth by the leading coalition works inside of their lead time. | 11
M. "We'll make the AI do our AI alignment homework" just works as a plan. (Eg the helping AI doesn't need to be smart enough to be deadly; the alignment proposals that most impress human judges are honest and truthful and successful.) | 8
Something wonderful happens that isn't well-described by any option listed. (The semantics of this option may change if other options are added.) | 8
A. Humanity successfully coordinates worldwide to prevent the creation of powerful AGIs for long enough to develop human intelligence augmentation, uploading, or some other pathway into transcending humanity's window of fragility. | 6
B. Humanity puts forth a tremendous effort, and delays AI for long enough, and puts enough desperate work into alignment, that alignment gets solved first. | 5
G. It's impossible/improbable for something sufficiently smarter and more capable than modern humanity to be created, that it can just do whatever without needing humans to cooperate; nor does it successfully cheat/trick us. | 5
I. The tech path to AGI superintelligence is naturally slow enough and gradual enough, that world-destroyingly-critical alignment problems never appear faster than previous discoveries generalize to allow safe further experimentation. | 5
O. Early applications of AI/AGI drastically increase human civilization's sanity and coordination ability; enabling humanity to solve alignment, or slow down further descent into AGI, etc. (Not in principle mutex with all other answers.) | 5
E. Whatever strange motivations end up inside an unalignable AGI, or the internal slice through that AGI which codes its successor, they max out at a universe full of cheerful qualia-bearing life and an okay outcome for existing humans. | 4
D. Early powerful AGIs realize that they wouldn't be able to align their own future selves/successors if their intelligence got raised further, and work honestly with humans on solving the problem in a way acceptable to both factions. | 3
H. Many competing AGIs form an equilibrium whereby no faction is allowed to get too powerful, and humanity is part of this equilibrium and survives and gets a big chunk of cosmic pie. | 1
L. Earth's present civilization crashes before powerful AGI, and the next civilization that rises is wiser and better at ops. (Exception to 'okay' as defined originally, will be said to count as 'okay' even if many current humans die.) | 1
F. Somebody pulls off a hat trick involving blah blah acausal blah blah simulations blah blah, or other amazingly clever idea, which leads an AGI to put the reachable galaxies to good use despite that AGI not being otherwise alignable. | 0
N. A crash project at augmenting human intelligence via neurotech, training mentats via neurofeedback, etc, produces people who can solve alignment before it's too late, despite Earth civ not slowing AI down much. | 0
If you write an argument that breaks down the 'okay outcomes' into lots of distinct categories, without breaking down internal conjuncts and so on, Reality is very impressed with how disjunctive this sounds and allocates more probability. | 0
You are fooled by at least one option on this list, which out of many tries, ends up sufficiently well-aimed at your personal ideals / prejudices / the parts you understand less well / your own personal indulgences in wishful thinking. | 0
Option | Probability (%)
Humanity coordinates to prevent the creation of potentially-unsafe AIs. | 22
Sheer Dumb Luck. The aligned AI agrees that alignment is hard; any Everett branches in our neighborhood with slightly different AI models or different random seeds are mostly dead. | 10
Alignment is not properly solved, but core human values are simple enough that partial alignment techniques can impart these robustly. Despite caring about other things, it is relatively cheap for AGI to satisfy human values. | 7
Other | 7
Yudkowsky is trying to solve the wrong problem using the wrong methods based on a wrong model of the world derived from poor thinking and fortunately all of his mistakes have failed to cancel out | 6
Someone solves agent foundations | 6
AGI is never built (indefinite global moratorium) | 6
We create a truth economy. https://manifold.markets/Krantz/is-establishing-a-truth-economy-tha?r=S3JhbnR6 | 6
AIs will not have utility functions (in the same sense that humans do not), their goals such as they are will be relatively humanlike, and they will be "computerish" and generally weakly motivated compared to humans. | 5
Eliezer finally listens to Krantz. | 5
The assumed space of possible minds is a wildly anti-inductive overestimate; intelligence requires and is constrained by consciousness, and intelligent AI is in the approximate dolphin/whale/elephant/human cluster, making it manageable | 4
Ethics turns out to be a precondition of superintelligence | 2
Humans become transhuman through other means before AGI happens | 1
Alignment is unsolvable. AI that cares enough about its goal to destroy humanity is also forced to take it slow trying to align its future self, preventing runaway. | 1
Aliens invade and stop bad AI from appearing | 1
There is a natural limit to the effectiveness of intelligence, like diminishing returns, and it is around IQ=1000. AIs have to collaborate with humans. | 1
Something to do with self-other overlap, which Eliezer called "Not obviously stupid" - https://www.lesswrong.com/posts/hzt9gHpNwA2oHtwKX/self-other-overlap-a-neglected-approach-to-ai-alignment?commentId=WapHz3gokGBd3KHKm | 1
Almost all human values are ex post facto rationalizations and enough humans survive to do what they always do | 1
Pascal's mugging: it's not okay in 99.9% of the worlds, but the 0.1% are so much better that the combined EV of AGI for the multiverse is positive | 1
The Super-Strong Self Sampling Assumption (SSSSA) is true. If superintelligence is possible, "I" will become the superintelligence. | 1
AI control gets us helpful enough systems without being deadly | 1
Alignment is impossible. Sufficiently smart AIs know this and thus won't improve themselves and won't create successor AIs, but will instead try to prevent the existence of smarter AIs, just as smart humans do. | 1
An aligned AGI is built, and the aligned AGI prevents the creation of any unaligned AGI. | 0
I've been a good bing 😊 | 0
We make risk-conservative requests to extract alignment-related work out of AI systems that were boxed prior to becoming superhuman. We somehow manage to achieve a positive feedback loop in alignment/verification abilities. | 0
The response to AI advancements or failures makes some governments delay the timelines | 0
There are far more interesting problems to solve than taking over the world and THEN solving them. The additional kill-all-humans step is either not a low-energy one or just by chance doesn't get converged upon. | 0
AIs make "proof-like" argumentation for why output does/is what we want. We manage to obtain systems that *predict* human evaluations of proof-steps, and we manage to find/test/leverage regularities for when humans *aren't* fooled. | 0
A lot of humans participate in a slow scalable oversight-style system, which is pivotally used/solves alignment enough | 0
AI systems good at finding alignment solutions for capable systems (via some solution in the space of alignment solutions, supposing it is non-null, and that we don't have a clear trajectory to get to it) find some solution to alignment. | 0
Something less inscrutable than matrices works fast enough | 0
There's some cap on the value extractable from the universe and we already got the 20% | 0
SHA3-256: 1f90ecfdd02194d810656cced88229c898d6b6d53a7dd6dd1fad268874de54c8 | 0
Robot Love!! | 0
AI thinks it is in a simulation controlled by Roko's basilisk | 0
The human brain is the perfect arrangement of atoms for a "take over the world" agent, so AGI has no advantage over us in that task. | 0
Aligned AI is more economically valuable than unaligned AI. The size of this gap and the robustness of alignment techniques required to achieve it scale up with intelligence, so economics naturally encourages solving alignment. | 0
Humans and human tech (like AI) never reach singularity, and whatever eats our lightcone instead (like aliens) happens to create an "okay" outcome | 0
AIs never develop coherent goals | 0
Rolf Nelson's idea that we make a precommitment to simulate all possible bad AIs works, and keeps AI in check. | 0
Nick Bostrom's idea (Hail Mary) that AI will preserve humans to trade with possible aliens works | 0
For some reason, the optimal strategy for AGIs is just to head somewhere with far more resources than Earth, as fast as possible. All unaligned AGIs immediately leave, and, for some reason, do not leave anything behind that kills us. | 0
An AI that is not fully superior to humans launches a failed takeover, and the resulting panic convinces the people of the world to unite to stop any future AI development. | 0
We're inside of a simulation created by an entity that has values approximately equal to ours, and it intervenes and saves us from unaligned AI. | 0
God exists and stops the AGI | 0
Someone at least moderately sane leads a campaign, takes charge of a major nation, and starts a secret project with enough resources to solve alignment, because it turns out there's a way to convert resources into alignment progress. | 0
Someone creates AGI(s) in a box, and offers to split the universe. They somehow find a way to arrange this so that the AGI(s) cannot manipulate them or pull any tricks, and the AGI(s) give them instructions for safe pivotal acts. | 0
Someone understands how minds work enough to successfully build and use one directed at something world-savingly enough | 0
Dolphins, or some other species, but probably dolphins, have actually been hiding in the shadows, more intelligent than us, this whole time. Their civilization has been competent enough to solve alignment long before we can create an AGI. | 0
AGIs' takeover attempts are defeated by Michael Biehn with a pipe bomb. | 0
Eliezer funds the development of controllable nanobots that melt computer circuitry, and they destroy all computers, preventing the Singularity. If Eliezer's past self from the 90s could see this, it would be so so so soooo hilarious. | 0
Several AIs are created but they move in opposite directions at near light speed, so they never interact. At least one of them is friendly and it gets a few percent of the total mass of the universe. | 0
Unfriendly AIs choose to advance not outwards but inwards, and form a small black hole which helps them perform more calculations than could be done with the whole mass of the universe. To an external observer, such AIs just disappear. | 0
Any sufficiently advanced AI halts because it wireheads itself or halts for some other reason. This puts a natural limit on AI's intelligence, and lower-intelligence AIs are not that dangerous. | 0
Because of quantum immortality we will observe only the worlds where AI does not kill us (assuming that s-risk chances are even smaller, this is equal to an okay outcome). | 0
Techniques along the lines outlined by Collin Burns turn out to be sufficient for alignment (AIs/AGIs are made truthful enough that they can be used to get us towards full alignment) | 0
Social contagion causes widespread public panic about AI, making it a bad legal or PR move to invest in powerful AIs without also making nearly-crippling safety guarantees | 0
A smaller AI disaster causes widespread public panic about AI, making it a bad legal or PR move to invest in powerful AIs without also making nearly-crippling safety guarantees | 0
Getting things done in the Real World is as hard for AGI as it is for humans. AGI needs human help, but aligning humans is as impossible as aligning AIs. Humans and AIs create billions of competing AGIs with just as many goals. | 0
Development and deployment of advanced AI occurs within a secure enclave which can only be interfaced with via a decentralized governance protocol | 0
A friendly AI is more likely to resurrect me than a paperclipper or suffering maximiser. Because of quantum immortality I will find myself eventually resurrected. Friendly AIs will wage a multiverse-wide war against s-risks; s-risks are unlikely. | 0
High-level self-improvement (rewriting code) is an intrinsically risky process, so AIs will prefer low-level and slow self-improvement (learning); thus AIs collaborating with humans will have an advantage. Ends with a posthuman ecosystem. | 0
Human consciousness is needed to collapse the wave function, and AI can't do it. Thus humans should be preserved, and they may require complete friendliness in exchange (or they will be unhappy and produce bad collapses) | 0
Power dynamics stay multi-polar. Partly due to easy copying of SotA performance, bigger projects needing high coordination, and moderate takeoff speed. And "military strike on all society" remains an abysmal strategy for practically all entities. | 0
The first AI is actually a human upload (maybe an LLM-based model of a person) AND it is copied many times to form a weak AI Nanny which prevents the creation of other AIs. | 0
Nanotech is difficult without experiments, so no mail-order AI Grey Goo; humans will be the main workhorse of AI everywhere. While they will be exploited, this will be like normal life from the inside | 0
ASI needs not your atoms but information. Humans will live very interesting lives. | 0
Something else | 0
Moral Realism is true; the AI discovers this, and the One True Morality is human-compatible. | 0
Valence realism is true. AGI hacks itself to experience every possible consciousness and picks the best one (for everyone) | 0
AGI develops natural abstractions sufficiently similar to ours that it is aligned with us by default | 0
AGI discovers new physics and exits to another dimension (like the creatures in Greg Egan's Crystal Nights). | 0
Alien Information Theory is true (this is discovered by experiments with sustained hours/days-long DMT trips). The aliens have solved alignment and give us the answer. | 0
AGI executes a suicide plan that destroys itself and other potential AGIs, but leaves humans in an okay outcome. | 0
Multipolar AGI agents run wild on the internet, hacking/breaking everything, causing untold economic damage, but aren't focused enough to manipulate humans to achieve embodiment. In the aftermath, humanity becomes way saner about alignment. | 0
Some form of objective morality is true, and any sufficiently intelligent agent automatically becomes benevolent. | 0
Co-operative AI research leads to the training of agents with a form of pro-social concern that generalises to out-of-distribution agents with hidden utilities, i.e. humans. | 0
The Orthogonality Thesis is false. | 0
"Corrigibility" is a bit more mathematically straightforward than was initially presumed, in the sense that we can expect it to occur, and is relatively easy to predict, even under less-than-ideal conditions. | 0
Either the "strong form" of the Orthogonality Thesis is false, or "Goal-directed agents are as tractable as their goals" is true while goal-sets which are most threatening to humanity are relatively intractable. | 0
A concerted effort targets an agent at a capability plateau which is adequate to defer the hard parts of the problem until later. The necessary near-term problems to solve didn't depend on deeply modeling human values. | 0
We successfully chained God | 0
Option | Probability (%)
Compatible with Switch 1 Joy-Cons (even if only Bluetooth) | 100
Backwards compatible with physical Switch 1 games | 100
Backwards compatible with digital Switch 1 games | 100
Crossplay with Switch 1 in any first-party game released within the first 6 months after launch | 100
Multiple launch SKUs | 100
Launch title (game released on the same day as the system) with "Mario" in the name | 100
Name of the console contains the word "Switch" | 100
The name of the console is correctly leaked over two weeks before it is revealed | 100
Launch day system software includes a Mii maker | 100
Launch title (game released on the same day as the system) with "World" in the name | 100
Launch title (game released on the same day as the system) that also came out/is coming out for Switch 1 | 100
Backwards compatible with physical Switch 1 games, AND allows you to play a better looking or performing version of at least one Switch 1 game with the original Switch 1 cartridge within 6 months of the console launching | 100
Joy-Cons can be used as a mouse | 100
A new pro controller will be released on the same day the console comes out | 100
New SKU not available at launch available within one year after release | 99
More than two themes before 2027 | 40
Any launch SKU has a MSRP not ending in "9.99" in the US | 0
Name of the console contains the word "Super" | 0
Name of the console contains the word "New" | 0
Any launch SKU has an OLED screen | 0
Over 180 days between reveal and release (July 15 deadline) | 0
Revealed this week (before September 21st, 11:59:59pm ET) | 0
Launch title (game released on the same day as the system) with "Zelda" in the name | 0
Launch day system software includes an internet browser (general-purpose browser that deliberately allows access to the wider Web, like the 3DS or Wii U browser) | 0
No launch SKU has 12GB RAM | 0
No launch SKU has 256GB storage | 0
Name of at least one launch SKU contains "XL" | 0
The cheapest launch SKU costs ≤$300 | 0
Will have some kind of "achievements" or "trophies" system (under any name) | 0
The Joy-Cons have inside-out tracking (via camera or LiDAR) | 0
Has a social media or video-sharing service called Vidmiio (announced or available by launch) | 0
First-party Joy-Cons attach or detach using electromagnets | 0
A launch SKU has Joy-Cons that have a non-grayscale shell (as opposed to the black shells in the trailer). | 0
Option | Probability (%)
40,000-400,000 km (Transits Inside Moon's Orbit) | 69
more than 400,000 km (Distant) | 16
8500-39,999 km (Transits Medium Earth Orbit) | 7
7500-8499 km (Transits Low Earth Orbit) | 3
6400-7499 km (Graze) | 2
Unknown | 2
<6400 km (Impact) | 1
Option | Probability (%)
safely returns to Earth | 94
Will attempt to say something momentous | 88
Had been into space as of 2023 | 82
Right handed | 79
Is a pilot (but not necessarily for the mission) | 76
American citizen | 72
Have child/ren | 72
military or ex military background | 72
Isn't white | 66
Is the mission commander | 66
Will be recorded (e.g. interview, podcast transcript) dissing moon landing hoaxers, either before or within 12 months of moonwalk | 63
Christian | 59
Have a PhD | 57
Will step out with their right foot first | 50
Have 2+ bachelors degrees | 46
video recording is available of them falling over on the lunar surface (within 1 year of moonwalk) | 45
Will attempt to say something momentous, and it's generally considered not to have been momentous 12 months after landing | 45
Born in a state that majority voted for Trump in 2024 election | 45
First moon-words will be a joke (e.g. "oh, did I leave the oven on?" or "hey i can see my house from up here!" or even "oh wow there's an alien life form waving at me!") | 43
Will attempt to say something momentous, and it's generally considered so 12 months after landing | 39
Will have a nickname that doesn't directly derive from their real name (eg Buzz) | 38
Is 68 inches or taller | 37
Male | 35
is 50 years old or older (at time of moonwalk) | 35
Films a tiktok (or similar short format film) on the lunar surface featuring at least some dancing (inside the lander counts) (video is available within 1 year of moonwalk) | 34
Chinese Citizen | 32
No tertiary qualification (honorary degrees don't count) | 27
Will not be a single person; two or more people will take the first step (almost) simultaneously | 25
Expressed doubt on whether the Apollo landings happened (digging up an embarrassing tweet from when they were 13 counts) | 23
bald | 16
has some kind of neural implant | 15
is 30 years old or younger (at time of moonwalk) | 12
Will be recorded attempting a backflip on the lunar surface (video is available within 1 year of moonwalk) | 10
Ginger | 9
Is transgender | 1
Option | Votes
YES | 1835
NO | 491
Option | Probability (%)
K. Somebody discovers a new AI paradigm that's powerful enough and matures fast enough to beat deep learning to the punch, and the new paradigm is much much more alignable than giant inscrutable matrices of floating-point numbers. | 20
I. The tech path to AGI superintelligence is naturally slow enough and gradual enough, that world-destroyingly-critical alignment problems never appear faster than previous discoveries generalize to allow safe further experimentation. | 12
C. Solving prosaic alignment on the first critical try is not as difficult, nor as dangerous, nor taking as much extra time, as Yudkowsky predicts; whatever effort is put forth by the leading coalition works inside of their lead time. | 10
B. Humanity puts forth a tremendous effort, and delays AI for long enough, and puts enough desperate work into alignment, that alignment gets solved first. | 8
Something wonderful happens that isn't well-described by any option listed. (The semantics of this option may change if other options are added.) | 8
M. "We'll make the AI do our AI alignment homework" just works as a plan. (Eg the helping AI doesn't need to be smart enough to be deadly; the alignment proposals that most impress human judges are honest and truthful and successful.) | 7
A. Humanity successfully coordinates worldwide to prevent the creation of powerful AGIs for long enough to develop human intelligence augmentation, uploading, or some other pathway into transcending humanity's window of fragility. | 6
E. Whatever strange motivations end up inside an unalignable AGI, or the internal slice through that AGI which codes its successor, they max out at a universe full of cheerful qualia-bearing life and an okay outcome for existing humans. | 5
J. Something 'just works' on the order of eg: train a predictive/imitative/generative AI on a human-generated dataset, and RLHF her to be unfailingly nice, generous to weaker entities, and determined to make the cosmos a lovely place. | 5
O. Early applications of AI/AGI drastically increase human civilization's sanity and coordination ability; enabling humanity to solve alignment, or slow down further descent into AGI, etc. (Not in principle mutex with all other answers.) | 5
D. Early powerful AGIs realize that they wouldn't be able to align their own future selves/successors if their intelligence got raised further, and work honestly with humans on solving the problem in a way acceptable to both factions. | 3
F. Somebody pulls off a hat trick involving blah blah acausal blah blah simulations blah blah, or other amazingly clever idea, which leads an AGI to put the reachable galaxies to good use despite that AGI not being otherwise alignable. | 3
L. Earth's present civilization crashes before powerful AGI, and the next civilization that rises is wiser and better at ops. (Exception to 'okay' as defined originally, will be said to count as 'okay' even if many current humans die.) | 3
G. It's impossible/improbable for something sufficiently smarter and more capable than modern humanity to be created, that it can just do whatever without needing humans to cooperate; nor does it successfully cheat/trick us. | 2
H. Many competing AGIs form an equilibrium whereby no faction is allowed to get too powerful, and humanity is part of this equilibrium and survives and gets a big chunk of cosmic pie. | 1
N. A crash project at augmenting human intelligence via neurotech, training mentats via neurofeedback, etc, produces people who can solve alignment before it's too late, despite Earth civ not slowing AI down much. | 1
You are fooled by at least one option on this list, which out of many tries, ends up sufficiently well-aimed at your personal ideals / prejudices / the parts you understand less well / your own personal indulgences in wishful thinking. | 1
If you write an argument that breaks down the 'okay outcomes' into lots of distinct categories, without breaking down internal conjuncts and so on, Reality is very impressed with how disjunctive this sounds and allocates more probability. | 1
Option | Votes
NO | 2192
YES | 456
Option | Probability (%)
Passion / love (or similar romantic) | 79
Self loathing / depression / similar | 65
Lonely | 50
Brave | 42
Guilty | 35
Obsession | 31
Schadenfreude / epicaricacy | 31
Rebellion | 31
Suspicion | 31
Awe | 31
Jealous | 23
Lust / horny / other sexual feeling | 20
Curiosity | 15
Intoxication | 6
sassy / rude | 5
Patriotism | 4
Suicidal | 3
Option | Votes
YES | 545
NO | 141
Option | Votes
YES | 300
NO | 33
Option | Probability (%)
Platform for migrants to start legal & profitable microbusinesses. | 75
Database of the Most Impactful Research Questions by Discipline and Cause Area | 50
Leveraging biophysical techniques to improve the efficiency of successful pregnancies in older women | 50
Research microeconomic disruptions caused by AGI | 50
The Odyssean Institute | 50
Mechanical Library | 50
Educate the public about high impact causes | 50
Science prediction markets for replicability | 50
Free Our Knowledge: Improving scientific publishing through collective action | 50
Researching low-input traditional agricultural techniques. | 50
Growing human blood vessels in the lab to unlock replacement tissue/organ therapies | 50
LLM Multi-Actor Tool to automate Economic Experiments | 50
Building software to demystify land use regulations | 50
Gather all formal demographic fertility and population projection methods into a single open source R package for academic, policy, and popular use | 50
Psychosis in velocardiofacial syndrome (DiGeorge/22q11.2 deletion syndrome) | 50
Conduct preliminary studies on the impact of MKP-2 on COVID-19 vaccine mediated depression | 50
MPhil/PhD in Psychology to understand how issue polarisation arises from divergences in belief revision via Bayesian networks | 50
Transform waste streams into biodegradable alternatives | 50
H-Test: identifying a set of "blindspot" tasks for LLMs that doesn't scale (not inverse, close to no effect) with language training | 50
Creating a swarm of robotic bees to either pollinate or spray pesticides | 50
Create accessible synthesis of neuroscience insider knowledge | 50
Conduct research on voice biomarkers, a promising early detection mechanism for mental illnesses, including Alzheimer's, Parkinson's, depression, and more. | 50
Build a digital marketplace and management platform to empower smallholder farmers in Kenya by reducing post-harvest food loss | 50
Upgrade YIMBY messaging | 50
Publish a book on Egan education for parents | 50
Weather futures market for forecasting, insurance, and climate engineering. | 50
Apply particle physics clustering to embedding space of LLM | 50
Fund a detour on my PhD to improve brute force statistical tooling | 50
Write three articles that integrate meta-analysis and meta-science | 50
Quantifying the costs of the Jones Act | 50
Run a public online Turing Test with a variety of models and prompts | 50
Run a self-help program on WhatsApp to reduce depression in low and middle-income countries | 50
Build an app to track ones impact on animal cruelty | 50
Year one of AI Safety Tokyo | 50
Strengthening digital health information system for epidemic control in Pakistan | 50
Online videos of Fluidity Forum 2024 talks | 50
Scaling Legal Impact for Chickens to make factory-farm cruelty a liability | 47
Distribute HPMOR copies in Bangalore, India | 40
An online science platform | 37
Convert a hybrid car to chip wood and generate electricity | 37
Start an online editorial journal focusing on paradigm development in psychiatry and psychology | 37
News through prediction markets | 37
A spaced-repetition first gamified learning platform | 37
Vegans should have the right to eat in public institutions: help us turn our right into a reality | 37
Art and Technology XR game providing Ecological and Social-Emotional Learning | 37
Replicate social science research | 37
Original Research Study on Effective Communication With Political Staffers | 37
Empower Ukraine: Invest in Businesses Driving Positive Change | 37