Option                                   Probability (%)
Pretraining data composition             88
Doesn't use any scale.ai training data   63
Offline policy learning RLHF             49
Task vectors like Golden Gate Claude     37
Option   Votes
NO       1003
YES      979