
Sakana AI has unveiled DiscoPOP, a state-of-the-art (SOTA) preference optimization algorithm discovered and written by a large language model (LLM). The discovery process, dubbed LLM Squared (LLM²), uses an LLM as a code-level mutation operator inside an evolutionary loop: the model proposes new preference optimization loss functions as code, the candidates are trained and evaluated, and the best performers seed the next round of proposals. DiscoPOP, the strongest loss found this way (a log-ratio modulated loss, LRML), outperforms DPO in the team's evaluations, showcasing the potential of AI systems that improve their own training algorithms.
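For context, the standard DPO objective and the general shape of a blended loss in the spirit of DiscoPOP can be sketched in a few lines. The DPO formula below is the published one; the blended variant (a sigmoid gate mixing the logistic and exponential losses on the log-ratio difference) is an illustrative assumption, and the constants `beta` and `tau` are placeholder values, not the paper's:

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    return math.exp(x) / (1.0 + math.exp(x))

def dpo_loss(logratio_chosen: float, logratio_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss: -log sigmoid(beta * (rho_w - rho_l)), where
    rho = log pi(y) - log pi_ref(y) for the chosen/rejected response."""
    rho = beta * (logratio_chosen - logratio_rejected)
    return -math.log(sigmoid(rho))

def blended_loss(logratio_chosen: float, logratio_rejected: float,
                 beta: float = 0.05, tau: float = 0.05) -> float:
    """Illustrative log-ratio-modulated blend: a sigmoid gate on the
    scaled log-ratio difference mixes the logistic (DPO) loss with an
    exponential loss. The exact form and constants are assumptions,
    not the published DiscoPOP definition."""
    rho = beta * (logratio_chosen - logratio_rejected)
    gate = sigmoid(rho / tau)
    logistic = -math.log(sigmoid(rho))
    exponential = math.exp(-rho)
    return gate * logistic + (1.0 - gate) * exponential
```

Both losses decrease as the policy assigns relatively more probability to the chosen response than the rejected one; the gate lets the blended loss interpolate between the two behaviors depending on the margin.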

Introducing DiscoPOP, the latest release from the team at @SakanaAILabs. This time, it’s a new SOTA preference optimisation algorithm that was discovered and written by an LLM 😮. The LLM-driven discovery process seems generalizable enough, but here it’s been used to create novel… https://t.co/nnCJm06h7A
🎉 Stoked to share our latest work @SakanaAILabs - DiscoPOP 🪩 We leverage LLMs as code-level mutation operators, which improve their own training algorithms. Thereby, we discover various performant preference optimization algorithms using LLM-driven meta-evolution (LLM²) 🔁… https://t.co/wf6cRqucjp
This looks like very exciting work out of Sakana AI (@hardmaru @YesThisIsLion) called LLM Squared, using LLMs to write code and come up with a better way to train LLMs (specifically create SOTA preference optimization algorithms that beat DPO) 👏 Self improving AI anyone? https://t.co/kpEZqw6LqZ
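The discovery loop described in these posts can be sketched as a simple (1+1) evolutionary strategy in which an LLM plays the mutation operator. Everything below is a hypothetical illustration: `llm_mutate` stands in for a call to an LLM that rewrites the loss code (mocked here with random parameter tweaks so the sketch runs), and `evaluate` stands in for a full preference-optimization training run plus held-out evaluation:

```python
import random

def evaluate(candidate: dict) -> float:
    """Stand-in fitness. In the real system this would train a model
    with the candidate loss and score it on held-out preferences;
    here it is a toy objective peaked at beta=0.05, tau=0.05."""
    beta, tau = candidate["beta"], candidate["tau"]
    return -((beta - 0.05) ** 2 + (tau - 0.05) ** 2)

def llm_mutate(candidate: dict, rng: random.Random) -> dict:
    """Mock of the LLM acting as a code-level mutation operator:
    a real run would prompt the model with the current loss code
    and its score, and parse a new loss function from the reply."""
    return {k: max(1e-4, v + rng.gauss(0.0, 0.02))
            for k, v in candidate.items()}

def discover(generations: int = 50, seed: int = 0) -> dict:
    """Greedy (1+1) evolutionary loop: mutate the incumbent, keep the
    child only if it scores better."""
    rng = random.Random(seed)
    best = {"beta": 0.1, "tau": 0.1}
    best_score = evaluate(best)
    for _ in range(generations):
        child = llm_mutate(best, rng)
        score = evaluate(child)
        if score > best_score:
            best, best_score = child, score
    return best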