
Google DeepMind, in collaboration with UC Berkeley, has introduced a new approach to optimizing large language models (LLMs) that focuses on test-time compute rather than scaling model parameters. The method, detailed in a recent paper, shows that by optimally allocating computational resources at test time, smaller models can outperform models 14 times larger in a FLOPs-matched evaluation. The study highlights how LLMs can significantly improve their outputs by using more test-time computation, drawing a parallel to how humans make better decisions when given more time to think. This concept of 'compute-optimal' scaling is expected to gain more attention, as it presents a critical step toward building generally self-improving AI systems.
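One simple way to spend extra test-time compute, which the paper studies alongside sequential self-revision and verifier-guided search, is best-of-N sampling: draw several candidate answers and keep the one a scorer rates highest. The sketch below is a minimal illustration of that idea, not the paper's implementation; the `generate` and `score` functions are hypothetical placeholders for a model's sampling call and a verifier or reward model.

```python
import random
from typing import Callable, List


def best_of_n(
    prompt: str,
    generate: Callable[[str], str],      # hypothetical: samples one candidate answer
    score: Callable[[str, str], float],  # hypothetical: verifier/reward score
    n: int,
) -> str:
    """Spend test-time compute by sampling n candidates and keeping the best.

    Larger n means more test-time FLOPs; the paper's 'compute-optimal'
    strategy adapts how such a budget is spent per prompt.
    """
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda ans: score(prompt, ans))


# Toy usage with stubs standing in for a real model and verifier.
if __name__ == "__main__":
    answers = ["42", "41", "forty-two"]
    gen = lambda p: random.choice(answers)
    scr = lambda p, a: 1.0 if a == "42" else 0.0
    print(best_of_n("What is 6 * 7?", gen, scr, n=8))
```

In the paper's framing, the key question is not whether such strategies help but how to allocate a fixed compute budget among them adaptively, depending on the difficulty of each prompt.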
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters https://t.co/O30UDpWEiP https://t.co/uqYV6j3PYU
After the LLaMa 3.1 release and ICML, I want to highlight our paper "Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms". TL;DR we explore the dynamics of over-optimization in DPO/IPO/SLiC and find similar "reward hacking" issues as online RLHF.👇 https://t.co/YXTyMG7uRJ
Test-time improvement of LLMs is an area that I expect will receive a lot more attention 🍓. In this paper (led by the wonderful @sea_snell), we study what it means to efficiently scale test-time compute and discuss implications for LLM scaling. Excited to see this finally out! 🚀 https://t.co/bKuc4gDgS3

