"Language models can autonomously identify and prioritize domains rich in knowledge, optimizing their storage capacity." — that's a really interesting finding. https://t.co/2yzmjkJMhM
New study finds Large Language Models store 2 bits of knowledge per parameter, showing how size, training, architecture & data quality affect their capacity: https://t.co/uv1OrquLEk https://t.co/ncblHYO89C
The Physics of Language Models investigates knowledge capacity scaling laws: rather than evaluating a model's capability via loss or benchmarks, it estimates the number of knowledge bits a model stores. Quote from the paper: "Language models can and only can store 2 bits of knowledge… https://t.co/koFMZJPq4t
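To put the 2-bits-per-parameter figure in context, here is a quick back-of-the-envelope calculation. The 7B-parameter model size is an illustrative assumption, not a number from the paper or the tweet:

```python
# Back-of-the-envelope capacity implied by the ~2 bits/parameter estimate.
# The 7B-parameter model size below is an illustrative assumption.
params = 7e9                              # hypothetical 7B-parameter model
bits_per_param = 2                        # capacity estimate reported in the paper
capacity_bits = params * bits_per_param   # ~1.4e10 bits of stored knowledge
capacity_gb = capacity_bits / 8 / 1e9     # convert bits -> gigabytes
print(f"{capacity_bits:.2e} bits ≈ {capacity_gb:.2f} GB of stored knowledge")
# -> 1.40e+10 bits ≈ 1.75 GB of stored knowledge
```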
Google DeepMind has introduced a new method called "Mixture-of-Depths" (MoD) aimed at improving the computational efficiency of transformer-based language models. Instead of spending the same amount of compute on every token, MoD dynamically allocates compute: a per-layer router assigns importance weights to input tokens, only the highest-weighted tokens pass through that layer's attention and MLP computation, and the rest skip it via the residual connection. The development is part of a broader effort to make AI development more sustainable by optimizing the use of computational resources. Additionally, recent studies have shed light on the knowledge capacity of large language models (LLMs), finding that they can store roughly "2 bits of knowledge per parameter". This finding highlights how model size, training, architecture, and data quality affect how much information these models can store. Furthermore, language models appear to be able to autonomously identify and prioritize knowledge-rich domains, optimizing their storage capacity.
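Below is a minimal, hypothetical sketch of the MoD-style top-k token routing described above, not DeepMind's implementation. It assumes PyTorch, a generic `block` module standing in for the wrapped attention/MLP computation, and a made-up capacity ratio of 12.5%:

```python
import torch
import torch.nn as nn

class MoDLayer(nn.Module):
    """Wraps a transformer block so that only the top-k tokens per sequence
    receive full compute; the remaining tokens pass through unchanged via
    the residual connection (a sketch of the MoD idea, not the paper's code)."""

    def __init__(self, d_model: int, block: nn.Module, capacity: float = 0.125):
        super().__init__()
        self.router = nn.Linear(d_model, 1)  # scalar importance weight per token
        self.block = block                   # wrapped attention/MLP block
        self.capacity = capacity             # fraction of tokens processed per layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        batch, seq_len, _ = x.shape
        k = max(1, int(seq_len * self.capacity))

        weights = self.router(x).squeeze(-1)           # (batch, seq_len)
        top = torch.topk(weights, k, dim=-1).indices   # tokens that get full compute

        out = x.clone()  # unselected tokens skip the block entirely
        for b in range(batch):
            idx = top[b]
            processed = self.block(x[b, idx].unsqueeze(0)).squeeze(0)  # (k, d_model)
            # scale by the router weight so the routing decision stays differentiable
            out[b, idx] = x[b, idx] + weights[b, idx].unsqueeze(-1) * processed
        return out
```

A real implementation would replace the Python loop with batched gather/scatter operations and handle causal masking during autoregressive decoding; this sketch only illustrates the routing idea.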