Incredibly excited to announce Hawk and Griffin (https://t.co/JuGyQaOJa0), two recurrent language models with 1) a finite-sized state + fast inference, 2) efficient training on device, 3) excellent performance: https://t.co/xw0vB4QyZC
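To make the first point concrete, here is a minimal numpy sketch of why a finite-sized recurrent state gives fast inference: each new token updates a fixed-size state, so per-token compute and memory stay constant instead of growing with context length like a transformer's KV cache. The gate values, dimensions, and function names below are illustrative assumptions, not the paper's actual RG-LRU parameterization.

```python
import numpy as np

def recurrence_step(h, x, a, b):
    # One decode step of a diagonal linear recurrence: h_t = a * h_{t-1} + b * x_t.
    # The state h has a fixed size, independent of how many tokens came before.
    return a * h + b * x

d = 8                    # hypothetical state width
h = np.zeros(d)          # recurrent state: fixed-size, unlike a growing KV cache
a = np.full(d, 0.9)      # decay gate (illustrative constant; learned in practice)
b = 1.0 - a              # input scale

for t in range(1000):    # stream 1000 tokens; memory stays O(d) throughout
    x_t = np.random.randn(d)
    h = recurrence_step(h, x_t, a, b)

print(h.shape)           # (8,) -- the state never grows with sequence length
```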
I am incredibly proud to finally put this paper out! It shows that hybrid models combining linear RNNs with local (sliding-window) attention, such as Griffin, can be remarkably efficient at language modeling. https://t.co/Lokwv6OJtM
We present Griffin: a hybrid model mixing a gated linear recurrence with local attention. This combination is extremely effective: it preserves the efficiency benefits of linear RNNs while retaining the expressiveness of transformers. Scaled up to 14B parameters! https://t.co/uptKapilDM https://t.co/4ZQDLhE6Fu
Google DeepMind has introduced two new architectures, Hawk and Griffin, aimed at improving the efficiency of language models. Both are built around a novel gated linear recurrent layer designed as a replacement for the conventional multi-query attention mechanism. Griffin, in particular, takes a hybrid approach that combines the gated linear recurrence with local attention, aiming to keep the efficiency of linear recurrent neural networks (RNNs) while incorporating the expressiveness of transformers. This matters because it addresses the long-standing difficulties of training RNNs efficiently and scaling them to long sequences. Griffin has been scaled up to 14 billion parameters, a substantial step towards more efficient and effective language modeling.
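As a rough illustration of the hybrid layout described above, the sketch below interleaves gated-linear-recurrence blocks with causal sliding-window attention blocks. It is a toy approximation under assumed details: the real Griffin blocks also include RG-LRU gating, temporal convolutions, MQA-style attention, MLPs, and normalization, and the layer pattern, window size, and gate values here are chosen purely for illustration.

```python
import numpy as np

def gated_linear_recurrence(x, a, b):
    # Scan h_t = a * h_{t-1} + b * x_t over the time axis with a fixed-size state.
    T, d = x.shape
    h = np.zeros(d)
    out = np.empty_like(x)
    for t in range(T):
        h = a * h + b * x[t]
        out[t] = h
    return out

def local_attention(x, window):
    # Causal sliding-window attention: each position attends only to the last
    # `window` positions, so cost is O(T * window) rather than O(T^2).
    T, d = x.shape
    out = np.empty_like(x)
    for t in range(T):
        ctx = x[max(0, t - window + 1): t + 1]        # local causal context
        scores = ctx @ x[t] / np.sqrt(d)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[t] = weights @ ctx
    return out

def hybrid_stack(x, n_layers=6, window=64):
    # Griffin-like pattern (illustrative only): mostly recurrent blocks, with a
    # local-attention block interleaved every few layers, plus residual connections.
    a = np.full(x.shape[-1], 0.9)
    for i in range(n_layers):
        if i % 3 == 2:
            x = x + local_attention(x, window)            # local attention block
        else:
            x = x + gated_linear_recurrence(x, a, 1 - a)  # recurrent block
    return x

tokens = np.random.randn(128, 16)    # (sequence length, model width) toy input
print(hybrid_stack(tokens).shape)    # (128, 16)
```

Because the recurrent blocks carry a fixed-size state and the attention blocks only ever look back over a bounded window, decode-time memory for the whole stack stays bounded, which is the long-sequence efficiency property the announcements above emphasize.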