
AI pioneer Sepp Hochreiter has introduced a new architecture called xLSTM (Extended Long Short-Term Memory), which aims to address the limitations of traditional LSTMs and compete with state-of-the-art language models such as Transformers. xLSTM combines exponential gating with modified memory structures and introduces two new memory cells, sLSTM and mLSTM, to improve performance and scalability. The work is part of a broader effort to advance European language model capabilities under the NXAI initiative. The new architecture, which includes a parallelizable LSTM variant (mLSTM), has generated significant interest in the AI community, with discussion of its potential to outperform existing models. However, no code or weights have been shared yet.
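To give a feel for what exponential gating means in practice, here is a minimal sketch of a single sLSTM-style recurrent step: the input and forget gates are exponentiated rather than squashed by a sigmoid, a normalizer state rescales the cell output, and a running-max stabilizer keeps the exponentials numerically safe. The shapes, variable names, and exact stabilization details below are assumptions for illustration, since no reference code has been released.

```python
# Sketch of an sLSTM-style step with exponential gating (illustrative, not the
# authors' implementation).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def slstm_step(x, h_prev, c_prev, n_prev, m_prev, W, R, b):
    """One recurrent step with exponential input/forget gates.

    x: input (d_in,); h_prev, c_prev, n_prev, m_prev: previous hidden, cell,
    normalizer, and stabilizer states (d_hidden,); W, R, b: stacked input,
    recurrent, and bias parameters for the cell update z and gates i, f, o.
    """
    pre = W @ x + R @ h_prev + b
    z_pre, i_pre, f_pre, o_pre = np.split(pre, 4)

    z = np.tanh(z_pre)      # cell input
    o = sigmoid(o_pre)      # output gate stays sigmoidal
    log_i = i_pre           # exponential input gate: i = exp(i_pre)
    log_f = f_pre           # exponential forget gate: f = exp(f_pre)

    # Stabilizer: track the running max of the log-gates and subtract it
    # before exponentiating, so exp() cannot overflow.
    m = np.maximum(log_f + m_prev, log_i)
    i_gate = np.exp(log_i - m)
    f_gate = np.exp(log_f + m_prev - m)

    c = f_gate * c_prev + i_gate * z    # cell state
    n = f_gate * n_prev + i_gate        # normalizer state
    h = o * (c / np.maximum(n, 1e-6))   # normalized hidden state
    return h, c, n, m

# Toy usage: run a short sequence through the step function.
d_in, d_hidden, T = 8, 16, 5
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4 * d_hidden, d_in))
R = rng.normal(scale=0.1, size=(4 * d_hidden, d_hidden))
b = np.zeros(4 * d_hidden)
h = c = n = m = np.zeros(d_hidden)
for t in range(T):
    h, c, n, m = slstm_step(rng.normal(size=d_in), h, c, n, m, W, R, b)
print(h.shape)  # (16,)
```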
Currently xLSTM is 4x slower than FlashAttention and Mamba, but if this is fixed with better CUDA kernels, we would have a model linear in seq_len that is as strong and fast as Transformers!!! https://t.co/yldCqppRX9
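To make the "linear in seq_len" point concrete, here is a toy comparison under my own assumptions (it is not the actual mLSTM kernel, though it is in the same spirit of an outer-product matrix memory with a decay): a fixed-size recurrent state updated once per token costs O(T) in sequence length, while full self-attention materializes a T x T score matrix and scales quadratically.

```python
# Toy illustration of linear vs. quadratic scaling in sequence length.
import numpy as np

T, d = 1024, 64
x = np.random.default_rng(1).normal(size=(T, d))

# Recurrent pass: one fixed-size state update per token,
# O(T * d^2) time and O(d^2) memory, independent of T.
decay = 0.99
state = np.zeros((d, d))
for t in range(T):
    state = decay * state + np.outer(x[t], x[t])  # O(d^2) per token

# Full self-attention scores: a T x T matrix,
# O(T^2 * d) time and O(T^2) memory, quadratic in sequence length.
scores = x @ x.T
print(state.shape, scores.shape)  # (64, 64) (1024, 1024)
```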
"I'll be back" LSTM xLSTM: Extended Long Short-Term Memory https://t.co/WXRbkYbWSO
Thanks @srush_nlp for this compelling collection of recent RNN-based Language Models! I think now you have to update this list with the #xLSTM 😉 I agree, naming conventions are always hard... In our paper we try to stick to the original LSTM formulation from the 1990s: https://t.co/Xe6R32pNsO https://t.co/prFJA7kPvp
