
Google DeepMind has introduced AtP*, an improved variant of Attribution Patching (AtP), promising more efficient and more reliable identification of the key nodes that drive a given behavior in Large Language Models (LLMs). In parallel, a collaboration with researchers from Hazy Research has produced Based, an architecture that combines short (size-64) sliding window attention with softmax-approximating linear attention. Based is reported to achieve 24x higher throughput than comparable Transformer models.
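The core idea behind attribution patching is to replace one activation-patching run per node with a single first-order estimate: the effect of patching a clean activation into a corrupted run is approximated by (a_clean - a_corrupt) · ∂metric/∂a. The sketch below illustrates that estimate on a toy PyTorch model; the model, the "nodes" (layer outputs), and the metric are all illustrative assumptions, not DeepMind's AtP* implementation (which adds further corrections on top of this baseline).

```python
# Minimal sketch of the attribution patching estimate, assuming a toy MLP
# stands in for a transformer and each layer output is one "node".
# Estimated effect of patching node a:  (a_clean - a_corrupt) · d(metric)/d(a_corrupt)
# so every node is scored from one clean forward pass plus one corrupted
# forward+backward pass, instead of one patched run per node.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(
    nn.Linear(16, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 1),
)

def run_with_cache(x, keep_grads=False):
    """Forward pass that records every intermediate activation ("node")."""
    cache = []
    h = x
    for layer in model:
        h = layer(h)
        if keep_grads:
            h.retain_grad()
        cache.append(h)
    return h, cache

x_clean = torch.randn(4, 16)    # stand-in for the clean prompt batch
x_corrupt = torch.randn(4, 16)  # stand-in for the corrupted prompt batch

# One clean forward pass (activations only).
with torch.no_grad():
    _, clean_cache = run_with_cache(x_clean)

# One corrupted forward + backward pass to get the metric's gradient at every node.
out, corrupt_cache = run_with_cache(x_corrupt, keep_grads=True)
metric = out.mean()  # hypothetical metric; in practice e.g. a logit difference
metric.backward()

# Linear (first-order) estimate of each node's patching effect.
for i, (a_clean, a_corrupt) in enumerate(zip(clean_cache, corrupt_cache)):
    effect = ((a_clean - a_corrupt) * a_corrupt.grad).sum().item()
    print(f"node {i}: estimated patching effect {effect:+.4f}")
```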
BASED: Simple linear attention language models balance the recall-throughput tradeoff https://t.co/hu2QqFdo5F https://t.co/Xyki3OYAgB
Stoked to be sharing Based! We find that the simple combo of linear and sliding window attention can enable 24x higher throughput than Transformers. Had a ton of fun diving deep on the tradeoffs that govern these recurrent models! https://t.co/WPNajkwZ7M https://t.co/kNRG3s3As3 https://t.co/5Bi0kPhwKA https://t.co/w0qwplt6Qc
Excited to share new research we collaborated with @HazyResearch on — Based, a new architecture that leverages attention-like primitives – short (size-64) sliding window attention and softmax-approximating linear attention. https://t.co/Z1o6cbJAPT https://t.co/0KE8uuziD4
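To make the two primitives named above concrete, here is a minimal single-head sketch of (i) causal linear attention with a 2nd-order Taylor feature map that approximates softmax's exp(q·k), and (ii) exact softmax attention restricted to a size-64 sliding window. The shapes, the feature-map normalization, and the way the two outputs are simply summed are illustrative assumptions for a readable sketch, not the released Based implementation.

```python
# Sketch of Based-style attention primitives (assumptions noted above).
import math
import torch
import torch.nn.functional as F

def taylor_feature_map(x):
    # phi(x) = [1, x, flatten(x x^T)/sqrt(2)], so phi(q)·phi(k) = 1 + q·k + (q·k)^2/2,
    # the 2nd-order Taylor approximation of exp(q·k).
    ones = torch.ones(*x.shape[:-1], 1, dtype=x.dtype, device=x.device)
    second = torch.einsum('...i,...j->...ij', x, x).flatten(-2) / math.sqrt(2)
    return torch.cat([ones, x, second], dim=-1)

def causal_linear_attention(q, k, v):
    # Linear in sequence length: running sums over phi(k) and phi(k) v^T replace the T x T score matrix.
    phi_q, phi_k = taylor_feature_map(q), taylor_feature_map(k)
    kv = torch.cumsum(torch.einsum('tf,td->tfd', phi_k, v), dim=0)  # running sum of phi(k) v^T
    z = torch.cumsum(phi_k, dim=0)                                  # running normalizer
    num = torch.einsum('tf,tfd->td', phi_q, kv)
    den = torch.einsum('tf,tf->t', phi_q, z).clamp(min=1e-6)
    return num / den.unsqueeze(-1)

def sliding_window_attention(q, k, v, window=64):
    # Exact causal softmax attention, but each query only attends to the last `window` keys.
    T, d = q.shape
    scores = q @ k.T / math.sqrt(d)
    idx = torch.arange(T)
    mask = (idx[:, None] >= idx[None, :]) & (idx[:, None] - idx[None, :] < window)
    scores = scores.masked_fill(~mask, float('-inf'))
    return F.softmax(scores, dim=-1) @ v

T, d = 128, 16
q, k, v = (torch.randn(T, d) * d ** -0.5 for _ in range(3))
out = causal_linear_attention(q, k, v) + sliding_window_attention(q, k, v, window=64)
print(out.shape)  # torch.Size([128, 16])
```

The throughput claim rests on the recurrent form of the linear-attention branch: at inference it carries a fixed-size state (the running kv and z sums) instead of a growing KV cache, while the size-64 window supplies exact local attention for precise recall.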