Sources
Maxime Labonne: Potentially the biggest paradigm shift in LLMs. Two independent studies managed to pre-train 1.58-bit LLMs that match the performance of FP16 models. Need to see how it scales (~30B), but super curious about 1.58-bit Mamba and MoE models. https://t.co/56EepNqIgP https://t.co/xybyVHBgTi https://t.co/QpCrlu4oJu
Siqi Chen: So last month Microsoft published a paper showing a 1-bit parameter LLM with minimal performance loss. Someone on Hugging Face just replicated the results today. This is at least a 10x reduction in memory footprint and opens up a path to even more gains in training/inference speed. https://t.co/ApHeGZDrFA
Handy AI: 🚀 #ElonMusk's AI leaps forward with Grok-1.5, boasting superior math skills. 📈 #Databricks debuts its model, setting a new benchmark. 🌐 #AI21Labs introduces Jamba, merging Mamba with Transformer architecture. Read more: https://t.co/rOSYi59xTY https://t.co/Zm2upLfVcr
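The "1.58-bit" figure in the first two posts is log2(3), the information content of a weight restricted to the three values {-1, 0, +1}. The Python sketch below works through the memory arithmetic behind the "at least 10x" claim. It assumes a 16-bit (FP16) baseline and ignores activations, per-tensor scaling factors, and any layers kept in higher precision. The base-3 packing scheme (five ternary weights per byte) and the helper names `pack5`/`unpack5` are illustrative assumptions, not necessarily how the cited implementations store weights.

```python
import math

# Information content of one ternary weight drawn from {-1, 0, +1}
bits_per_ternary = math.log2(3)          # ~1.585, the "1.58-bit" figure

# Ideal memory ratio versus 16-bit (FP16) weights, weights only
ideal_ratio = 16 / bits_per_ternary      # ~10.1x

# Hypothetical practical packing: 3**5 = 243 <= 256,
# so five ternary weights fit into a single byte (1.6 bits/weight)
def pack5(trits):
    """Pack five values from {-1, 0, 1} into one integer in 0..242."""
    assert len(trits) == 5
    code = 0
    for t in reversed(trits):
        code = code * 3 + (t + 1)  # map {-1, 0, 1} -> {0, 1, 2}, base-3 digit
    return code

def unpack5(code):
    """Inverse of pack5: recover the five ternary values, first one first."""
    trits = []
    for _ in range(5):
        trits.append(code % 3 - 1)  # least significant base-3 digit first
        code //= 3
    return trits

packed_ratio = 16 * 5 / 8  # 16 bits vs 8/5 bits per weight -> 10x

print(f"{bits_per_ternary:.3f} bits/weight, ideal {ideal_ratio:.1f}x, packed {packed_ratio:.1f}x")
```

With this packing, the 10x figure holds for weight storage alone. Real memory savings also depend on what the model keeps in higher precision.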





