ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools
GLM-4:
- closely rivals GPT-4 on MMLU, MATH, GPQA, etc.
- gets close to GPT-4 in instruction following and long-context tasks
hf: https://t.co/Lo8zu5K26w repo: https://t.co/dJQ4YAtsIy abs:… https://t.co/RBJ1l31So9
I am super-excited to share our DCLM project, which we have been working on for one year: we release a 7B LLM achieving 64 MMLU, trained on only 2T tokens. This is better than the Llama2 models, while Llama3 8B was trained on 6x more tokens (i.e., 6 times the compute bill). The… https://t.co/yLmWIZhjeh
[LG] New Solutions on LLM Acceleration, Optimization, and Application https://t.co/RrWAwmXJRi
- New methods proposed for LLM acceleration and optimization at the algorithm level include Medusa for parallel decoding with multiple heads and tree-based attention, and SnapKV… https://t.co/RK9bVxAxjG
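Since the tweet only names the techniques, here is a minimal Python sketch of the idea behind SnapKV-style key-value cache compression, under my own assumptions about shapes and with a hypothetical `compress_kv` helper; it illustrates the general approach (score earlier positions by how much attention the most recent queries pay to them, then keep only the highest-scoring entries plus the recent window), not the released implementation.

```python
import numpy as np

def compress_kv(keys, values, attn_weights, window=32, keep=256):
    """keys, values: (seq_len, head_dim) for one attention head.
    attn_weights: (seq_len, seq_len), row i = attention of query i over 0..i."""
    seq_len = keys.shape[0]
    prefix_len = seq_len - window
    if prefix_len <= keep:
        return keys, values                        # nothing worth pruning

    # Importance of each prefix position = total attention it receives from
    # the most recent `window` queries (the "observation window").
    scores = attn_weights[-window:, :prefix_len].sum(axis=0)
    top = np.sort(np.argsort(scores)[-keep:])      # top-k, in positional order

    idx = np.concatenate([top, np.arange(prefix_len, seq_len)])
    return keys[idx], values[idx]

# Toy usage with random tensors standing in for one head of a real cache.
rng = np.random.default_rng(0)
seq_len, head_dim = 1024, 64
k = rng.normal(size=(seq_len, head_dim))
v = rng.normal(size=(seq_len, head_dim))
attn = rng.random((seq_len, seq_len))
attn /= attn.sum(axis=1, keepdims=True)            # rows sum to 1
ck, cv = compress_kv(k, v, attn)
print(ck.shape)                                    # (288, 64): 256 kept + 32-token window
```

Subsequent decoding steps would then attend over the compressed cache, so memory and attention cost scale with the number of kept entries rather than the full prompt length.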

Recent advances in large language models (LLMs) have brought significant improvements in both capability and efficiency. Models such as GPT-4o and LLaMA-7B are trained on vast datasets and require immense computational resources, and new approaches are emerging to address these costs. Consistency Large Language Models (CLLMs) use Jacobi decoding to achieve a 3.4x speedup on the Spider dataset with moderate fine-tuning cost, while vLLM's PagedAttention algorithm substantially improves KV-cache memory management and serving throughput. Eliminating resource-intensive matrix multiplication (MatMul) from the architecture can cut memory requirements by up to 90%, potentially bringing capable models to consumer devices. Other algorithm-level optimizations include Medusa, which adds multiple decoding heads and tree-based attention for parallel decoding, and SnapKV, which compresses the key-value cache. On the training side, the DCLM project has released a 7B LLM that reaches 64 MMLU on only 2T training tokens, outperforming the Llama2 models, and GLM-4 closely rivals GPT-4 on benchmarks such as MMLU, MATH, and GPQA.
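To make the Jacobi-decoding idea behind CLLMs concrete, below is a small, self-contained Python sketch under my own assumptions (a toy deterministic next-token rule stands in for a real model's greedy argmax); it is not the CLLM implementation. It shows that repeatedly re-predicting a whole block of guessed tokens in parallel converges to exactly the tokens autoregressive decoding would produce.

```python
def next_token(prefix):
    # Toy stand-in for the argmax over a model's logits at the last position.
    # It depends on the whole prefix, i.e. the fully sequential worst case.
    return (sum(prefix) * 31 + 7) % 1000

def jacobi_decode(prompt, block_len, max_iters=100):
    block = [0] * block_len                  # arbitrary initial guess
    for it in range(1, max_iters + 1):
        # One "parallel forward pass": every position is re-predicted from the
        # prompt plus the current guesses for the tokens before it.
        new_block = [next_token(prompt + block[:i]) for i in range(block_len)]
        if new_block == block:               # fixed point reached
            return block, it
        block = new_block
    return block, max_iters

prompt = [5, 17, 3]
block, iters = jacobi_decode(prompt, block_len=8)

# Ordinary autoregressive decoding of the same 8 tokens for comparison.
seq = list(prompt)
for _ in range(8):
    seq.append(next_token(seq))

assert block == seq[len(prompt):]   # the fixed point equals the greedy output
print(f"block of 8 tokens reached its fixed point after {iters} parallel passes")
```

With the fully sequential toy rule the block needs roughly as many parallel passes as it has tokens, which is the worst case; the speedup reported for CLLMs comes from consistency fine-tuning, which pushes the model toward predicting several tokens correctly per pass so blocks converge in far fewer iterations than their length.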