CogVideoX-5B, a new text-to-video model from the Tsinghua group behind the ChatGLM/GLM LLM series, has been released. The model generates high-quality video and, with diffusers' memory optimizations, can run in under 10GB of VRAM, making it accessible to more users. It is integrated with diffusers and competitive with closed models such as Runway, Luma, and Pika. The smaller CogVideoX-2B model has also been relicensed under Apache 2.0. Releasing open weights for the 5B model marks a significant advancement in the field, offering a cost-effective option for video generation.
📢🔥Hot New Release: CogVideoX-5B, a new text-to-video model from the @thukeg group (the group behind the GLM LLM series) - More examples from the 5B model in this thread👇 - GPU VRAM requirement on Diffusers: 20.7GB for BF16 and 11.4GB for INT8 - Inference for 50 steps on BF16: 90s on… https://t.co/GAyWmst5GW
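The quoted VRAM figures are roughly what you'd expect from parameter counts alone. A back-of-envelope sketch (the 5B parameter count is taken from the model name; the breakdown of the remainder into text encoder, VAE, and activations is an assumption):

```python
# Back-of-envelope check of the quoted VRAM numbers: weight storage alone
# for a 5B-parameter transformer at different precisions, in GiB.

def weight_gib(n_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in GiB."""
    return n_params * bytes_per_param / 2**30

bf16_gib = weight_gib(5e9, 2)  # BF16: 2 bytes per parameter -> ~9.3 GiB
int8_gib = weight_gib(5e9, 1)  # INT8: 1 byte per parameter  -> ~4.7 GiB

# The quoted 20.7GB (BF16) / 11.4GB (INT8) totals are plausibly the
# transformer weights plus the text encoder, VAE, and activation memory.
```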
CogVideoX just released the weights for its 5B model! 🎥 ✨ It's the best open weights text-to-video model - competitive with Runway / Luma / Pika. With 🧨@diffuserslib, it fits on < 10GB VRAM 🤏 (ah, and they changed the smaller 2B model license to Apache 2.0 🔥) https://t.co/5fxAk6BuLv
The best open weights video generation model is here - CogVideoX 5B 🔥 It comes with 🧨 Diffusers integration. Proud to share my major dish cooked at @huggingface in collaboration w/ the @ChatGLM folks! Model details, mem-efficient inference (33GB -> 8GB), and more are in 🧵 https://t.co/SgKJAK1LGi
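The memory-efficient inference path mentioned above (33GB -> single-digit GB) maps onto a few diffusers calls. A minimal sketch, assuming the `THUDM/CogVideoX-5b` checkpoint id and a diffusers version that ships `CogVideoXPipeline`; the generation parameters below are illustrative, not the authors' settings:

```python
# Sketch: running CogVideoX-5B under a tight VRAM budget with diffusers.
# Assumes diffusers provides CogVideoXPipeline (added alongside this release)
# and that the checkpoint lives at THUDM/CogVideoX-5b on the Hugging Face Hub.

GENERATION_CONFIG = {
    "model_id": "THUDM/CogVideoX-5b",
    "num_inference_steps": 50,  # the thread quotes ~90s for 50 BF16 steps
    "guidance_scale": 6.0,      # illustrative value
    "num_frames": 49,           # illustrative value
}

def generate(prompt: str, out_path: str = "output.mp4") -> None:
    """Load the pipeline with CPU offload and VAE tiling to reduce peak VRAM."""
    import torch
    from diffusers import CogVideoXPipeline
    from diffusers.utils import export_to_video

    pipe = CogVideoXPipeline.from_pretrained(
        GENERATION_CONFIG["model_id"], torch_dtype=torch.bfloat16
    )
    # Keep submodules on CPU except while they run, and decode the VAE in
    # tiles: together these are the kind of optimizations that bring peak
    # usage down from the full-precision ~33GB figure.
    pipe.enable_model_cpu_offload()
    pipe.vae.enable_tiling()

    frames = pipe(
        prompt=prompt,
        num_inference_steps=GENERATION_CONFIG["num_inference_steps"],
        guidance_scale=GENERATION_CONFIG["guidance_scale"],
        num_frames=GENERATION_CONFIG["num_frames"],
    ).frames[0]
    export_to_video(frames, out_path, fps=8)
```

Calling `generate("a panda playing guitar in a bamboo forest")` would download the checkpoint and write `output.mp4`; the heavy imports sit inside the function so the recipe can be inspected without a GPU.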