Sep 18, 07:37 AM

Playground v3 with 24B Parameters Achieves SOTA in Image Diffusion Models

Playground v3 is a new text-to-image model aimed at improving text-to-image alignment. It uses a deep-fusion design that couples a DiT image transformer with a pretrained Llama-3-8B text transformer through image-text joint attention, and it is designed to be simpler than MM-DiT. The model also uses a 16-channel VAE at 512x512 resolution and is reported to be a better captioner than GPT-4. With 24B parameters in total, it is reported to reach state-of-the-art (SOTA) results among image diffusion models. Development was led by Suhail.
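
To make "image-text joint attention" concrete, here is a minimal sketch in plain Python. It is not Playground's code: the function name, the toy vectors, and the choice to let image queries attend over image and text tokens in a single softmax are assumptions made for illustration only, and a real model would use batched multi-head tensor operations.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def joint_attention(img_q, img_k, img_v, txt_k, txt_v):
    """Hypothetical single-head joint attention.

    Each image query attends over the concatenation of image and text
    keys/values, so one softmax mixes both modalities.
    All arguments are lists of equal-length float vectors.
    """
    keys = img_k + txt_k        # list concatenation: image tokens, then text tokens
    values = img_v + txt_v
    scale = 1.0 / math.sqrt(len(img_q[0]))
    out = []
    for q in img_q:
        # Scaled dot-product score of this query against every key.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output dimension at a time.
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out


# Toy example: one image token and one text token with identical keys,
# so the image query splits its attention evenly between them.
result = joint_attention(
    img_q=[[1.0, 0.0]],
    img_k=[[1.0, 0.0]],
    img_v=[[1.0, 0.0]],
    txt_k=[[1.0, 0.0]],
    txt_v=[[0.0, 1.0]],
)
```

In this toy case the output is an even blend of the image value and the text value, which is the basic idea behind fusing the two streams in one attention operation rather than adding a separate cross-attention layer.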

#Playground #Suhail

Written with ChatGPT (GPT-4o).

Sources

Garry Tan (@garrytan), 2 years ago
Here's the real story on how Playground v3 got to SOTA on image diffusion models Amazing work @Suhail https://t.co/Q5ImkUm3al https://t.co/Bpsq5Q3A2e

Garry Tan (@garrytan), 2 years ago
Playground v1 vs Playground v3 Image models get better so quickly https://t.co/uu86YsZjla

Computer Graphics Papers (@Animation), 2 years ago
Playground v3: Improving Text-to-Image Alignment with Deep-Fusion Large Language Models. https://t.co/sjdFoYrq2y
