🏷️:Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment 🔗:https://t.co/RTvBESZfeD https://t.co/R1dfJqZxqa
[CV] Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens L Fan, T Li, S Qin, Y Li... [Google DeepMind & MIT] (2024) https://t.co/Jgg0pQaeEM https://t.co/HpwArBPew7
🏷️:Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens 🔗:https://t.co/tGRMeoHyyN https://t.co/6zgHIjRso4
NVIDIA, MIT, and Tsinghua University have introduced SANA, a text-to-image generator that can produce images at resolutions up to 4096×4096 pixels in seconds. It is built on a Linear Diffusion Transformer (Linear DiT), which keeps training efficient and makes deployment on laptop GPUs feasible. Beyond SANA, researchers are pursuing other ways to improve text-to-image models: better alignment with long text prompts, optimizing models across compute budgets, and scaling autoregressive generative models with continuous tokens, the last exemplified by Fluid from Google DeepMind and MIT.
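The efficiency claim rests on linear attention. Softmax attention compares every image token with every other token, so its cost grows quadratically with the token count N. A kernelized linear attention instead summarizes all keys and values once and reuses that summary for every query, so its cost grows linearly in N. The following is a minimal pure-Python sketch, assuming a ReLU feature map φ(x) = max(x, 0); the exact kernel and layer layout in SANA may differ, and the function names `linear_attention` and `quadratic_reference` are hypothetical, not taken from SANA's code.

```python
from typing import List

Matrix = List[List[float]]


def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


def linear_attention(q: Matrix, k: Matrix, v: Matrix, eps: float = 1e-6) -> Matrix:
    """ReLU linear attention (illustrative sketch, ReLU kernel assumed).

    out_i = phi(q_i) . KV / (phi(q_i) . k_sum), where KV = sum_j phi(k_j)^T v_j
    and k_sum = sum_j phi(k_j) are computed once: O(N * d * d_v) work overall.
    """
    d, dv = len(q[0]), len(v[0])
    kv = [[0.0] * dv for _ in range(d)]  # d x d_v summary, size independent of N
    k_sum = [0.0] * d
    for kj, vj in zip(k, v):
        pk = [relu(x) for x in kj]
        for a in range(d):
            k_sum[a] += pk[a]
            for b in range(dv):
                kv[a][b] += pk[a] * vj[b]
    out = []
    for qi in q:
        pq = [relu(x) for x in qi]
        denom = sum(pq[a] * k_sum[a] for a in range(d)) + eps  # eps avoids 0/0
        out.append([sum(pq[a] * kv[a][b] for a in range(d)) / denom for b in range(dv)])
    return out


def quadratic_reference(q: Matrix, k: Matrix, v: Matrix, eps: float = 1e-6) -> Matrix:
    """Same ReLU kernel computed pairwise: N x N similarity scores, O(N^2 * d) work."""
    phi_k = [[relu(x) for x in kj] for kj in k]
    dv = len(v[0])
    out = []
    for qi in q:
        pq = [relu(x) for x in qi]
        weights = [sum(a * b for a, b in zip(pq, pk)) for pk in phi_k]
        denom = sum(weights) + eps
        out.append([sum(w * vj[b] for w, vj in zip(weights, v)) / denom for b in range(dv)])
    return out


if __name__ == "__main__":
    q = [[1.0, 0.5], [0.2, 0.8], [0.0, 1.0]]
    k = [[0.3, 1.0], [1.0, 0.1], [0.5, 0.5]]
    v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    print(linear_attention(q, k, v))
    print(quadratic_reference(q, k, v))
```

Both functions return the same outputs up to floating-point error, because regrouping the sums does not change the result. The difference is cost: `linear_attention` builds the d × d_v summary once, so doubling the number of tokens roughly doubles the work instead of quadrupling it, which is what makes very high resolutions tractable.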