model weights and dataset released for commoncanvas! 🤠🤠 diffusion models trained on fully creative commons images weeeeee https://t.co/4cNdPecMii
Quite excited that CommonCanvas is JUST out! 🖼️ • First open source text-to-image models trained fully on openly licensed images (SD2 and SDXL architectures) • The dataset, with ~70M openly licensed creative commons images + synthetic captions, is also out (largest dataset on… https://t.co/zhX7vrEO0h
CommonCanvas is here! A new dataset for training text and image models, and a family of models to go with it. All Creative Commons licensed. Amazing work by @SkyLi0n!!! https://t.co/231DfKmkf4




A new platform has launched for curating, enriching, and downloading non-infringing data for model training, starting with 200TB of high-quality public domain image data. The dataset is EU compliant and is intended for training state-of-the-art (SoTA) base models. In related news, Spawning has curated a 200TB dataset of public domain and Creative Commons Zero (CC0) images, larger than the LAION dataset, for training ethically sourced AI models. The dataset is artist-friendly and IP-aware. Separately, CommonCanvas has been released: a family of open text-to-image diffusion models, based on the SD2 and SDXL architectures and trained entirely on Creative Commons-licensed images, together with the dataset used to train them. The dataset contains approximately 70 million openly licensed Creative Commons images with synthetic captions, which makes it the largest of its kind. The work is also presented in a CVPR 2024 paper, with significant contributions from SkyLi0n.