May 16, 07:54 PM

New Platform with 200TB Public Domain Image Dataset Launched, Alongside CommonCanvas Models Trained on 70M Creative Commons Images

A new platform has launched for curating, enriching, and downloading non-infringing data for model training, starting with 200TB of high-quality public domain image data. The dataset is described as EU compliant and is intended for training state-of-the-art (SoTA) base models. Relatedly, Spawning has curated a 200TB dataset of public domain and Creative Commons Zero (CC0) images for training AI models on ethically sourced data; it is described as larger than the LAION dataset, artist-friendly, and IP-aware. Separately, CommonCanvas has been released: a set of open diffusion models trained on Creative Commons-licensed images, together with a dataset of approximately 70 million openly licensed Creative Commons images paired with synthetic captions, which is described as the largest of its kind. The CommonCanvas models are based on the SD2 and SDXL architectures. The release is also presented in a CVPR 2024 paper, with significant contributions from SkyLi0n.

#EU #Spawning #Creative Commons Zero #LAION #CommonCanvas #Creative Commons #SD2 #SDXL

Written with ChatGPT (GPT-4o).
