Google DeepMind has announced a major advance in vision-language models (VLMs): scaling pre-training data to 100 billion examples. The new dataset, named WebLI-100B, is ten times larger than the largest previously reported vision-language datasets. The research emphasizes that while traditional benchmarks show little benefit from noisy, raw web data at this scale, the expanded dataset is crucial for building inclusive multimodal systems. Specifically, the findings indicate that performance saturates at this scale on many common Western-centric benchmarks, but tasks involving cultural diversity and other complexities may still benefit from the increased data scale.
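The announcement does not detail the training recipe, but web-scale image-text datasets like WebLI-100B are typically consumed by contrastive pre-training, where matching image-text pairs are pulled together in embedding space and mismatched pairs pushed apart. As an illustration only, here is a minimal NumPy sketch of a SigLIP-style pairwise sigmoid contrastive loss; the function name, temperature, and bias values are assumptions for the sketch, not details from the paper.

```python
import numpy as np

def siglip_style_loss(img_emb, txt_emb, temperature=10.0, bias=-10.0):
    """Pairwise sigmoid contrastive loss over a batch of N image-text pairs.

    img_emb, txt_emb: (N, D) L2-normalized embeddings from the image and
    text towers. The diagonal holds matching pairs (label +1); every
    off-diagonal pair is treated as a negative (label -1).
    """
    logits = temperature * img_emb @ txt_emb.T + bias  # (N, N) pairwise similarities
    labels = 2.0 * np.eye(logits.shape[0]) - 1.0       # +1 on diagonal, -1 elsewhere
    # -log sigmoid(label * logit), computed stably, averaged over all N^2 pairs
    return np.logaddexp(0.0, -labels * logits).mean()

# Toy usage: 8 random "pairs" of 64-d embeddings
rng = np.random.default_rng(0)
img = rng.normal(size=(8, 64))
txt = rng.normal(size=(8, 64))
img /= np.linalg.norm(img, axis=1, keepdims=True)
txt /= np.linalg.norm(txt, axis=1, keepdims=True)
print(siglip_style_loss(img, txt))
```

One reason such per-pair losses scale well to datasets of this size is that each batch yields N^2 supervised pair labels from N examples, so even very noisy web pairs contribute signal.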
Google DeepMind has scaled pre-training data for vision language models to 100 billion examples, highlighting the importance of large datasets for building inclusive multimodal systems. https://t.co/YJWSff2ZYO
Let's goo! Google presents "Scaling Pre-training to One Hundred Billion Data for Vision Language Models". Results highlight that while traditional benchmarks may not benefit significantly from scaling noisy, raw web data to 100 billion examples, this data scale is vital for… https://t.co/HH5kH4Wrks
"Scaling Pre-training to One Hundred Billion Data for Vision Language Models": Google DeepMind develops WebLI-100B, a novel dataset containing 100 billion image-text pairs, representing a tenfold increase over the largest reported vision-language datasets. They demonstrate that a… https://t.co/gdayBDZMFM