Grass Network, the data layer of AI on Solana, has open-sourced a dataset containing 600 million top Reddit posts and comments from 2024. The dataset, named UpvoteWeb-24-600M, includes media links and reply lineage, and has been anonymized to preserve user privacy. The data was gathered by 2 million nodes globally in just one week. The release aims to make AI training more accessible for developers and to help level the playing field with centralized model training sets. Supporters describe it as a significant milestone for the Grass ecosystem and the broader AI community.
🚨JUST IN: Grass Network (@getgrass_io), the data layer of AI on Solana, has open-sourced 600 million top Reddit posts and comments from 2024. This release aims to make AI training more accessible for developers. https://t.co/Y18NaLuOZv
600 million Reddit posts and comments, the largest 2024 Reddit public dataset, gathered in just one week... That's one small step for the Grass network, one giant leap for open source AI. https://t.co/c869ive0EH
Grass just open sourced 600,000,000 Reddit posts and comments 🤯🌱 The data can be used for AI training and helps level the playing field with centralized model training sets. 2 million nodes globally scraped this data. This is a historic moment for @getgrass_io and AI. https://t.co/EAADGrfx5S