Apr 6, 08:31 PM

OpenAI and Google Harvest YouTube Data to Train AI, Reports New York Times

According to the New York Times, OpenAI used transcriptions of more than a million hours of YouTube videos to train its GPT-4 model. The audio was transcribed with Whisper, OpenAI's speech-recognition model, which converted the spoken content into text. Key OpenAI staff took part in the effort, including President Greg Brockman, who was personally involved in collecting the videos. Google also transcribed YouTube videos to harvest text for its AI models, which points to a broader industry trend of drawing on publicly available data to improve AI systems.

#OpenAI #GPT #New York Times #Whisper #Greg Brockman #Google #YouTube

Written with ChatGPT (GPT-4).

Sources

Brian Roemmele@BrianRoemmele
2 years ago
“OpenAI transcribed over a million hours of YouTube videos to train GPT-4” Article: https://t.co/SI1gG9wWhv
TechAI@Tech_AI_Tech
2 years ago
OpenAI transcribed over a million hours of YouTube videos to train GPT-4 #OpenAI #GPT4 #AI #TechAI https://t.co/TlZZOmSuPg
AshutoshShrivastava@ai_for_success
2 years ago
🚨OpenAI reportedly used transcriptions of over a million hours of YouTube videos to train GPT-4. Open AI used Whisper audio transcription model to assist in this process, which allowed them to transcribe the YouTube content What do you think they have used for GPT-5?? https://t.co/hsccRD5gF4
