OpenAI used transcriptions of more than a million hours of YouTube video to train GPT-4, according to the New York Times. The audio was transcribed with OpenAI's Whisper audio transcription model, which converted the spoken content into text. OpenAI President Greg Brockman was reportedly personally involved in collecting the videos. According to the same report, Google also transcribed YouTube videos to harvest text, pointing to a broader industry practice of using publicly available data to train AI models.
“OpenAI transcribed over a million hours of YouTube videos to train GPT-4” Article: https://t.co/SI1gG9wWhv
OpenAI transcribed over a million hours of YouTube videos to train GPT-4 #OpenAI #GPT4 #AI #TechAI https://t.co/TlZZOmSuPg
🚨 OpenAI reportedly used transcriptions of over a million hours of YouTube videos to train GPT-4. OpenAI used its Whisper audio transcription model to transcribe the YouTube content. What do you think they have used for GPT-5? https://t.co/hsccRD5gF4