Recent analysis by Factorial Funds estimates significant computational requirements and costs for deploying and operating OpenAI's Sora model. Large-scale deployment would need an estimated 720,000 Nvidia H100 GPUs, and the training run is estimated at between 4,200 and 10,500 H100 GPUs over a one-month period. For inference, generating one minute of video takes approximately 12 minutes on a single H100 GPU, so each GPU can produce about 5 minutes of video per hour. This high computational demand implies that the cost of using Sora will be prohibitive for average content creators, positioning it primarily as a tool for large studios and well-funded creators. The posts quoted below concern a different model, a 132-billion-parameter model trained on 12 trillion tokens, whose training run could cost around $500k with H100 fp8; training on such a vast dataset marks a significant advance over predecessors, with GPT-4 widely understood to have been trained on 10 trillion tokens and Llama 2 on 2 trillion.
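As a rough consistency check, the inference figures above can be reproduced with simple arithmetic. A minimal sketch, taking the Factorial Funds numbers at face value and assuming the hypothetical fleet runs at full utilization around the clock (that utilization figure is an assumption, not a quoted estimate):

```python
# Back-of-envelope check of the Sora estimates quoted above.
# Inputs are the figures from this article; continuous full utilization
# of the deployment fleet is an assumption, not a reported number.

MINUTES_PER_HOUR = 60

# Inference: ~12 H100-minutes of compute per minute of generated video.
gpu_minutes_per_video_minute = 12
video_minutes_per_gpu_hour = MINUTES_PER_HOUR / gpu_minutes_per_video_minute
print(f"Video per H100 per hour: {video_minutes_per_gpu_hour:.0f} minutes")  # ~5

# Deployment scale: 720,000 H100s, assumed to generate video continuously.
fleet_size = 720_000
video_minutes_per_day = fleet_size * video_minutes_per_gpu_hour * 24
print(f"Fleet output per day: {video_minutes_per_day / 60:,.0f} hours of video")
```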
What a nice model. 12 trillion tokens is around optimal for a model of that size, and as they note, the dataset makes a huge difference. Note the actual compute for the main run is just a bit more than the 2M A100 hrs of LLaMA 70b. With H100 fp8, a run like this could be $500k. https://t.co/LhRHLK880y
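The $500k claim in this post is GPU-hours-times-price arithmetic. A hedged sketch of how it could pencil out, where the ~2M A100-hour baseline comes from the post itself, but the H100 fp8 speedup factor and the hourly H100 rate are illustrative assumptions rather than quoted figures:

```python
# Hedged training-cost sketch based on the post above.
# The 2M A100-hour baseline is quoted from the post; the fp8 speedup and
# the hourly H100 price are illustrative assumptions, not reported numbers.

a100_hours = 2_000_000        # "just a bit more than the 2m A100 hrs of LLaMA 70b"
h100_fp8_speedup = 4.0        # assumed effective H100 fp8 vs. A100 bf16 throughput
h100_price_per_hour = 1.00    # assumed $/H100-hour for a committed/owned cluster

h100_hours = a100_hours / h100_fp8_speedup
cost = h100_hours * h100_price_per_hour
print(f"~{h100_hours:,.0f} H100 hours -> ~${cost:,.0f}")  # ~500,000 H100 hrs, ~$500k
```

Under these assumed rates the run lands at roughly $500k, matching the post; at on-demand cloud pricing or a smaller fp8 speedup, the figure would be correspondingly higher.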
12T tokens and 132B parameters, but Mixtral, sitting at 45B parameters, is still very good... I wonder how many tokens they used for Mixtral. https://t.co/5TO1gbQnrq
12T tokens is major - wow! For comparison, GPT-4 is widely understood to have been trained on 10T; Llama2 was 2T. https://t.co/zRQHR7wjlB