
Former Tesla AI Director Andrej Karpathy has reproduced GPT-2 in 24 hours for $672, a dramatic reduction compared to the roughly $100 million reportedly spent training GPT-4. The run used a single 8xH100 GPU node, highlighting how far AI hardware and software have advanced.
🤯 The pace of AI progress is insane! Training your own GPT-2 is now cheaper than a decent gaming PC. 🤑 https://t.co/6sY7VTYMvA
Train GPT-2 for $672 on an 8xH100 GPU node in 24 hours. 🔥 https://t.co/IbXlF0zhea
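A quick back-of-envelope check of the quoted price (the per-GPU rate below is derived, not stated in the tweets, but $3.50 per H100-hour is in line with typical cloud pricing):

```python
# Back-of-envelope check of the $672 training cost.
# The implied per-GPU rate is an inference, not a quoted figure.
gpus = 8            # one 8xH100 node
hours = 24          # wall-clock training time
total_cost = 672.0  # quoted run cost in USD

implied_rate = total_cost / (gpus * hours)
print(f"Implied price: ${implied_rate:.2f} per H100-hour")  # -> $3.50
```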
400 billion parameters
6.4 trillion bits (half precision)
800 GB
An H100 has 80 GB
An H100 node has 8 H100s, so it needs 2 nodes
$140k monthly bill 🫢💸
$70k with 8-bit quantization
I hope people do good distillation or extreme non-degrading quantization (or that I did my math wrong)
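The thread's arithmetic checks out; here is a minimal sketch of it, counting weights only (ignoring KV cache and activations, as the tweet does):

```python
import math

# Sketch of the tweet's memory math for a hypothetical 400B-parameter model.
params = 400e9
gpu_mem_gb = 80               # one H100
node_mem_gb = 8 * gpu_mem_gb  # 640 GB per 8xH100 node

for bytes_per_param, label in [(2, "fp16"), (1, "int8")]:
    weights_gb = params * bytes_per_param / 1e9
    nodes = math.ceil(weights_gb / node_mem_gb)
    print(f"{label}: {weights_gb:.0f} GB of weights -> {nodes} node(s)")
# fp16: 800 GB -> 2 nodes; int8: 400 GB -> 1 node, which is why
# 8-bit quantization halves the hypothetical monthly bill.
```

The $140k/month figure appears to assume on-demand cloud pricing of roughly $96 per 8xH100 node-hour (2 nodes × ~730 hours/month); dropping to one node with 8-bit weights gives the $70k estimate.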
