The GPT-2 (124M) language model, released by OpenAI in 2019, can now be reproduced with llm.c and H100 GPUs in as little as 27 minutes for under $10, a substantial reduction in both time and cost compared to the original training run.
Another day has passed, and I managed to train GPT-2 (124M) using @karpathy's llm.c in just 27 minutes with 8 x H100 GPUs for under $10. All you need is to adjust the learning rate (LR). The original maximum learning rate after warmup in the repo was set to 0.0006 (following the… https://t.co/caTeMZseQf https://t.co/7pe0wnhS5d
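The speedup described in that tweet comes from raising the peak (post-warmup) learning rate above the repo's default of 0.0006. As a rough illustration of what "maximum learning rate after warmup" means, the Python sketch below models a generic linear-warmup-then-cosine-decay schedule; the function name, the step counts, and the higher "tuned" peak of 0.0018 are illustrative assumptions, not the exact schedule or value used in llm.c or in the tweet.

```python
# A minimal sketch of a warmup-then-decay learning rate schedule of the kind
# used by llm.c-style trainers. The shape, step counts, and the "tuned" peak
# below are illustrative assumptions, not the settings from the tweet.
import math

def lr_at_step(step, max_lr=0.0006, warmup_steps=700, total_steps=18_000, min_lr_frac=0.0):
    """Linear warmup to max_lr, then cosine decay toward max_lr * min_lr_frac."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    min_lr = max_lr * min_lr_frac
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Repo default peak LR vs. a hypothetical higher peak of the kind the tweet
# describes tuning (the 0.0018 value is chosen only for illustration).
for peak in (0.0006, 0.0018):
    print(f"peak={peak}: LR at step 350 = {lr_at_step(350, max_lr=peak):.6f}, "
          f"LR at step 9000 = {lr_at_step(9000, max_lr=peak):.6f}")
```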
Training GPT-2 in even less time (50 minutes) with 8 H100s for even less money means a 3k-fold cost reduction in about 5 years. The original GPT-2 was trained (in 2019) over several weeks, surely not all of that pure training time, but think about it: now it takes less than an hour with less… https://t.co/gCgwMpk4Mo
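The "3k-fold in about 5 years" claim can be turned into a rough annualized rate. The snippet below is back-of-the-envelope only, using nothing beyond the two figures quoted in the tweet.

```python
# Back-of-the-envelope: what annual rate of cost decline a ~3,000x reduction
# over ~5 years implies. Both figures are taken from the tweet above; this is
# not an independent measurement.
total_reduction = 3_000
years = 5
annual_factor = total_reduction ** (1 / years)
print(f"~{annual_factor:.1f}x cheaper per year")  # ≈ 5.0x per year
```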
I trained GPT-2 (124M) using @karpathy's llm.c in just 43 minutes with 8 x H100 GPUs. This is 2.1x faster than the 90 minutes it took with 8 x A100 GPUs. Currently, the cost of renting an H100 GPU is around $2.50/hr (under 1-year commitment), which reduces the training cost for… https://t.co/NOK7poiozk https://t.co/DASUg5czxj
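The per-run cost implied by those numbers is easy to check: the snippet below multiplies the quoted $2.50/hr H100 rental rate by 8 GPUs and the 43-minute run time, and recomputes the speedup over the 90-minute A100 run. The A100 rental price is not quoted in the tweet, so only the H100 run is costed here.

```python
# Cost of the H100 run implied by the figures in the tweet:
# 8 GPUs rented at ~$2.50/hr each (1-year commitment) for a 43-minute run.
gpus = 8
price_per_gpu_hour = 2.50   # USD per GPU-hour, as quoted in the tweet
minutes = 43
cost = gpus * price_per_gpu_hour * minutes / 60
speedup = 90 / 43           # vs. the 90-minute run on 8 x A100
print(f"estimated cost: ${cost:.2f}, speedup vs A100: {speedup:.1f}x")
# -> estimated cost: $14.33, speedup vs A100: 2.1x
```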