The Enigma Project team, led by Konstantin Wille, has set new records in NanoGPT training speed on 8× NVIDIA H100 GPUs. The latest record reached a FineWeb validation loss of 3.28 in 2.979 minutes, beating the previous record of 2.990 minutes by roughly 0.7 seconds. The gain came from overlapping gradient communication with backward computation (sketched below). Earlier, the team had cut the time from 3.014 to 2.990 minutes by accelerating the gradient all-reduce. Separately, researchers including Michal Takac have reported substantial algorithmic gains on a single RTX 4090, on the order of 70-80%, by carrying PhysicsML and SciML ideas over into large language models without changing hyperparameters or CUDA kernel configurations. Together, these results show continued progress in transformer training efficiency, with a focus on scaling NanoGPT training performance.
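To make the changelog item concrete, here is a minimal sketch of overlapping gradient all-reduce with backward computation in PyTorch. It assumes a `torch.distributed` process group is already initialized (e.g. via `torchrun`); the function name `attach_overlapped_allreduce` and the hook structure are illustrative assumptions, not the record's actual implementation.

```python
import torch
import torch.distributed as dist

def attach_overlapped_allreduce(model: torch.nn.Module):
    """Launch an async all-reduce for each parameter's gradient as soon as
    backward produces it, instead of one blocking all-reduce at the end."""
    handles = []

    def make_hook(param):
        def hook(*_):
            # Pre-divide so the SUM all-reduce yields the cross-rank mean.
            param.grad /= dist.get_world_size()
            # async_op=True returns a handle immediately; backward keeps
            # computing gradients for earlier layers while this communication
            # is in flight, hiding comm time behind compute.
            handles.append(dist.all_reduce(param.grad, async_op=True))
        return hook

    for p in model.parameters():
        if p.requires_grad:
            # Fires once per step, after p.grad is fully accumulated.
            p.register_post_accumulate_grad_hook(make_hook(p))

    def wait_all():
        # Call after loss.backward() and before optimizer.step().
        for h in handles:
            h.wait()
        handles.clear()

    return wait_all
```

Typical use: call `wait = attach_overlapped_allreduce(model)` once at setup, then per step run `loss.backward()` (all-reduces launch layer by layer during backward), call `wait()`, and only then `optimizer.step()`. Production implementations additionally bucket small gradients into larger all-reduce calls to amortize launch overhead.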
New NanoGPT training speed record: 3.28 FineWeb val loss in 2.979 minutes on 8xH100
Previous record: 2.990 minutes (0.7s slower)
Changelog: Overlapped gradient communication with computation
New record-holder: @ryanyang0
https://t.co/nVBhZ807JF
Another win in the NanoGPT2 speedrun on an RTX4090, combining some of the techniques we used at @TheDimensionLab in @siml_ai model architectures (for PhysicsML) with newer ideas for optimising Transformer bottlenecks. I think this run was already below 3.3821 val_loss earlier than the currently reported point. https://t.co/SShERfLSPs
🚀 Congrats to @KonstantinWille and the Enigma Project team https://t.co/8HIkmjKMV4 for being NanoGPT speed-run world-record champions for a glorious day 🏆🔥 https://t.co/IcxIO5OGNn