Sources
Artificial Analysis: SambaNova extends its Llama 3.1 405B inference speed lead, achieving 163 output tokens/s, >2X other providers. @SambaNovaAI has rolled out speculative decoding on their 405B endpoint, now delivering speeds of up to 200 tokens/s (depending on prompt complexity). As we've… https://t.co/Vc1v8iYoDv
SambaNova Systems: Reach for the stars with SambaNova Cloud! Unlock fast #AI inference on @AIatMeta's Llama 3.2 1B & 3B with unmatched performance — all running at full precision. Start building today ⤵️
Santiago: The dominance of the GPU for AI-specific workflows might come to an end sooner rather than later. I'm now running Llama 3.1 8B Instruct at 1,127 t/s and Llama 405B at 200 t/s! This is lightning fast! You can't get that speed anywhere else. The GPU was not designed for AI/ML, so… https://t.co/aD4UO2576q