Apple provided early access to two maxed-out M3 Ultra Mac Studios (512GB each) ahead of their public release, and testers have been using them to run the DeepSeek R1 model, which has 671 billion parameters. In one setup, the full 8-bit model is distributed across both machines (1TB of unified memory in total) over a Thunderbolt 5 interconnect rated at 80 Gbps, reaching roughly 11 tokens per second against a theoretical ceiling of about 20. In another, a 4-bit quantization running on a single M3 Ultra with MLX hits 18.43 tokens per second. Six weeks after launch, DeepSeek R1 still ranks as the leading open-source large language model (LLM), and users have been showcasing its capabilities, including zero-shot generation of a p5.js animation recorded in real time.
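The ~20 tokens-per-second ceiling is consistent with a simple bandwidth argument: if decoding is memory-bandwidth bound, each generated token requires streaming the model's active weights from unified memory once. The sketch below works that out, assuming Apple's quoted 819 GB/s bandwidth for the M3 Ultra and DeepSeek R1's roughly 37B active (mixture-of-experts) parameters per token at 8 bits; the posters do not state their exact accounting, so this is only one plausible reading.

```python
# Back-of-envelope check on the ~20 tok/s theoretical ceiling, assuming
# decode is memory-bandwidth bound and that DeepSeek R1's mixture-of-experts
# design activates ~37B of its 671B parameters per generated token.
M3_ULTRA_BANDWIDTH_GBPS = 819   # Apple's quoted unified-memory bandwidth
ACTIVE_PARAMS_BILLIONS = 37     # approx. active parameters per token (MoE routing)
BYTES_PER_PARAM = 1             # 8-bit weights

gb_read_per_token = ACTIVE_PARAMS_BILLIONS * BYTES_PER_PARAM
ceiling = M3_ULTRA_BANDWIDTH_GBPS / gb_read_per_token
print(f"~{ceiling:.0f} tokens/sec upper bound")  # ≈ 22 tok/s, close to the quoted ~20
```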
Six weeks post-drop, DeepSeek R1 holds strong as the top open-source LLM 🐋 Check out this video on deploying it fast with Modal’s serverless GPU infra (4x L40S GPUs) — no hardware headaches. 🚀 https://t.co/OaYAMXCy4Y
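The linked video walks through the deployment, but for readers who want a starting point, here is a minimal sketch of what a serverless setup on 4x L40S GPUs could look like using Modal's Python SDK with vLLM. The app name, image contents, and model choice are assumptions for illustration; a distilled R1 checkpoint is used because the full 671B model does not fit in 4x48 GB of VRAM.

```python
# pip install modal
import modal

app = modal.App("deepseek-r1-demo")  # app name is arbitrary

# Container image with vLLM installed; pin versions in real deployments.
image = modal.Image.debian_slim().pip_install("vllm")

@app.function(image=image, gpu="L40S:4", timeout=1200)
def generate(prompt: str) -> str:
    from vllm import LLM, SamplingParams

    # Distilled R1 checkpoint used for illustration only; the full 671B
    # model would not fit on four L40S GPUs.
    llm = LLM(
        model="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
        tensor_parallel_size=4,
    )
    outputs = llm.generate([prompt], SamplingParams(max_tokens=512))
    return outputs[0].outputs[0].text

@app.local_entrypoint()
def main():
    print(generate.remote("Create an amazing animation using p5js"))
```

Running `modal run` on this file would spin up the GPUs on demand and tear them down afterward, which is the "no hardware headaches" point the post is making.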
🔥 DeepSeek R1 671B Q4 - M3 Ultra 512GB with MLX🔥 - 18.43 tokens/sec - Generates a p5js zero-shot, tested at video's end 😱 - Video in real-time, no acceleration! - First test, and I'm blown away! Prompt: "Create an amazing animation using p5js" https://t.co/whiomgtXzv
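For context on the single-machine MLX run, a minimal local-generation sketch with the `mlx-lm` Python package is shown below. The quantized model repository name is an assumption (mlx-community publishes 4-bit conversions, but the exact identifier may differ), and the test in the video used the full 671B Q4 weights, which need an M3 Ultra-class amount of unified memory.

```python
# pip install mlx-lm  (Apple Silicon only; MLX runs on unified memory)
from mlx_lm import load, generate

# Repository name is an assumption; check mlx-community for the exact
# quantized DeepSeek R1 conversion you want to run.
model, tokenizer = load("mlx-community/DeepSeek-R1-4bit")

prompt = "Create an amazing animation using p5js"

# verbose=True streams tokens as they are generated and reports tokens/sec,
# which is how figures like 18.43 tok/sec are typically read off.
text = generate(model, tokenizer, prompt=prompt, max_tokens=2048, verbose=True)
print(text)
```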
Running DeepSeek R1 on my desk. Uses @exolabs with Thunderbolt 5 interconnect (80Gbps) to run the full (671B, 8-bit) DeepSeek R1 distributed across 2 M3 Ultra 512GB Mac Studios (1TB total Unified Memory). Runs at 11 tok/sec. Theoretical max is ~20 tok/sec. https://t.co/ijD7YamHLl
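exo itself is started from the command line on each Mac and discovers peers over the network automatically; once the two Mac Studios form a cluster, it serves a ChatGPT-compatible HTTP API, so a standard OpenAI client can query the distributed model. The port and model identifier below are taken from exo's defaults and are assumptions that may differ per install.

```python
# pip install openai  -- exo exposes a ChatGPT-compatible endpoint,
# so the stock OpenAI client works against the local cluster.
from openai import OpenAI

# Port 52415 is exo's documented default API port; adjust if yours differs.
client = OpenAI(base_url="http://localhost:52415/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="deepseek-r1",  # model identifier as registered by exo (assumption)
    messages=[{"role": "user", "content": "Create an amazing animation using p5js"}],
)
print(response.choices[0].message.content)
```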