Apple and NVIDIA have announced a collaboration to speed up large language model (LLM) inference through a speculative decoding technique called Recurrent Drafter (ReDrafter). The open-source method has been integrated into NVIDIA's TensorRT-LLM framework, resulting in a reported 2.7x increase in generated tokens per second for a production-scale model on H100 GPUs. The partnership targets LLM inference workloads, using draft-and-verify decoding to squeeze more throughput out of each GPU than traditional batching alone. The work is a notable advance for generative AI serving, with potential implications for applications across industries.
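For context, ReDrafter belongs to the speculative decoding family: a small recurrent draft model proposes a short block of candidate tokens, and the large target model verifies them, keeping only the prefix it agrees with, so most steps emit more than one token. The sketch below illustrates only that generic draft-and-verify loop with toy stand-in models; it is a simplified assumption-laden illustration, not Apple's ReDrafter implementation or NVIDIA's TensorRT-LLM API.

```python
# Minimal sketch of the draft-and-verify idea behind speculative decoding.
# The two callables below are toy stand-ins (assumptions), not real models.
from typing import Callable, List

def speculative_decode(
    draft_next: Callable[[List[int]], int],   # cheap draft model: next-token guess
    target_next: Callable[[List[int]], int],  # expensive target model (greedy)
    prompt: List[int],
    max_new_tokens: int = 8,
    draft_len: int = 4,
) -> List[int]:
    """Let a small draft model propose a block of tokens, then keep only the
    prefix the target model agrees with (plus the target's own correction)."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1) Draft: propose `draft_len` tokens autoregressively with the cheap model.
        draft, ctx = [], list(tokens)
        for _ in range(draft_len):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)

        # 2) Verify: the target model checks each proposed position. In a real
        #    system this is one batched forward pass, which is where the speedup
        #    over plain token-by-token decoding comes from.
        for i in range(draft_len):
            expected = target_next(tokens + draft[:i])
            if expected != draft[i]:
                # Keep the agreed prefix and the target model's own token.
                tokens.extend(draft[:i])
                tokens.append(expected)
                break
        else:
            tokens.extend(draft)  # every drafted token was accepted
    return tokens[: len(prompt) + max_new_tokens]

# Toy usage: both "models" follow the same fixed pattern, so drafts are accepted.
if __name__ == "__main__":
    pattern = [1, 2, 3, 4, 5, 6, 7, 8]
    draft = lambda ctx: pattern[len(ctx) % len(pattern)]
    target = lambda ctx: pattern[len(ctx) % len(pattern)]
    print(speculative_decode(draft, target, prompt=[0]))
```

In ReDrafter specifically, the draft model is a recurrent head that proposes multiple candidate sequences, but the accept-or-correct verification step shown above is the same basic mechanism.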
Big news from Apple! Their researchers have unveiled the open-source ReDrafter method, delivering 2.7x faster token generation on Nvidia GPUs. This could make LLM inference for Apple Intelligence far more efficient. Get the scoop here: https://t.co/Gy9oMFOd8b
Apple-Nvidia collaboration nearly triples the speed of AI text generation https://t.co/sQoEPMA5Y7 #Apple
Apple researchers say the company's open-source ReDrafter method on Nvidia GPUs led to a 2.7x speed increase in generated tokens per second for greedy decoding (@malcolmowen / AppleInsider) https://t.co/MRsJcxYm3c https://t.co/Qs7XGWzczi https://t.co/ZOzeer2dpR