Meta AI has announced Llama 3.2, the latest release in its series of open-source AI models. The new lineup adds multimodal capability, combining text and image processing, and spans 1B, 3B, 11B, and 90B parameters. The vision models target visual reasoning, image captioning, and visual question answering (VQA), while the smaller text-only models are optimized for edge and mobile devices, balancing performance and size. On the inference side, SambaNova Cloud reports independently verified speeds of 2,470 output tokens per second on the 1B model and 1,566 on the 3B model, while Groq claims a world record of over 3,000 output tokens per second on the 1B model. Meta reports that the Llama 3.2 vision models are competitive with Claude 3 Haiku and GPT-4o mini on image-understanding benchmarks. These advances position Llama 3.2 as a competitive option in the AI landscape, particularly for applications requiring efficient on-device computation.
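As a concrete sketch of the VQA use case, here is how a request for the 11B vision model might be assembled against an OpenAI-compatible multimodal chat endpoint. The endpoint URL and model identifier below are assumptions for illustration (providers name them differently), and the payload is only constructed and printed, not sent.

```python
import json

# Hypothetical endpoint and model id, for illustration only;
# check your provider's docs for the real values.
ENDPOINT = "https://example-provider.com/v1/chat/completions"
MODEL_ID = "llama-3.2-11b-vision-instruct"

def build_vqa_request(question: str, image_url: str) -> dict:
    """Assemble an OpenAI-style multimodal chat payload:
    one user turn containing a text part and an image part."""
    return {
        "model": MODEL_ID,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": 256,
    }

payload = build_vqa_request(
    "How many people are in this photo?",
    "https://example.com/photo.jpg",
)
print(json.dumps(payload, indent=2))
```

Sending `payload` to a provider's chat-completions endpoint (with an API key) would return the model's answer to the question about the image.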
Meta introduced Llama 3.2, extending its Llama model family with two new vision-language models and two smaller text-only models designed for edge devices. We also teamed up with @AIatMeta for a course that will show you how to put these models to use ⬇️ https://t.co/2L6zueIhTV
🚨 Meta’s new Llama 3.2 models are live on the Azure AI Model Catalog! Access Llama 3.2 11B Vision Instruct and 90B Vision Instruct via serverless API inferencing and managed compute. Learn more: https://t.co/7dxvWzOExR #AzureAI
Groq has set a world record in LLM inference API speed by serving Llama 3.2 1B at >3k output tokens/s, making it ~25X faster than OpenAI GPT-4o's API and ~110X cheaper. This is a great deal for applications running AI on edge devices or on-device, where compute resources are… https://t.co/JuQ4gxMo10 https://t.co/XJW0oJg1Af