Alibaba has released Qwen3, a new family of open-source large language models (LLMs) that includes both dense and Mixture-of-Experts (MoE) architectures. The dense models range from 0.6 billion to 32 billion parameters and are designed for advanced reasoning, coding, instruction following, and multilingual tasks. Qwen3 delivers efficient reasoning at low latency: the Qwen3-32B model running on Cerebras hardware reportedly achieves reasoning latency of about 1.2 seconds while generating over 2,400 tokens per second. The Qwen3 models have quickly gained traction, with more than 100,000 derivative models and integrations into platforms such as Clarifai and Hugging Face. The release marks a notable advancement in open-source AI, positioning Qwen3 as a competitive alternative to existing models like GPT-4o and Claude.
Cerebras Systems blazes a trail for AI inference, powering advanced reasoning in real time https://t.co/fP23BR94tI
Cerebras Launches Qwen3-32B: Real-Time Reasoning with One of the World’s Most Powerful Open Models https://t.co/dAlqaCPj0q
You can now run Qwen3-32B on @HuggingFace with Cerebras Inference — and it’s ⚡️! Typing the question took longer than getting the answer 😅 https://t.co/MoJlHV5rDA
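For readers who want to try this themselves, the sketch below shows one way to query a hosted Qwen3-32B model over an OpenAI-compatible chat-completions endpoint, which is the interface Cerebras Inference exposes. The endpoint URL and the `qwen-3-32b` model id are assumptions for illustration; check the provider's documentation for the exact values.

```python
import json
import urllib.request

# Assumed endpoint and model id for Cerebras Inference; verify against the
# provider's docs before use.
API_URL = "https://api.cerebras.ai/v1/chat/completions"
MODEL = "qwen-3-32b"

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Assemble a standard OpenAI-style chat-completions request."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

def ask(prompt: str, api_key: str) -> str:
    """Send the prompt and return the model's reply text."""
    with urllib.request.urlopen(build_request(prompt, api_key)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the wire format matches the OpenAI chat-completions schema, the same code should work against other Qwen3 hosts by changing only `API_URL` and `MODEL`.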