Recent discussions among experts highlight advancements and challenges in large language models (LLMs) regarding reasoning capabilities. A new paper indicates that LLMs can perform latent multi-hop reasoning without relying on shortcuts, with success rates above 80% depending on the type of bridge entity used. Meanwhile, Alibaba's new QwQ-32B-Preview shows promise on hard math problems but, like DeepSeek's recent reasoning model, struggles as a general-purpose model: it performs well on specific tasks served over an API (such as RAG or decision-making over provided facts), yet does not compete with leading models like 4o or o1. Early testers also report that recursive reasoning loops are common, especially in low-bit quants, and since QwQ is Apache 2.0 licensed, smaller reasoners trained on it are expected soon. Overall, the battle for the top reasoning LLM is intensifying.
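To make the multi-hop claim concrete: a latent two-hop query asks about an attribute of a "bridge entity" that is never named in the prompt. Here is a minimal sketch of that setup; the `ask()` helper is a hypothetical stand-in for any chat-completion client, and the example facts are ours, not the paper's:

```python
def ask(prompt: str) -> str:
    """Hypothetical helper: send `prompt` to an LLM and return its answer."""
    raise NotImplementedError  # wire up your own client here

# hop 1 (bridge entity): "The Old Man and the Sea" -> Ernest Hemingway
# hop 2 (attribute):     Ernest Hemingway -> born in Oak Park, Illinois
two_hop = "In which city was the author of 'The Old Man and the Sea' born?"
control = "In which city was Ernest Hemingway born?"

# The model reasons latently (without shortcuts) if it answers the two-hop
# question as well as the control, even though the bridge entity never
# appears in the prompt.
print(ask(two_hop), ask(control))
```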
🔥 Battle for the top reasoning LLM intensifies! The QwQ-32B-Preview is a very good reasoning LLM. Full video of my tests here: https://t.co/grfpjJvPZF Summary of my findings and thoughts: It was able to solve a couple of hard math problems so it looks very promising for… https://t.co/0UBhwYobYm
Initial thoughts about QwQ 1. Recursive reasoning loops are indeed common, especially on low-bit quants (shoutout @bartowski1182 as always). 2. Reasoning traces on agentic tasks are 👌. It's Apache 2.0 licensed, so I expect smaller reasoners trained on this to be released soon. 3.…
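For anyone replicating these runs, a crude way to flag the recursive loops mentioned above is to watch for heavily repeated n-grams in the generated trace. A quick sketch (the thresholds are arbitrary choices of ours, not from the tweet):

```python
from collections import Counter

def has_repetition_loop(text: str, n: int = 8, max_repeats: int = 3) -> bool:
    """Flag a trace if any word n-gram repeats more than max_repeats times,
    a rough proxy for the recursive reasoning loops seen on low-bit quants."""
    words = text.split()
    ngrams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return any(count > max_repeats for count in ngrams.values())

# Example: a looping trace trips the detector; a caller could then
# truncate the generation or re-sample.
trace = "Wait, let me reconsider the problem again. " * 10
print(has_repetition_loop(trace))  # True
```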
The new QwQ-32B-Preview model from Alibaba has the same problem as DeepSeek's recent reasoning model. As a model for solving specific tasks over an API (like RAG or decision-making based on provided facts), it's good, but not as a general-purpose model competing with 4o or o1. https://t.co/QwBHllCUJO
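To ground the "specific tasks over an API" point, here is a minimal sketch of using QwQ-32B-Preview for decision-making over provided facts through an OpenAI-compatible endpoint. The `base_url`, API key, and serving setup are assumptions (any vLLM-style server would do), and the facts are invented for illustration:

```python
from openai import OpenAI

# Assumes QwQ-32B-Preview is served behind an OpenAI-compatible endpoint
# (e.g., via vLLM); base_url and api_key are placeholders.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

facts = (
    "- The shipment left the warehouse on 2024-11-20.\n"
    "- Standard transit time is 5 business days.\n"
    "- The customer paid for expedited (2-day) delivery."
)

resp = client.chat.completions.create(
    model="Qwen/QwQ-32B-Preview",
    messages=[
        {"role": "system",
         "content": "Decide using ONLY the provided facts."},
        {"role": "user",
         "content": f"Facts:\n{facts}\n\nShould we refund the expedited "
                    "fee? Answer yes/no with a brief reason."},
    ],
    temperature=0.2,
)
print(resp.choices[0].message.content)
```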