I agree that Grok, like many large language models, is just a more sophisticated form of "garbage in, garbage out." If AI is trained on good data, you can get a good answer, but AI can be very slanted in non-quantifiable fields such as sociology and economics. So below… https://t.co/ccpsbL0OKf
Did xAI lie about Grok 3’s benchmarks? The article covers the dispute between xAI and OpenAI over the reported benchmarks of xAI's latest AI model, Grok 3. An OpenAI employee accused xAI of publishing misleading benchmark results, while xAI has defended its reporting.
Did xAI lie about Grok 3’s benchmarks?: https://t.co/CX1lW8oQml by TechCrunch #infosec #cybersecurity #technology #news
The performance of xAI's Grok 3 model has come under scrutiny following a comparison with OpenAI's o3-mini model. According to LiveBench scores, Grok 3 achieved an overall average score of 71.57%, with a coding task score of 67.38%. In contrast, OpenAI's o3-mini, dated January 31, 2025, scored 82.74% overall and 69.69% in coding tasks. This discrepancy has led to allegations of misleading benchmark reporting by xAI, with an OpenAI employee suggesting that xAI may have manipulated results. The controversy has sparked discussions about the reliability of AI models and their training data, with critics highlighting the potential for bias in AI outputs due to the sources of their training material.
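To put the quoted gap in perspective, here is a minimal sketch tallying the LiveBench numbers reported above; the scores come from the article, but the dictionary layout and labels are illustrative:

```python
# LiveBench scores as quoted in the article (percent).
scores = {
    "Grok 3": {"overall": 71.57, "coding": 67.38},
    "o3-mini (2025-01-31)": {"overall": 82.74, "coding": 69.69},
}

# Difference on each reported metric, o3-mini minus Grok 3.
for metric in ("overall", "coding"):
    gap = scores["o3-mini (2025-01-31)"][metric] - scores["Grok 3"][metric]
    print(f"{metric}: o3-mini leads by {gap:.2f} points")
```

Per these figures, the overall gap (about 11 points) is far wider than the coding gap (about 2 points), which is part of why the headline comparison drew scrutiny.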