GPT-4, a language model developed by OpenAI, long held the top position among publicly available AI models. Competing models such as Gemini, Gemini 1.5, Mixtral, and Claude struggled to surpass it, with only Claude managing to slightly outperform it. Some observers took this as a sign that the industry might be approaching a plateau in model capabilities, since even major players like Google and Anthropic had not clearly overtaken GPT-4. Recent side-by-side evaluations on MATH, however, indicate that GPT-4's supremacy has ended: two labs, OpenAI and Anthropic, now share the top, and `mistral`-large is a close contender that scores above `claude-3-sonnet`.
Side by Side eval of latest publicly available models on MATH using the exact same setup for all models 3 months apart (Dec 2023 vs March 2024). The supremacy of GPT-4 is over. It was about time (end of training of GPT-4, Summer 2022) :) https://t.co/oSF54HA45k https://t.co/IXiN0xHthN
Interesting how the picture changed in only 3 months (this is an eval of latest publicly available models on MATH using the exact same setup for all models) Now there's 2 labs at the top (OpenAI and Anthropic). `mistral`-large is a close contender and above `claude-3-sonnet`. https://t.co/1GMzfjLYvM https://t.co/zY9NWCfdND
To mark the anniversary of GPT-4's launch, let's all remember that systems, not just base models, matter--GPT-4 has gotten more useful over time with better fine-tuning, tool use, UX, etc. AI policy folks risk "winning the last war" by not taking systems seriously enough.