
Recent evaluations of AI models have highlighted the performance of Gemini 1.5 Pro 0827, which scored 67% on Aider's code editing benchmark, just above Llama 405b at 66%. By comparison, Sonnet led with 77%, while GPT 3.5 Turbo 0301 and Gemini 1.5 Flash 0827 followed at 58% and 53%, respectively. Google has also released another fine-tuned version of Gemini, which reportedly brings only minor improvements. In a separate assessment of structured output capabilities, OpenAI's GPT-4o was rated best thanks to its direct Pydantic integration, Claude 3.5 came second, requiring a 'tool call' trick for optimal results, and Gemini 1.5 was rated merely 'OK' (both approaches are sketched below). Finally, in style-controlled evaluations, Gemini 1.5 Flash was noted to outperform GPT-4o-mini overall and in every category except coding.
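As a rough illustration of what "direct Pydantic integration" looks like in practice, here is a minimal sketch using the OpenAI Python SDK's structured-output parsing. The model name, schema, and prompt are illustrative placeholders, not details from the reports above.

```python
# Minimal sketch: structured output via direct Pydantic integration
# (OpenAI Python SDK). Schema and prompt are illustrative placeholders.
from pydantic import BaseModel
from openai import OpenAI


class TicketTriage(BaseModel):
    category: str
    priority: int
    summary: str


client = OpenAI()

# The SDK accepts a Pydantic model as response_format and returns a
# parsed instance, so no manual JSON extraction or validation is needed.
completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Triage incoming support tickets."},
        {"role": "user", "content": "Checkout page returns a 500 error for all users."},
    ],
    response_format=TicketTriage,
)

triage = completion.choices[0].message.parsed  # a TicketTriage instance
print(triage.category, triage.priority)
```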


Nice work on controlling style biases! In this view, many models are no longer inflated (e.g., response length, formatting). Gemini 1.5 Flash also outperforms gpt-4o-mini overall and across all categories except for coding. https://t.co/KPUpCgBdm2
🦉 Are you working to improve your LLM-based analytics or building LLM agents? We tested structured output in Gemini Pro, Claude, and GPT. Results: 🥇 OpenAI GPT-4o: Best. Direct Pydantic integration. 🥈 Claude 3.5: Good. Needs 'tool call' trick. 🥉 Gemini 1.5: OK. Clunky… https://t.co/usS6uTTjFG
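For comparison, the 'tool call' trick mentioned for Claude 3.5 is roughly the pattern below: declare a single tool whose input_schema is the Pydantic model's JSON schema, force the model to call it, then validate the tool input back into the model. This is a hedged sketch assuming the Anthropic Python SDK; the tool name, model version, and prompt are illustrative, not taken from the tweet.

```python
# Sketch of the 'tool call' trick for structured output with Claude
# (Anthropic Python SDK). Tool name, model, and prompt are placeholders.
import anthropic
from pydantic import BaseModel


class TicketTriage(BaseModel):
    category: str
    priority: int
    summary: str


client = anthropic.Anthropic()

# Expose the Pydantic schema as a tool and force Claude to call it,
# so the response arrives as structured tool input rather than free text.
triage_tool = {
    "name": "record_triage",
    "description": "Record the triage decision for a support ticket.",
    "input_schema": TicketTriage.model_json_schema(),
}

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    tools=[triage_tool],
    tool_choice={"type": "tool", "name": "record_triage"},
    messages=[
        {"role": "user", "content": "Triage: checkout page returns a 500 error for all users."},
    ],
)

# The forced tool call comes back as a tool_use content block; its input
# follows the schema and can be validated into the Pydantic model.
tool_use = next(block for block in message.content if block.type == "tool_use")
triage = TicketTriage.model_validate(tool_use.input)
print(triage.category, triage.priority)
```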