Today local models took a hit https://t.co/uPQYkw8fQk
So gpt-4o mini was absolutely dominating gpt-4o on our internal llm-as-judge evals. So I LOOKED AT THE DATA (h/t @HamelHusain) and realized it was answering the question BUT ALSO throwing in a bunch of random extra information that the prompt in no way asked for, and the judge…
Robotics won’t have a ChatGPT Moment https://t.co/wnImo55r8F


OpenAI's ChatGPT-4o was the best-performing of five AI language models in a study, achieving a score of 98%. It provided detailed medical analyses in language reminiscent of a medical professional, giving extensive reasoning for its answers and putting its decision-making process in context. Scott Gottlieb discussed the implications of ChatGPT's performance for the medical field.