Sources
- machine learning
Vibe Checks: Precision in LLM Evaluations. List of BEST LLMs and Their Vibes. VibeCheck Prompt for LLMs. https://t.co/NZHPl9eJoE https://t.co/jHDmIfB5Mq
- Ethan Mollick
I love that the idea of vibe-based checks has now spread officially to both benchmarking & the labs themselves. (But they are right, because "vibes" are actually complex heuristic judgements made by humans that they have trouble explaining, but which are often surprisingly good) https://t.co/MAWOui7hS4 https://t.co/dl8WXDx8Pn
- Eugene Yan
💯 While vibe checks may not scale as well, they help us understand how we do on fuzzier tasks. Know when to use which: use assertion/llm-based evals as scalable checks (for regressions); use vibe evals as you start getting to the frontier. Fuzzy vs crisp: https://t.co/BQmLjjDiHS… https://t.co/FCqvMAdyD8 https://t.co/eezoFqlRbj