After more than a week of trying to train humans to replicate the performance of LLMs on a cognitively complex task, I've switched to using the humans to adjudicate the LLM responses instead. LLMs are simply far more capable at most high-throughput 115-IQ tasks, no matter how smart the human is.
The future is not LLM-centric, and research is catching on. It's not just multi-turn conversations: degradation is unreasonably high even in single-turn convos. Framing the core problem of hallucination and poor use of attention tokens at scale as "creative writing" is very https://t.co/y7RDz6P4M2
A new study conducted by Microsoft and Salesforce has revealed that large language models (LLMs) experience a marked decline in accuracy during multi-turn conversations. The research found that the 15 top LLMs tested showed an average 39% drop in performance on multi-turn prompts compared to single-turn tasks, with accuracy falling from approximately 90% in single-turn scenarios to around 60% in extended back-and-forth interactions. The study highlights that LLMs tend to make premature assumptions and struggle to maintain context across conversational turns, leading to increased unreliability and hallucination. The authors note that while LLMs excel when given all necessary information upfront, their performance degrades as instructions unfold over longer dialogues. The findings suggest that current LLMs are better suited to single-turn tasks, and that AI research may need to move beyond LLM-centric models to address these limitations.