After more than a week of trying to train humans to replicate the performance of LLMs on a cognitively complex task, I've switched to using the humans to adjudicate the LLM responses instead. LLMs are simply far more capable at most high-throughput 115-IQ tasks, no matter how smart the human is.
The future is not LLM-centric, and research is catching on. It's not just multi-turn conversations: degradation is unreasonably high even in single-turn convos. Framing the core problem of hallucination and poor use of attention tokens at scale as "creative writing" is very https://t.co/y7RDz6P4M2
A new study conducted by Microsoft and Salesforce has revealed that large language models (LLMs) experience a marked decline in accuracy during multi-turn conversations. The research found that the 15 top LLMs tested showed an average 39% drop in performance on multi-turn prompts compared to single-turn tasks, with accuracy falling from approximately 90% in single-turn scenarios to around 60% in extended back-and-forth interactions. The study highlights that LLMs tend to make premature assumptions and struggle to maintain context across conversational turns, leading to increased unreliability and hallucination. The authors note that while LLMs excel when given all necessary information upfront, their performance degrades as instructions unfold over longer dialogues. The findings suggest that current LLMs are better suited to single-turn tasks, and that AI research may need to move beyond LLM-centric models to address these limitations.