Recent discussions highlight the evolving capabilities and limitations of large language models (LLMs). A new benchmark indicates that LLMs perform 15% worse on complex graph-based workflows than on simpler linear tasks, a gap suggesting that while LLMs handle straightforward sequences well, they struggle to plan through intricate flowcharts. Experts note that LLMs address the large unmet demand for niche software, offering a cost-effective alternative to dedicated development. Misconceptions persist, however, among users who equate LLMs with human-like intelligence. Despite impressive results, such as passing graduate-level tests, LLMs have yet to outperform top human professionals in any coherent intellectual job.
Most laymen (many of whom have become AI experts) think that LLMs are complex systems like our brain. They saw the difference between GPT-3.5 and 4 and extrapolated it, as if baby Einstein would inevitably become Einstein the scientist if you invested enough in his education…
LLMs have been quickly climbing the scale of human ability, passing tests reserved for graduate students and exceeding people on narrow tasks. Yet there is no coherent intellectual job where LLMs exceed the top humans. I see many people assume this will change. For which jobs? When? https://t.co/JrQCZF6YYM
The main problem LLMs solve is that there is a vastly greater need for niche software than there are software engineers to write it. LLMs not only make it cheaper to create new software, they can also substitute for dedicated software in a pinch. That’s potentially a very big deal.