Recent research has brought significant advances in the understanding and evaluation of Large Language Models (LLMs). A new theoretical framework models LLMs as finite-state Markov chains, offering insight into how probabilistic next-token prediction, memorization, and noisy reasoning interact, including in Chain-of-Thought solutions to shift-cipher problems. Additionally, CodeMMLU, a comprehensive multiple-choice question-answering benchmark covering over 10,000 questions across diverse domains and programming languages, has been introduced to evaluate code understanding in LLMs; it reveals limitations in the code comprehension of state-of-the-art models. Furthermore, TurtleBench offers a dynamic evaluation approach that emphasizes reasoning over knowledge recall, addressing the shortcomings of existing static datasets.
Very interesting paper: LLMs with chain-of-thought prompting exhibit a mix of noisy reasoning, memorization, and probability (next-token prediction) in solving shift-cipher problems. The next step should be to see whether similar effects hold on other reasoning tasks. https://t.co/Od5djEuAO5
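To make the task concrete, here is a minimal sketch of the shift-cipher (Caesar cipher) problem the paper uses as a probe; the helper name and rot-13 example are illustrative, not from the paper's code.

```python
import string

def shift_decode(ciphertext: str, shift: int) -> str:
    """Undo a Caesar shift of `shift` positions on lowercase letters."""
    out = []
    for ch in ciphertext:
        if ch in string.ascii_lowercase:
            # Rotate back within the 26-letter alphabet; leave other chars alone.
            out.append(chr((ord(ch) - ord("a") - shift) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)

# rot-13 is the shift the paper finds models handle best, plausibly
# because rot-13 text is common in pretraining data.
print(shift_decode("uryyb jbeyq", 13))  # -> hello world
```

The paper's finding is that LLM accuracy on this task varies with shift value and ciphertext probability, which is what suggests a blend of memorization and noisy reasoning rather than a pure decoding algorithm.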
LLMs as Markov chains
- LLMs can be modeled as finite-state Markov chains despite their seemingly infinite generation capacity.
- The stationary distribution captures the LLM's understanding of natural language in its token space.

Generated this podcast with Google's Illuminate. https://t.co/wg223XWZWF https://t.co/v2E8hs0UrA
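A toy illustration of the Markov-chain view: treat tokens as states and the model's next-token probabilities as a row-stochastic transition matrix, then power-iterate to the stationary distribution. The 2-token matrix below is made up for the sketch, not taken from the paper.

```python
def stationary(P, iters=1000):
    """Power-iterate a row-stochastic matrix P to its stationary distribution.

    Starts from the uniform distribution and repeatedly applies pi <- pi @ P.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# Hypothetical 2-token "language model": P[i][j] = prob of token j after token i.
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary(P)
print(pi)  # -> approximately [0.8333, 0.1667], satisfying pi = pi @ P
```

In the paper's framing, this stationary distribution over the token space is what encodes the model's long-run behavior on natural language; real models of course have astronomically many states, so this is purely conceptual.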
Fine-tuning LLMs for Entity Matching
- Uses structured explanations that explicitly mention attributes, their importance, and their similarity to augment training data for improved LLM fine-tuning in entity matching.

Generated this podcast with Google's Illuminate. https://t.co/vtql3rR3T0 https://t.co/c4TFhNwTiN
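A hedged sketch of what such explanation augmentation might look like: render each attribute comparison with an importance label into a structured string attached to the training pair. The schema, attribute names, and template here are assumptions for illustration, not the paper's exact format.

```python
def build_explanation(rec_a: dict, rec_b: dict, attrs: dict) -> str:
    """Render a structured explanation listing attribute, importance, similarity.

    `attrs` maps attribute name -> importance label (hypothetical schema).
    """
    lines = []
    for attr, importance in attrs.items():
        same = rec_a.get(attr) == rec_b.get(attr)
        lines.append(f"- {attr} (importance={importance}): "
                     f"{'match' if same else 'mismatch'}")
    return "\n".join(lines)

# Two candidate records for an entity-matching training pair.
a = {"name": "Acme Corp", "city": "Berlin"}
b = {"name": "Acme Corp", "city": "Munich"}
print(build_explanation(a, b, {"name": "high", "city": "low"}))
# -> - name (importance=high): match
#    - city (importance=low): mismatch
```

The idea is that appending such explanations to fine-tuning examples gives the model an explicit rationale for the match/non-match label, rather than only the label itself.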