Recent discussions in artificial intelligence highlight advances in reasoning-enhanced reward models and evaluation methods for large language models (LLMs). A paper titled 'Reinforcing Thinking through Reasoning-Enhanced Reward Models', co-authored by researchers from the University of California, Santa Cruz, The Harker School, and Meta, proposes a novel approach to improving LLM performance. A new study from Carnegie Mellon University, 'Predicting the Performance of Black-box LLMs through Self-Queries', focuses on self-query methods for evaluating LLMs. Another method, PRIME, is an open-source online reinforcement learning technique that uses implicit Process Reward Modelling (PRM) to enhance LLM reasoning, reportedly improving mathematical reasoning by up to 27%. Finally, a creative evaluation idea suggests using LLMs to judge other LLMs, with the scores feeding a PageRank-style graph, though concerns have been raised about the reliability of the evaluation markers.
Fun eval idea! LLMs judge an array of LLMs & scores are used to build a PageRank-like graph!
Nits:
- not sure samples have enough signal to strongly differentiate models
- risks using "uncorrelated to real world perf" markers (as all LLM-judge methods)
- where's Qwen?
https://t.co/Ekx1EoOOJX
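A minimal sketch of how that PageRank-style aggregation could work, assuming hypothetical model names and made-up pairwise win counts (none of this comes from the linked thread; it only illustrates turning judge verdicts into a graph ranking):

```python
# Each pairwise judge verdict is an edge: the losing model "votes" for the
# model that beat it. A PageRank-style power iteration then produces scores.
import numpy as np

models = ["model_a", "model_b", "model_c", "model_d"]
# wins[i][j] = number of times judges preferred models[j] over models[i] (toy data)
wins = np.array([
    [0, 5, 2, 1],
    [3, 0, 4, 2],
    [6, 4, 0, 3],
    [7, 6, 5, 0],
], dtype=float)

def pagerank_scores(wins: np.ndarray, damping: float = 0.85, iters: int = 100) -> np.ndarray:
    """Power iteration over a 'loser votes for winner' transition matrix.
    Assumes every model has lost at least one comparison (no zero rows)."""
    n = wins.shape[0]
    trans = wins / wins.sum(axis=1, keepdims=True)  # row i distributes mass to models that beat i
    scores = np.full(n, 1.0 / n)
    for _ in range(iters):
        scores = (1 - damping) / n + damping * trans.T @ scores
    return scores / scores.sum()

scores = pagerank_scores(wins)
for name, s in sorted(zip(models, scores), key=lambda t: -t[1]):
    print(f"{name}: {s:.3f}")
```

With few samples per pair, these scores are noisy, which is exactly the first nit raised above.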
PRIME, an open-source online RL method with implicit Process Reward Modelling (PRM) to improve the reasoning of LLMs! 👀 PRIME directly learns a Q-function (scoring) that provides rewards for each token; it can be updated online with only the outcome, improving math reasoning by up to 27%… https://t.co/Ab6IC8L9LL
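A rough sketch of the implicit-PRM idea described above, assuming per-token rewards are log-probability ratios against a frozen reference model and that training uses only a binary outcome label per response; the beta value, tensor shapes, and toy data are illustrative assumptions, not PRIME's actual implementation:

```python
# Implicit process rewards: a trainable model scores each token via the
# log-prob ratio to a frozen reference model, and is trained with only an
# outcome label (correct / incorrect) for the whole response.
import torch
import torch.nn.functional as F

def implicit_token_rewards(logp_model: torch.Tensor,
                           logp_ref: torch.Tensor,
                           beta: float = 0.05) -> torch.Tensor:
    """Per-token reward r_t = beta * (log p_model(y_t) - log p_ref(y_t))."""
    return beta * (logp_model - logp_ref)

def outcome_only_loss(logp_model: torch.Tensor,
                      logp_ref: torch.Tensor,
                      outcome: torch.Tensor,
                      beta: float = 0.05) -> torch.Tensor:
    """The summed token rewards act as a logit for whether the response is
    correct, so only outcome labels are needed (binary cross-entropy)."""
    rewards = implicit_token_rewards(logp_model, logp_ref, beta)  # [batch, seq]
    response_logit = rewards.sum(dim=-1)                          # [batch]
    return F.binary_cross_entropy_with_logits(response_logit, outcome)

# Toy example: log-probs of the sampled tokens under each model.
batch, seq = 4, 16
logp_model = (torch.randn(batch, seq) - 2.0).requires_grad_()
logp_ref = torch.randn(batch, seq) - 2.0
outcome = torch.tensor([1.0, 0.0, 1.0, 0.0])  # 1 = final answer was correct

loss = outcome_only_loss(logp_model, logp_ref, outcome)
loss.backward()
print("loss:", loss.item())
```

Because the reward is defined token by token but supervised only at the response level, the same scorer can hand out dense per-token rewards during online RL while being updated from outcome feedback alone.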
[LG] Predicting the Performance of Black-box LLMs through Self-Queries. D. Sam, M. Finzi, J. Z. Kolter [CMU] (2025) https://t.co/Y82KduNQlq https://t.co/6MW5CmS2UC