Meta AI has introduced a significant advancement for coding large language models (LLMs) with Reinforcement Learning with Execution Feedback (RLEF), a technique that incorporates execution feedback during training to improve performance at inference time. The approach has been applied to fine-tune Llama 3.1 models, with the 8B model surpassing GPT-4 on DeepMind's CodeContests and the 70B model achieving state-of-the-art results. Separately, a new framework for fast, parallelized evaluation of LLMs as agents was announced alongside results for state-of-the-art models on SWE-bench; its cloud-based infrastructure is reported to speed up evaluations by roughly 30x.
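The summary doesn't spell out how execution feedback becomes a training signal, so here is a minimal, hypothetical sketch (not the paper's actual reward design or training loop): run a sampled candidate solution against its tests in a subprocess and map the outcome to a scalar reward that a policy-gradient update could consume. The `execution_reward` helper, the toy `add` task, and the binary pass/fail reward are all illustrative assumptions.

```python
import os
import subprocess
import tempfile
import textwrap

def execution_reward(candidate_code: str, tests: str, timeout: float = 5.0) -> float:
    """Execute a candidate solution plus its tests in a subprocess and
    return a scalar reward: 1.0 if every test passes, 0.0 otherwise."""
    program = candidate_code + "\n\n" + tests
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    try:
        result = subprocess.run(["python", path], capture_output=True, timeout=timeout)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0  # treat hangs as failures
    finally:
        os.unlink(path)

# Toy task: two sampled candidates for the same problem.
tests = textwrap.dedent("""
    assert add(2, 3) == 5
    assert add(-1, 1) == 0
""")
print(execution_reward("def add(a, b):\n    return a + b", tests))  # 1.0
print(execution_reward("def add(a, b):\n    return a - b", tests))  # 0.0

# In RLEF-style training, rewards like these would drive a policy-gradient
# update (e.g., PPO) on the code LLM; that optimization loop is omitted here.
```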
How to train long-context LMs? (and beat Llama-3.1 🏆) Many takeaways from our new paper!
- Focus on diverse & reliable evaluations (not just perplexity)
- Find good sources of long data and high-quality short data
- ...
A 🧵 on how we produced ProLong, a SoTA 8B 512K model https://t.co/xsRDCQpNUE
Meta with another solid looking RLHF paper: RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning This is how big labs improve math etc. Funny because I wrote about "RLCF" in April of 2023. We're slowly plodding along in open RLHF. https://t.co/mA0xVf8laa
I'm really excited both about our new evaluation framework for fast parallelized evaluation of LLMs as agents, and our new results evaluating SOTA LLMs on SWE-bench. Check this post out for both of them. https://t.co/tcTBRzGw9P
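The linked post describes the framework itself; as a rough, hypothetical illustration of the parallelization idea only (not that framework's actual API), the sketch below fans independent benchmark instances out across concurrent workers so wall-clock time falls roughly with the worker count. `evaluate_instance` is a stub standing in for a full agent rollout plus test run.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def evaluate_instance(instance_id: str) -> dict:
    """Stub for one agent rollout on one benchmark task. A real harness
    would prompt the model, apply its patch in an isolated sandbox, and
    run that repository's test suite."""
    time.sleep(random.uniform(0.1, 0.3))  # simulate sandbox + test runtime
    return {"instance_id": instance_id, "resolved": random.random() < 0.2}

instances = [f"task-{i:03d}" for i in range(20)]

# Fan instances out across workers; the speed-up over sequential
# evaluation comes from running independent tasks concurrently.
results = []
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = {pool.submit(evaluate_instance, iid): iid for iid in instances}
    for fut in as_completed(futures):
        results.append(fut.result())

resolved = sum(r["resolved"] for r in results)
print(f"resolved {resolved}/{len(results)} instances")
```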