OpenCoder has been introduced as a new open-access large language model (LLM) designed for coding tasks. The model is notable for its transparency: it releases its training data, data-processing pipeline, and training protocols, enabling reproducible research on code-focused LLMs. OpenCoder is trained on a dataset called RefineCode, which comprises 960 billion tokens across more than 600 programming languages. Models pretrained on RefineCode reportedly match the performance of those trained on The Stack v2 while using roughly one-third as many tokens. The OpenCoder family includes base models at 1.5 billion and 8 billion parameters, positioning it as a competitive open alternative to leading proprietary code models. The initiative has drawn support from various sectors, underscoring the value of open-source solutions amid growing reliance on proprietary models, particularly in national-security contexts.
Open source Qwen catches up with proprietary coding models. Open source is the way! https://t.co/L3DumkW5Lk
1/n OpenCoder: An Open CookBook to Achieve Top-Tier Code LLMs The advent of Large Language Models (LLMs) has revolutionized various fields, with code-related tasks being particularly impacted. However, a significant performance gap exists between proprietary code LLMs and their… https://t.co/lQKE6nPP58
Introducing OpenCoder, a Fully Open Code LLM: a top-tier code LLM that not only achieves performance comparable to leading models but also serves as an "open cookbook" for the research community. https://t.co/Be2LsipcUT https://t.co/2GdkljTZKi https://t.co/mRKF2kcXQK