Several significant developments have occurred in mathematical reasoning through the release of large datasets and models. A 200GB dataset of mathematical texts has been open-sourced, offering potential for enhancing large language models (LLMs) without relying on human-annotated data. In addition, OpenMathInstruct-1, a math instruction tuning dataset containing 1.8 million problem-solution pairs, has been made available under a commercially permissive license. Another notable release is BgGPT-7B-Instruct-v0.1, the first free and open Bulgarian LLM, which outperforms previous models on Bulgarian tasks.
OpenMathInstruct-1 by @NVIDIAAI 🧮
> 1.8 million problem-solution (synthetic) pairs.
> Uses GSM8K & MATH training subsets.
> Uses Mixtral 8x7B to produce the pairs.
> Leverages both text reasoning + a code interpreter during generation.
> Released Llama, CodeLlama, Mistral,… https://t.co/udLusxUdLy
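The generation recipe summarized in the thread (few-shot prompting of Mixtral 8x7B on GSM8K/MATH training problems, interleaving text reasoning with executable code, and keeping only samples whose code reproduces the known answer) can be sketched roughly as below. This is a minimal illustration under those assumptions, not NVIDIA's actual pipeline; `llm_generate`, `FEW_SHOT_EXAMPLES`, and the `<code>` tag format are hypothetical placeholders.

```python
import re
import subprocess
import tempfile

# Hypothetical stand-in for whatever inference backend serves
# Mixtral 8x7B; not part of the released pipeline.
def llm_generate(prompt: str) -> str:
    raise NotImplementedError("plug in an inference backend here")

# A few solved GSM8K/MATH examples demonstrating the desired
# text-reasoning + code format (elided here).
FEW_SHOT_EXAMPLES = "..."

def run_code(code: str) -> str:
    """Execute a generated Python snippet and capture its stdout."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run(
        ["python", path], capture_output=True, text=True, timeout=10
    )
    return result.stdout.strip()

def synthesize_solution(problem: str, reference_answer: str) -> str | None:
    """Sample one candidate solution that interleaves prose with code,
    keeping it only if the executed code reproduces the known answer."""
    prompt = f"{FEW_SHOT_EXAMPLES}\nProblem: {problem}\nSolution:"
    candidate = llm_generate(prompt)
    # Illustrative tag format; the real dataset defines its own markup.
    match = re.search(r"<code>(.*?)</code>", candidate, re.DOTALL)
    if match is None:
        return None
    predicted = run_code(match.group(1))
    # Answer-matching filter: discard samples whose executed result
    # disagrees with the gold answer.
    return candidate if predicted == reference_answer else None
```

Filtering candidates against the training sets' gold answers is what would let such a fully synthetic pipeline sidestep human annotation of the solutions themselves.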
Thrilled to launch BgGPT-7B-Instruct-v0.1: the 1st free and open Bulgarian LLM and the 1st in our BgGPT series. Outperforms Llama-7B and Mistral-7B on all BG tasks. Often outperforms Mixtral-8x7B on BG, a much larger model (similar in performance to GPT-3.5). https://t.co/3MxrcTM8h2
Check out this breakdown of the power of #KnowledgeGraphs by @tb_tomaz! It covers everything from structured to unstructured data. An innovative approach worth exploring! #LLMs #Neo4j https://t.co/ldwcUMzczP https://t.co/7o7IbuTrvJ