
H2O.ai has released its latest series of small language models, H2O-Danube3, under the Apache 2.0 license. The series includes two models: H2O-Danube3-4B, trained on 6 trillion tokens, and H2O-Danube3-500M, trained on 4 trillion tokens. Both were pre-trained on high-quality web data in three stages and use the Llama architecture with a Mistral tokenizer (32K vocabulary), an 8192-token context length, and Grouped Query Attention. The models show competitive performance on a range of academic and chat benchmarks: the 500M variant outperforms Qwen2 0.5B, while the 4B variant is competitive with Phi-3 4B.
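For readers who want to try the models, here is a minimal sketch of loading one of the released checkpoints with Hugging Face transformers. The repository name "h2oai/h2o-danube3-500m-chat" and the generation settings are assumptions for illustration; check H2O.ai's Hugging Face page for the exact model cards.

```python
# Minimal sketch: loading an H2O-Danube3 chat checkpoint with transformers.
# The repo id below is an assumption; verify the exact name on H2O.ai's HF page.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "h2oai/h2o-danube3-500m-chat"  # assumed repository name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the models are small enough to run on a single GPU or CPU
    device_map="auto",
)

# Build a chat-style prompt using the tokenizer's chat template
# (the models use the Mistral tokenizer with a 32K vocabulary).
messages = [{"role": "user", "content": "Summarize H2O-Danube3 in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```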
Releasing H2O-Danube3, a series of small language models with a 500M and 4B version. Shows competitive performance on a series of benchmarks including academic, chat, as well as fine-tuning. Check them out on our HF page: https://t.co/InRTC9rcdW