Llama 3.1, a new AI model with 405B parameters, has shown promising benchmark results according to leaked information. The model was reportedly pretrained on approximately 15 trillion tokens from publicly available sources over 30.84 million GPU hours, then fine-tuned on publicly available instruction datasets along with synthetically generated examples (reported figures range from 15 million to over 25 million). The leaked results pertain to the base model rather than the instruct-tuned version, which is expected to be released officially later this week. The model card suggests a decent multilingual focus, including Hindi, and a data cut-off of December 2023.
llama3.1 model card leaked. if it's true then:
> decent multilingual focus (Hindi available)
> 15T+ tokens, Dec '23 cut off
> Post training stage - good public instruct data + 25M synthetic instruct
if it's already using top public instruct datasets, need to synth all fresh. https://t.co/zZ4qYs9W4K
Llama 3 was trained for 30.84 million GPU hours on 15 trillion tokens and was fine-tuned with public datasets and 15 million synthetic samples https://t.co/MlSH3bi69m
“Llama 3.1 was pretrained on ~15 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over 25M synthetically generated examples.” very nice https://t.co/lMjcLQIsF7
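As a rough sanity check on whether those figures hang together, here is a minimal back-of-envelope sketch (not from the model card or the leak) using the standard ~6·N·D approximation for dense-transformer pretraining FLOPs; the H100-class peak throughput and utilization interpretation are assumptions for illustration only.

```python
# Back-of-envelope check of the leaked training figures.
# Assumptions (not from the source): H100-class GPUs at ~990 TFLOP/s peak bf16,
# and the standard ~6 * N * D FLOPs estimate for dense transformer pretraining.

params = 405e9          # 405B parameters (reported)
tokens = 15e12          # ~15 trillion pretraining tokens (reported)
gpu_hours = 30.84e6     # reported GPU hours

total_flops = 6 * params * tokens            # ~3.6e25 FLOPs
gpu_seconds = gpu_hours * 3600
sustained_flops_per_gpu = total_flops / gpu_seconds

peak_flops_per_gpu = 990e12                  # assumed H100 bf16 peak
mfu = sustained_flops_per_gpu / peak_flops_per_gpu

print(f"Estimated pretraining compute:        {total_flops:.2e} FLOPs")
print(f"Implied sustained throughput per GPU: {sustained_flops_per_gpu:.2e} FLOP/s")
print(f"Implied model FLOPs utilization:      {mfu:.0%}")
```

Under those assumptions the reported 30.84M GPU hours would correspond to roughly 30% model FLOPs utilization, which is in a plausible range for large-scale pretraining, so the leaked numbers are at least internally consistent.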