Great research proving out the Muon optimizer at larger model scale: "Scaling law experiments indicate that Muon achieves ~2x computational efficiency compared to AdamW with compute optimal training." https://t.co/cYvHVyOvym
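For context on what the Muon update actually does, here is a minimal sketch assuming the publicly described recipe: accumulate momentum on the gradient of a 2D weight matrix, then approximately orthogonalize that accumulated update with a few Newton-Schulz iterations before applying it. The function names, the simple heavy-ball momentum, and the iteration coefficients are illustrative assumptions taken from the public reference description, not Moonlight's exact distributed Megatron-LM implementation.

```python
import torch

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5, eps: float = 1e-7) -> torch.Tensor:
    """Approximately orthogonalize a 2D matrix with a quintic Newton-Schulz
    iteration. Coefficients follow the public Muon reference description;
    treat them as illustrative here."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + eps)              # normalize so the iteration converges
    transposed = X.shape[0] > X.shape[1]  # iterate on the "wide" orientation
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X

def muon_step(param, grad, momentum_buf, lr=0.02, beta=0.95):
    """One illustrative Muon step for a 2D weight matrix: momentum, then
    orthogonalize the accumulated update and apply it."""
    momentum_buf.mul_(beta).add_(grad)                 # heavy-ball momentum
    update = newton_schulz_orthogonalize(momentum_buf)
    param.data.add_(update, alpha=-lr)
    return param

# Toy usage on a random weight matrix (gradient is a stand-in).
W = torch.nn.Parameter(torch.randn(256, 512) * 0.02)
buf = torch.zeros_like(W)
g = torch.randn_like(W)
muon_step(W, g, buf)
```

The orthogonalization step is what distinguishes Muon from plain SGD with momentum, and it is also the source of Muon's extra per-step matrix-multiply FLOPs that fair comparisons against AdamW need to account for.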
Discussed with @YouJiacheng: under the current Megatron-LM context, the Distributed Muon vs. AdamW factor should be 1.25 rather than 1.5 for a fair comparison, and it can be reduced even further! We'll update the details in a revised version of the paper soon! https://t.co/xoOzyYbp1l
Moonshot AI and UCLA Researchers Release Moonlight: A 3B/16B-Parameter Mixture-of-Experts (MoE) Model Trained with 5.7T Tokens Using the Muon Optimizer. Moonlight is offered in two configurations: a version with 3 billion activated parameters and a total of 16 billion parameters,… https://t.co/3GGFbzCfKN
Convergence AI has unveiled its LM2 Large Memory Models, which feature an unprecedented memory capacity aimed at enhancing complex problem-solving and advanced reasoning in artificial intelligence. In a related development, Moonshot AI, in collaboration with UCLA researchers, introduced the Moonlight model, a mixture-of-experts (MoE) architecture with 3 billion activated parameters and 16 billion total parameters. The model was trained on 5.7 trillion tokens using the Muon optimizer, which has demonstrated roughly twice the computational efficiency of the AdamW optimizer. Moonlight is expected to push performance further while requiring fewer training floating-point operations (FLOPs). Additionally, intermediate checkpoints for the model have been released, and further insights into the Muon optimizer's scalability and efficiency are anticipated in upcoming research updates.
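As a back-of-the-envelope illustration of the "activated vs. total parameters" distinction (and why fewer activated parameters means fewer FLOPs per token), the sketch below runs the standard top-k routing arithmetic. Only the 16B-total / ~3B-activated headline figures come from the announcement; the expert count, top-k value, and shared/expert split are hypothetical placeholders, not Moonlight's published configuration.

```python
# Back-of-the-envelope MoE parameter accounting. Only the 16B-total and
# ~3B-activated headline figures come from the announcement; everything
# else below is a hypothetical placeholder used to show the arithmetic.

TOTAL_PARAMS = 16e9            # every weight stored in the checkpoint

shared_params = 1.4e9          # hypothetical: embeddings, attention, shared FFN
expert_params = TOTAL_PARAMS - shared_params   # weights spread across routed experts
num_experts = 64               # hypothetical number of routed experts
top_k = 7                      # hypothetical experts selected per token

# Per token, only top_k of num_experts expert blocks are executed,
# so only that fraction of the expert weights counts as "activated".
activated = shared_params + expert_params * top_k / num_experts
print(f"activated per token ≈ {activated / 1e9:.1f}B of {TOTAL_PARAMS / 1e9:.0f}B total")
# -> activated per token ≈ 3.0B of 16B total
```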