The University of Hong Kong has unveiled Dream 7B, a new open-source diffusion reasoning model, touted as the highest-performing model of its kind to date. The model follows a mask diffusion paradigm and was trained on 580 billion tokens. Its weights are initialized from the autoregressive model Qwen2.5 7B, which boosts its performance. Users can adjust the number of diffusion timesteps to trade off speed against accuracy. The release has drawn attention because recent comparisons, such as UniDisc's, suggest that diffusion models can outperform autoregressive models even though they are less efficient at training time.
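To make the mask-diffusion idea concrete, here is a minimal toy sketch of that style of decoding: the sequence starts fully masked, and each diffusion timestep commits the most confident proposals at arbitrary positions rather than generating left to right. Everything here is illustrative: `toy_denoise_step` is a hypothetical random scorer standing in for the actual network, and the unmasking schedule is a simplifying assumption, not Dream 7B's real implementation. It does show why fewer timesteps means faster but coarser decoding, and more timesteps means slower but finer-grained refinement.

```python
import random

MASK = "<mask>"


def toy_denoise_step(seq, vocab, rng):
    """Hypothetical stand-in for the model: propose a token and a
    confidence score for every still-masked position."""
    return {i: (rng.choice(vocab), rng.random())
            for i, tok in enumerate(seq) if tok == MASK}


def mask_diffusion_decode(length, timesteps, vocab, seed=0):
    """Sketch of mask-diffusion decoding: start fully masked, then over
    `timesteps` iterations commit the most confident proposals, in
    arbitrary (not left-to-right) order."""
    rng = random.Random(seed)
    seq = [MASK] * length
    for t in range(timesteps):
        proposals = toy_denoise_step(seq, vocab, rng)
        if not proposals:
            break
        # Unmask roughly an equal share of the remaining positions per step,
        # so the whole sequence is filled by the final timestep.
        remaining_steps = timesteps - t
        k = max(1, len(proposals) // remaining_steps)
        best = sorted(proposals.items(),
                      key=lambda kv: kv[1][1], reverse=True)[:k]
        for i, (tok, _score) in best:
            seq[i] = tok
    return seq


if __name__ == "__main__":
    # 4 timesteps fill 8 positions two at a time; 1 timestep fills all at once.
    print(mask_diffusion_decode(length=8, timesteps=4, vocab=["a", "b", "c"]))
```

Lowering `timesteps` mirrors the speed/accuracy knob described above: a single step commits every position in one shot, while more steps let later commitments condition on earlier ones.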
Dream-7B diffusion language reasoning model. The normal LLMs we know are technically "autoregressive" models, which predict the next token. Diffusion models can do completions in any order. https://t.co/JWCaC51bP2
Discrete diffusion is winning over AR recently: LLaDA-8B, Dream-7B, UniDisc. I could be coping, but maybe diffusion isn't dead yet. UniDisc ran a comparison and found the diffusion model performed better than the autoregressive one, as they were 'less efficient at training time' but 'more https://t.co/XE9mgSP4CJ
Dream 7B - a new open diffusion reasoning model by @HKUniversity and Huawei Noah's Ark Lab.
▪️Its special features include:
• Adopts a mask diffusion paradigm
• Is trained on 580B tokens
• Uses autoregressive (AR) model (Qwen2.5 7B) weight initialization, as it's more https://t.co/dmpLCsSVHO