The recent advancement of large language models (LLMs) has led to an evolution from single-model applications to complex, multi-model ecosystems. Early implementations relied on a solitary LLM executing tasks, but as teams and customer sophistication increased, so did the need… https://t.co/6KoFYAcud3
🏷️:Small Models, Big Impact: Efficient Corpus and Graph-Based Adaptation of Small Multilingual Language Models for Low-Resource Languages 🔗:https://t.co/fd1pzVcsaY https://t.co/uGmm93u6eK
🏷️:Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models 🔗:https://t.co/fET5WC1J25 https://t.co/l8wJ1T1RDJ
A new large language diffusion model, LLaDA-8B, has been introduced, showcasing advances in natural language processing. Trained from scratch on 2.3 trillion tokens using 0.13 million GPU hours, and then supervised fine-tuned on 4.5 million pairs, LLaDA-8B reportedly surpasses Llama-2 7B on nearly all 15 standard zero- and few-shot benchmarks, and it achieves competitive results against LLaMA3 8B despite using roughly 7x fewer pre-training tokens. Rather than generating text left to right, LLaDA employs a masked diffusion approach, diverging from traditional autoregressive generation. This may enable the model to match or exceed leading autoregressive language models on various tasks, potentially paving the way for new methodologies in large-scale language modeling. The paper also addresses the challenge of enhancing reasoning in language models without significantly increasing model size or relying on specialized training data.
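To make the contrast with left-to-right generation concrete, here is a minimal, toy-scale sketch of a masked-diffusion-style training step: sample a mask ratio t, mask tokens independently with probability t, and have a bidirectional model predict the original tokens at the masked positions, with the loss reweighted by 1/t. The model architecture, sizes, vocabulary, and exact loss weighting below are illustrative assumptions, not LLaDA's actual implementation.

```python
# Toy masked-diffusion language modeling step (illustrative sketch, not LLaDA's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE = 1000        # toy vocabulary size (assumption)
MASK_ID = VOCAB_SIZE     # reserve one extra id for the [MASK] token
SEQ_LEN = 32
D_MODEL = 128

class MaskPredictor(nn.Module):
    """Bidirectional transformer that predicts original tokens at masked positions."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE + 1, D_MODEL)  # +1 for [MASK]
        self.pos = nn.Parameter(torch.zeros(SEQ_LEN, D_MODEL))
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D_MODEL, VOCAB_SIZE)

    def forward(self, tokens):
        h = self.embed(tokens) + self.pos
        return self.head(self.encoder(h))  # (batch, seq, vocab) logits

def masked_diffusion_loss(model, x0):
    """Sample a mask ratio t per sequence, corrupt x0 by masking, and score the
    model only on masked positions, reweighted by 1/t (toy version of the objective)."""
    batch = x0.size(0)
    t = torch.rand(batch, 1).clamp(min=1e-3)              # mask ratio per sequence
    masked = torch.rand(x0.shape) < t                      # independent masking
    xt = torch.where(masked, torch.full_like(x0, MASK_ID), x0)
    logits = model(xt)
    ce = F.cross_entropy(logits.transpose(1, 2), x0, reduction="none")  # (batch, seq)
    per_seq = (ce * masked).sum(dim=1) / t.squeeze(1)      # only masked tokens count
    return per_seq.mean() / SEQ_LEN

if __name__ == "__main__":
    model = MaskPredictor()
    x0 = torch.randint(0, VOCAB_SIZE, (8, SEQ_LEN))        # fake batch of token ids
    loss = masked_diffusion_loss(model, x0)
    loss.backward()
    print(f"toy masked-diffusion loss: {loss.item():.3f}")
```

The key design difference from autoregressive training is visible here: the model sees the whole (partially masked) sequence at once and is supervised only on the corrupted positions, rather than predicting each next token from a strictly left-to-right context.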