
A series of advancements in large language models (LLMs) and multimodal AI were announced, highlighting new benchmarks and tools aimed at improving model efficiency and performance. The MMMU benchmark was introduced to challenge AI models to understand and reason over text and images drawn from college-level exams. XTuner promises to cut DPO training time in half and to enable training of Llama 3 70B reward models with sequence lengths of up to 1 million tokens on 64 A100 GPUs. LightOn's research explored hardware-agnostic training of LLMs on AMD GPUs, while the EM-LLM architecture integrates aspects of human memory into transformer-based models. New tools such as LLaMA-Factory and SmolLM ease fine-tuning and deployment of LLMs, and benchmarks such as MMLongBench-Doc focus on evaluating long-context document understanding. Finally, WebAssembly is emerging as a portable runtime for LLMs, addressing cross-platform challenges in cloud-native applications.
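The DPO training mentioned above optimizes a pairwise preference objective rather than a learned reward model. As a minimal sketch (not XTuner's implementation; `beta` and the log-probability values below are illustrative), the per-pair loss can be written in plain Python:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Each argument is the summed token log-probability of a full response
    under the trained policy or the frozen reference model; beta scales
    the implicit reward.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # Negative log-sigmoid of the reward margin: minimized when the
    # policy prefers the chosen response more strongly than the
    # reference model does.
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When policy and reference agree exactly, the margin is 0 and the
# loss is log 2.
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # → 0.6931
```

In practice these log-probabilities come from batched forward passes over both models, which is why sequence length and GPU count dominate DPO training cost.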
🚨ScanReason: Empowering 3D Visual Grounding with Reasoning Capabilities [ECCV'24] 🌟𝐏𝐫𝐨𝐣: https://t.co/61RTwLKVOt 🚀𝐀𝐛𝐬: https://t.co/3FHzh2QaCi Proposes a new task, 3D reasoning grounding, which focuses on reasoning-based grounding in 3D scenes. Given a 3D scene and an… https://t.co/KJyXXZU69K
🚨DiverseDream: Diverse Text-to-3D Synthesis with Augmented Text Embedding [ECCV'24] 🌟𝐏𝐫𝐨𝐣: https://t.co/n7hSf7FUW5 🚀𝐀𝐛𝐬: https://t.co/ynDh3p1k2u A new method for jointly generating diverse 3D models from the same text prompt. https://t.co/iDm2id8AWX
🚨TIBET: Identifying and Evaluating Biases in Text-to-Image Generative Models [ECCV'24] 🌟𝐏𝐫𝐨𝐣: https://t.co/feXxonZlAF 🚀𝐀𝐛𝐬: https://t.co/MrLTqRC9z5 Analyse biases for any prompt, with any black-box T2I model, using TIBET! https://t.co/7gqUBIAe2d