Researchers at Apple have introduced Cut Cross-Entropy (CCE), a machine learning method that computes the cross-entropy loss without materializing the logits for all tokens in global memory. Because the logit matrix over the full vocabulary dominates the memory cost of loss computation in large-vocabulary language models, this cuts the memory footprint of that step from 24 GB to just 1 MB for the Gemma 2 2B model. Separately, a study titled 'A Case for Soft Loss Functions' by Uma et al. highlights the advantages of training with soft labels derived from crowd annotations in computer vision and other AI tasks.
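To make the memory argument concrete, the sketch below shows one way the full [N, V] logit matrix can be avoided: stream over vocabulary chunks, keep a running log-sum-exp, and compute the correct-token logit with a single dot product per position. This is a minimal illustration in PyTorch, not Apple's CCE implementation (which fuses this work into custom GPU kernels); the names `chunked_cross_entropy`, `hidden`, `classifier_weight`, `targets`, and `chunk_size` are illustrative assumptions.

```python
import torch


def chunked_cross_entropy(hidden, classifier_weight, targets, chunk_size=8192):
    """Cross-entropy loss without materializing the full [N, V] logit matrix.

    hidden:            [N, D] final hidden states
    classifier_weight: [V, D] unembedding / classifier matrix
    targets:           [N]    correct-token indices
    """
    n, _ = hidden.shape
    v = classifier_weight.shape[0]

    # Logit of the correct token only: one dot product per position, O(N) memory.
    correct_logit = (hidden * classifier_weight[targets]).sum(dim=-1)  # [N]

    # Running log-sum-exp over vocabulary chunks; only an [N, chunk_size] slice
    # of logits exists at any time instead of the full [N, V] matrix.
    lse = torch.full((n,), float("-inf"), device=hidden.device, dtype=hidden.dtype)
    for start in range(0, v, chunk_size):
        chunk = classifier_weight[start:start + chunk_size]  # [C, D]
        logits_chunk = hidden @ chunk.T                      # [N, C]
        lse = torch.logaddexp(lse, torch.logsumexp(logits_chunk, dim=-1))

    # Per-token loss: log-sum-exp minus the correct-token logit.
    return (lse - correct_logit).mean()


if __name__ == "__main__":
    torch.manual_seed(0)
    h = torch.randn(16, 64)            # 16 tokens, hidden size 64
    w = torch.randn(1000, 64)          # vocabulary of 1,000
    y = torch.randint(0, 1000, (16,))
    # Matches the naive computation up to floating-point error.
    print(chunked_cross_entropy(h, w, y), torch.nn.functional.cross_entropy(h @ w.T, y))
```

Chunking alone only caps the peak size of the logit buffer; the reported 24 GB to 1 MB reduction comes from CCE performing this computation in on-chip memory inside fused kernels rather than in Python-level loops.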
Apple Researchers Propose Cut Cross-Entropy (CCE): A Machine Learning Method that Computes the Cross-Entropy Loss without Materializing the Logits for all Tokens into Global Memory Researchers at Apple introduced the Cut Cross-Entropy (CCE) method, a novel approach designed to… https://t.co/4v9eIJ2zKp
Apple presents Cut Your Losses in Large-Vocabulary Language Models https://t.co/FhAk4zeOxm