Recent discussions among researchers highlight a significant shift in deep reinforcement learning (RL) methodology: framing regression tasks as classification problems. By training value functions with a categorical cross-entropy loss instead of a regression loss, researchers have reported improvements in both the performance and the scalability of deep RL, at little to no additional cost. The approach has shown promise across a range of domains, in both offline and online settings, and even in complex applications such as robotics. Its robustness across models and tasks has been recognized as a potentially groundbreaking development in the field. However, the implications of this shift for the dynamic-programming component of RL training, and for the knowledge encoded in pre-trained models, are still being explored.
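To make the idea concrete, here is a minimal sketch of one way a scalar value target can be turned into a classification label: an HL-Gauss-style projection of the target onto a fixed grid of bins, trained with cross-entropy. The function names and parameters (`v_min`, `v_max`, `num_bins`, `sigma`) are illustrative assumptions, not the paper's actual code.

```python
import numpy as np
from scipy.special import erf

def hl_gauss_target(y, v_min=-10.0, v_max=10.0, num_bins=51, sigma=0.75):
    """Project a scalar target y onto a categorical distribution over
    num_bins fixed bins by integrating a Gaussian N(y, sigma^2) over
    each bin (a histogram-loss-style "soft" label)."""
    edges = np.linspace(v_min, v_max, num_bins + 1)
    cdf = 0.5 * (1.0 + erf((edges - y) / (sigma * np.sqrt(2.0))))
    probs = cdf[1:] - cdf[:-1]
    return probs / probs.sum()  # renormalize mass clipped at the range ends

def cross_entropy(logits, target_probs):
    """Categorical cross-entropy against the projected target."""
    shifted = logits - logits.max()                      # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())  # log-softmax
    return -np.sum(target_probs * log_probs)

def expected_value(logits, v_min=-10.0, v_max=10.0, num_bins=51):
    """Recover a scalar value estimate as the mean of the categorical."""
    edges = np.linspace(v_min, v_max, num_bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    shifted = logits - logits.max()
    probs = np.exp(shifted) / np.exp(shifted).sum()
    return np.sum(probs * centers)

# Usage: the critic head outputs num_bins logits instead of one scalar,
# and is trained to match the projected TD target distribution.
rng = np.random.default_rng(0)
logits = rng.normal(size=51)
target = hl_gauss_target(y=3.2)
loss = cross_entropy(logits, target)
value = expected_value(logits)
```

The key design choice is that the scalar target never enters an L2 loss: the network's categorical prediction is matched to a smoothed label, which reportedly yields better-behaved optimization at scale.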
As I said earlier, we need to figure out RL at foundation model scale. This work is yet another piece of the missing puzzle. What I still wonder is how dynamic programming RL training affects the knowledge inherent within a pre-trained model. Some thoughts on this soon. https://t.co/qRDoufti5x
Replacing regression with classification in RL improves performance across many domains, in both offline and online settings, and even scales to generalist settings like robotics! https://t.co/bRU2u2u82E
Framing regression as classification has been “dark knowledge” for some time. We wanted to shed some light on this phenomenon in deep RL: framing value-learning as classification significantly improves performance and scalability in deep RL. But... not all classification… https://t.co/A0vbOexNpq
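The truncated tweet above points at a detail the authors emphasize: not all classification losses behave the same, because the construction of the target label matters. As a hedged sketch for contrast with the earlier example, the simpler "two-hot" alternative places all probability mass on the two bins nearest the scalar target; smoothed targets such as HL-Gauss are reported to work better than this harder encoding. Again, names and parameters here are illustrative assumptions.

```python
import numpy as np

def two_hot_target(y, v_min=-10.0, v_max=10.0, num_bins=51):
    """'Two-hot' label: split probability mass between the two bin
    centers that bracket y, proportionally to proximity."""
    centers = np.linspace(v_min, v_max, num_bins)
    y = float(np.clip(y, v_min, v_max))
    upper = int(np.searchsorted(centers, y))   # first center >= y
    upper = min(max(upper, 1), num_bins - 1)   # keep a valid bracketing pair
    lower = upper - 1
    gap = centers[upper] - centers[lower]
    probs = np.zeros(num_bins)
    probs[lower] = (centers[upper] - y) / gap
    probs[upper] = (y - centers[lower]) / gap
    return probs
```

Both encodings plug into the same cross-entropy loss; the reported finding is that Gaussian-smoothed labels degrade more gracefully under noisy bootstrapped targets than the two-hot encoding does.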