On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models
Shumin Wang, Yuexiang Xie, Wenhao Zhang +4 more
Entropy serves as a critical metric for measuring the diversity of outputs generated by large language models (LLMs), providing valuable insights into their exploration capabilities. While recent studies increasingly focus on monitoring and adjusting entropy to better balance exploration and exploit...