ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection
Tao Liu, Taiqiang Wu, Runming Yang +3 more
Supervised fine-tuning (SFT) is a fundamental post-training strategy for aligning Large Language Models (LLMs) with human intent. However, traditional SFT often ignores the one-to-many nature of language by forcing alignment with a single reference answer, which causes the model to overfit to non-core ex...