We might earn a commission if you make a purchase through one of the links. The McClatchy Commerce Content team, which is independent from our newsroom, oversees this content. This article has ...
The paper revisits the prevailing narrative that "SFT memorizes, RL generalizes", mainly focusing on reasoning SFT (with long-CoT supervision). Our core conclusion is that generalization in reasoning ...