PRMs
Concept, mentioned in 1 video
Most likely refers to process reward models (PRMs), which score the intermediate steps of a model's reasoning rather than only its final answer. Mentioned alongside MCTS (Monte Carlo tree search) as not being useful for the R1 distillation approach.