G
GRPO
Tool / Product
Group Relative Policy Optimization; a cheap, scalable training method that samples multiple student-generated answers to the same prompt and reinforces the ones that score above the group average, instead of having a separate teacher (critic) model grade every sentence.
Mentioned in 1 video
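The group comparison can be sketched as follows. This minimal sketch assumes the common formulation in which each answer's reward is normalized by the mean and standard deviation of its own group. The function name `group_relative_advantages`, the example rewards, and the `eps` guard are illustrative assumptions. The population standard deviation is used here, although some implementations use the sample standard deviation. The full training loss (clipped policy ratio, KL penalty) is omitted.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Score each sampled answer relative to its group (answers to one prompt).

    Hypothetical helper: advantage = (reward - group mean) / (group std + eps).
    Above-average answers get a positive advantage, below-average ones a negative one.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std (an assumption, see lead-in)
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical rewards for four answers sampled for one prompt
# (e.g. 1.0 = correct final answer, 0.0 = wrong).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])  # [1.0, -1.0, -1.0, 1.0]
```

Because the baseline is the group's own average, no learned value model is needed. One consequence is that a group whose answers all receive the same reward yields zero advantage for every answer and contributes no learning signal.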