Group Relative Policy Optimization (GRPO)

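A reinforcement learning training method developed by DeepSeek to improve model efficiency and accuracy. For each prompt it samples a group of responses and scores each one relative to the others in that group, which removes the need for a separate value (critic) model. It is also used to train DeepSeek R1.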
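The "group relative" part of the name can be illustrated with a minimal sketch: several responses are sampled for the same prompt, each receives a scalar reward, and each reward is normalized against the mean and standard deviation of its own group. The function name, the `eps` term, and the use of the population standard deviation are assumptions made for this illustration, not details taken from DeepSeek's implementation, and the sketch covers only the advantage computation, not the full policy update.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group's mean and std.

    Hypothetical helper: `rewards` holds one scalar reward per response
    sampled for the same prompt. `eps` (an assumption) avoids division
    by zero when every response in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; the choice is an assumption
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one prompt, rewarded 1.0 if correct, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses that score above their group's average get a positive advantage and those below get a negative one, so the group itself serves as the baseline that a learned value model would otherwise provide.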