Group Relative Policy Optimization (GRPO)
Concept · Mentioned in 1 video
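A reinforcement learning method developed by DeepSeek for training large language models, also used in DeepSeek R1. It is a variant of Proximal Policy Optimization (PPO) that drops the separate value (critic) model. For each prompt it samples a group of responses, scores them, and judges each response relative to the rest of its group, which cuts the memory and compute needed for training while improving accuracy.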
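The core idea of GRPO can be illustrated with a minimal sketch. It assumes the formulation DeepSeek published with DeepSeekMath: each sampled response's reward is normalized by the mean and standard deviation of its group, which stands in for a learned critic. That normalized advantage then feeds a PPO-style clipped objective. The function names, the 0/1 correctness reward, the `eps` guard, and the use of population standard deviation are illustrative assumptions, not DeepSeek's actual code.

```python
import math
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Score each response relative to the other responses sampled for the same prompt.

    A_i = (r_i - mean(r)) / std(r). The group statistics replace a critic's baseline.
    Population std is an assumption here; implementations differ.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every response gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(logp_new: float, logp_old: float, advantage: float,
                      clip_eps: float = 0.2) -> float:
    """PPO-style clipped objective term for one response (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)  # pi_new / pi_old
    clipped_ratio = max(min(ratio, 1 + clip_eps), 1 - clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


# Hypothetical group: 4 responses to one prompt, reward 1.0 if the final answer is correct.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])  # [1.0, -1.0, -1.0, 1.0]
```

Correct responses get a positive advantage and incorrect ones a negative advantage, with no value network involved. If all responses in a group score the same, every advantage is zero and that group contributes no learning signal. The full GRPO objective also adds a KL-divergence penalty toward a reference model, which this sketch omits.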