Group Relative Policy Optimization (GRPO)

Concept

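A reinforcement learning method introduced by DeepSeek for fine-tuning language models. For each prompt it samples a group of responses and scores each one relative to the group's average reward, which removes the separate value (critic) model that PPO requires and lowers training memory and compute. It was also used to train DeepSeek R1.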
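
As a rough illustration of the "group relative" part of the name, the sketch below normalizes the rewards of several responses sampled for one prompt against that group's own mean and standard deviation. The function name, the `eps` constant, the choice of population standard deviation, and the reward values are all assumptions made for illustration; this is not DeepSeek's implementation, and it covers only the advantage step, not the full policy update.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each reward relative to its own group (hypothetical helper).

    Each advantage is (reward - group mean) / (group std + eps), so no
    learned value model is needed to provide a baseline.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four responses sampled for the same prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Responses that beat the group average get a positive advantage and the rest get a negative one; if every response in the group earns the same reward, all advantages are zero and that group contributes no learning signal.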