The policy that yields the highest expected cumulative reward (the expected return) for a given task in reinforcement learning.
Computerphile
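
To make the definition concrete, here is a minimal sketch that finds such a policy for a tiny task. Everything in it is an assumption made for illustration and is not from the source: the two-state deterministic environment `TRANSITIONS`, the discount factor `GAMMA`, and the choice of value iteration as the solving method. A policy here is a mapping from each state to the action to take in it.

```python
# Hypothetical 2-state task, invented purely for illustration.
# TRANSITIONS[state][action] = (next_state, immediate_reward)
TRANSITIONS = {
    "A": {"stay": ("A", 0.0), "move": ("B", 1.0)},
    "B": {"stay": ("B", 2.0), "move": ("A", 0.0)},
}
GAMMA = 0.9  # assumed discount factor for future rewards


def value_iteration(transitions, gamma, tol=1e-9):
    """Estimate the best achievable discounted return from each state."""
    values = {state: 0.0 for state in transitions}
    while True:
        delta = 0.0
        for state, actions in transitions.items():
            # Bellman optimality backup: take the best one-step lookahead.
            best = max(
                reward + gamma * values[next_state]
                for next_state, reward in actions.values()
            )
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            return values


def greedy_policy(transitions, values, gamma):
    """Pick, in every state, the action with the highest lookahead value."""
    return {
        state: max(
            actions,
            key=lambda a: actions[a][1] + gamma * values[actions[a][0]],
        )
        for state, actions in transitions.items()
    }


values = value_iteration(TRANSITIONS, GAMMA)
policy = greedy_policy(TRANSITIONS, values, GAMMA)
print(policy)
```

In this toy task the resulting policy moves from `A` to `B` and then stays in `B`, because collecting the reward of 2 on every step in `B` is worth more in the long run than any alternative. Real tasks are usually stochastic and far too large for an exhaustive sweep like this one, so the sketch shows only what the definition means, not how such policies are found in practice.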