Bandit algorithm

Concept

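A strategy for the multi-armed bandit problem, a reinforcement learning setting in which an agent repeatedly chooses among multiple options (arms) with unknown reward probabilities, aiming to maximize cumulative reward. The algorithm must balance exploring arms whose payoff is still uncertain against exploiting the arm that currently looks best.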
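As an illustration of the definition above, here is a minimal sketch of one common bandit strategy, epsilon-greedy, run on simulated Bernoulli arms. The function name `epsilon_greedy`, its parameters (`true_probs`, `steps`, `epsilon`, `seed`), and the choice of epsilon-greedy itself are assumptions made for this example; the source does not name a specific algorithm.

```python
import random


def epsilon_greedy(true_probs, steps=1000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy on Bernoulli arms (illustrative sketch).

    true_probs: hidden success probability of each arm (hypothetical input;
                a real agent would not know these values).
    Returns (estimates, pulls, total_reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    pulls = [0] * n_arms        # how many times each arm was chosen
    estimates = [0.0] * n_arms  # running mean reward observed per arm
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimated reward.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Draw a 0/1 reward from the chosen arm's hidden probability.
        reward = 1 if rng.random() < true_probs[arm] else 0

        # Incremental update of the sample mean for the chosen arm.
        pulls[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        total_reward += reward

    return estimates, pulls, total_reward


if __name__ == "__main__":
    est, pulls, total = epsilon_greedy([0.2, 0.5, 0.8], steps=2000)
    print("estimated means:", [round(e, 2) for e in est])
    print("pulls per arm:  ", pulls)
    print("total reward:   ", total)
```

With a small `epsilon` the agent spends most of its pulls on whichever arm currently has the highest estimate, while the occasional random pull keeps the other estimates from going stale. Other strategies, such as UCB or Thompson sampling, handle the same trade-off differently.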
