Bandit algorithm

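An algorithm for the multi-armed bandit problem, a reinforcement learning setting in which an agent repeatedly chooses among several options (arms) with unknown reward probabilities and aims to maximize cumulative reward. Because the agent learns about an arm only by choosing it, it must balance exploring uncertain arms against exploiting the arm that currently looks best.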
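
As a minimal sketch of this setting, the Python below simulates arms that pay 0 or 1 and applies epsilon-greedy, one common strategy among several (others include UCB and Thompson sampling). The function name `epsilon_greedy`, its parameters, and the Bernoulli reward model are assumptions made for illustration and do not come from the source.

```python
import random


def epsilon_greedy(true_probs, steps=5000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy on Bernoulli arms (illustrative sketch).

    true_probs: hypothetical reward probability of each arm, hidden from the agent.
    Returns (estimated means, pull counts, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms        # how often each arm was pulled
    estimates = [0.0] * n_arms   # running mean reward per arm
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimated mean.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # The environment returns 1 with the arm's (unknown) probability.
        reward = 1 if rng.random() < true_probs[arm] else 0

        # Incremental update of the running mean for the chosen arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    est, counts, total = epsilon_greedy([0.2, 0.5, 0.8])
    print("estimates:", [round(e, 2) for e in est])
    print("pulls per arm:", counts)
    print("total reward:", total)
```

A larger `epsilon` spends more pulls on exploration, while a smaller one commits sooner to whichever arm looks best so far. Choosing that trade-off is the core design decision in any bandit algorithm.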