Bandit algorithm
Concept · Mentioned in 1 video
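A class of problems in reinforcement learning, often called multi-armed bandits, in which an agent repeatedly chooses among multiple options (arms) with unknown reward probabilities, aiming to maximize cumulative reward. A bandit algorithm is a strategy for making these choices, balancing exploration of uncertain arms against exploitation of the arm that currently looks best.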
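As a rough illustration of the definition above, here is a minimal sketch of epsilon-greedy, one common bandit strategy (the entry itself does not name a specific algorithm). It assumes each arm pays a reward of 1 with a fixed hidden probability and 0 otherwise. The names `epsilon_greedy`, `true_probs`, `steps`, `epsilon`, and `seed` are hypothetical and chosen only for this example.

```python
import random


def epsilon_greedy(true_probs, steps=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a simulated Bernoulli bandit.

    true_probs: hidden success probability of each arm (assumed for the
    simulation; the agent never reads it directly).
    Returns (estimates, counts, total_reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimate so far
            # (ties go to the lowest index).
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Simulated environment: reward is 1 with the arm's hidden probability.
        reward = 1 if rng.random() < true_probs[arm] else 0

        # Incremental update of the running mean for the chosen arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    estimates, counts, total = epsilon_greedy([0.2, 0.5, 0.8])
    print("estimated reward per arm:", [round(e, 3) for e in estimates])
    print("pulls per arm:", counts)
    print("cumulative reward:", total)
```

A larger `epsilon` spends more pulls exploring and learns the arm estimates faster, while a smaller one exploits the current best guess more often. Other strategies, such as UCB or Thompson sampling, make the same trade-off in different ways.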