TRPO
Software / App
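Trust Region Policy Optimization, an on-policy RL algorithm that improves a policy through a sequence of constrained updates, each kept close to the previous policy by a KL-divergence trust region, with importance weighting correcting for the fact that the data was sampled from the old policy.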
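A minimal sketch of the two quantities this description refers to, for a small discrete action space: the importance-weighted surrogate objective and the average KL divergence between the old and new policy. The function names, the list-based inputs, and the `max_kl` default are assumptions made for illustration. A full TRPO implementation also computes a natural-gradient step direction (typically with conjugate gradient), which is omitted here; only the step acceptance check is shown.

```python
import math

def surrogate_and_kl(old_probs, new_probs, actions, advantages):
    """Return (importance-weighted surrogate, mean KL(old || new)).

    old_probs / new_probs: one action distribution (list of floats) per sample.
    actions: index of the action actually taken under the old policy.
    advantages: advantage estimate for each sampled action.
    """
    n = len(actions)
    surrogate = 0.0
    kl = 0.0
    for p_old, p_new, a, adv in zip(old_probs, new_probs, actions, advantages):
        # Importance weight: corrects for sampling under the old policy.
        ratio = p_new[a] / p_old[a]
        surrogate += ratio * adv
        # KL divergence measures how far the new policy has moved.
        kl += sum(po * math.log(po / pn)
                  for po, pn in zip(p_old, p_new) if po > 0)
    return surrogate / n, kl / n

def accept_step(old_probs, new_probs, actions, advantages, max_kl=0.01):
    """Accept a candidate policy only if it stays inside the trust region
    and improves the surrogate over leaving the policy unchanged.
    max_kl is a hypothetical default, not a recommended setting."""
    surrogate, kl = surrogate_and_kl(old_probs, new_probs, actions, advantages)
    baseline, _ = surrogate_and_kl(old_probs, old_probs, actions, advantages)
    return kl <= max_kl and surrogate > baseline

if __name__ == "__main__":
    old = [[0.5, 0.5]]
    new = [[0.6, 0.4]]
    print(surrogate_and_kl(old, new, [0], [1.0]))
    print(accept_step(old, new, [0], [1.0], max_kl=0.05))
```

In this toy case the candidate policy raises the probability of an action with positive advantage, so the surrogate improves; whether the step is accepted then depends only on how tight the KL bound is.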
Mentioned in 1 video