TRPO

Software / App

Trust Region Policy Optimization, an on-policy RL algorithm that improves the policy through repeated updates, keeping each new policy close to the previous one by constraining the KL divergence between them while maximizing an importance-weighted surrogate objective.
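As a rough illustration of "staying close" with importance weighting, the sketch below computes the two quantities a TRPO-style update balances for a small discrete policy: the importance-weighted surrogate objective and the mean KL divergence between the old and new policies. The function names, the list-based inputs, and the threshold `max_kl=0.01` are assumptions made for this example, not part of any particular library. The full algorithm also needs a constrained optimizer (typically conjugate gradient plus a backtracking line search), which is omitted here.

```python
import math


def surrogate_and_kl(old_probs, new_probs, actions, advantages):
    """Return (surrogate objective, mean KL(old || new)) over a batch.

    old_probs, new_probs: one action distribution per state (list of lists).
    actions: index of the action sampled under the old policy in each state.
    advantages: advantage estimate for each sampled action.
    All names and shapes here are illustrative assumptions.
    """
    n = len(actions)
    surrogate = 0.0
    kl = 0.0
    for p_old, p_new, a, adv in zip(old_probs, new_probs, actions, advantages):
        # Importance weight: corrects for the sample having been drawn
        # from the old policy rather than the new one.
        ratio = p_new[a] / p_old[a]
        surrogate += ratio * adv
        # KL divergence from the old to the new distribution in this state.
        kl += sum(po * math.log(po / pn)
                  for po, pn in zip(p_old, p_new) if po > 0)
    return surrogate / n, kl / n


def within_trust_region(kl, max_kl=0.01):
    """True if the candidate policy respects the (assumed) KL budget."""
    return kl <= max_kl
```

A candidate policy would be accepted only if it raises the surrogate objective while `within_trust_region` holds; otherwise the step is shrunk and retried.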
