An asynchronous implementation of policy gradient methods, in which several parallel workers update a shared policy without waiting for one another, that achieves very good results.
Lex Fridman
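To make the asynchronous idea concrete, here is a minimal sketch. It assumes a toy 3-armed bandit and is not the method from this slide. Several threads each sample actions from one shared softmax policy, estimate a REINFORCE-style policy gradient, and push that update to the shared parameters without synchronizing with the other workers. As a result, a gradient may be computed from parameters that are already stale by the time it is applied. The bandit (`ARM_MEANS`), the names `SharedPolicy`, `worker`, and `train`, the learning rate, and the batch-mean baseline are all hypothetical choices made for illustration. Full asynchronous actor-critic methods instead use a learned value function as the baseline and neural-network policies on multi-step environments.

```python
import math
import random
import threading

# Hypothetical toy task (assumption): a 3-armed bandit paying Gaussian
# rewards with these means. Arm 2 is the best arm.
ARM_MEANS = [0.2, 0.5, 0.8]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class SharedPolicy:
    """Softmax policy parameters shared by every worker thread (hypothetical)."""

    def __init__(self, n_actions):
        self.theta = [0.0] * n_actions
        self._lock = threading.Lock()  # guards individual reads/writes only

    def snapshot(self):
        with self._lock:
            return list(self.theta)

    def apply_update(self, grad, lr):
        # Gradient ascent on expected reward, applied to the shared copy.
        with self._lock:
            for i, g in enumerate(grad):
                self.theta[i] += lr * g


def worker(policy, seed, steps, lr=0.5, batch=8):
    """One asynchronous worker: sample, estimate the gradient, push it."""
    rng = random.Random(seed)  # each worker explores with its own RNG
    n = len(ARM_MEANS)
    for _ in range(steps):
        theta = policy.snapshot()  # may be stale by the time the update lands
        probs = softmax(theta)
        actions = rng.choices(range(n), weights=probs, k=batch)
        rewards = [rng.gauss(ARM_MEANS[a], 0.1) for a in actions]
        # Batch-mean baseline (assumption), standing in for a learned critic.
        baseline = sum(rewards) / batch
        grad = [0.0] * n
        for a, r in zip(actions, rewards):
            advantage = r - baseline
            # d/d theta_i of log softmax(theta)[a] is 1[i == a] - probs[i].
            for i in range(n):
                grad[i] += advantage * ((1.0 if i == a else 0.0) - probs[i]) / batch
        policy.apply_update(grad, lr)


def train(n_workers=4, steps=300):
    """Run the workers concurrently and return the final action probabilities."""
    policy = SharedPolicy(len(ARM_MEANS))
    threads = [
        threading.Thread(target=worker, args=(policy, seed, steps))
        for seed in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return softmax(policy.snapshot())


if __name__ == "__main__":
    print([round(p, 3) for p in train()])
```

The lock only protects each individual read or write, not the whole sample-compute-apply cycle, so workers interleave freely. This is the sense in which the updates are asynchronous. In this sketch the shared policy typically ends up placing most of its probability on the best arm.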