P3O: Policy-on Policy-off Policy Optimization
Discussion on P3O, a policy gradient method that uses both on-policy and off-policy data.
Reactor: Retrace Actor
Discussion on β-LOO.
What Matters In On-Policy Reinforcement Learning?
Discussion on several design decisions in on-policy reinforcement learning.
MPO: Maximum a Posteriori Policy Optimization
Discussion on maximum a posteriori policy optimization, a KL-regularized reinforcement learning method.
MERLIN: Memory, RL, and Inference Network
Discussion on a memory architecture that allows us to do temporal relational reasoning.
Spectral Norm
Discussion on the spectral norm and its use in deep learning.
TransformerXL
Discussion on TransformerXL, a successor of the Transformer that can learn from sequences beyond a fixed length.
Generalization in RL
Discussion on several recent works trying to improve generalization in deep reinforcement learning.
Network Randomization
Discussion on network randomization, a technique for improving generalization in reinforcement learning.
Efficient Value-Based RL
Discussion on several recent works trying to improve sample efficiency of reinforcement learning algorithms.