SAC-X — Scheduled Auxiliary Control
Discussion on a new learning paradigm in RL that uses auxiliary policies to explore the environment efficiently.
TDM — Temporal Difference Models
Discussion on temporal difference models, an algorithm that aims for the sample efficiency of model-based RL while matching the asymptotic performance of model-free RL.
RMC — Relational Memory Core
Discussion on a recurrent architecture that allows us to do temporal relational reasoning.
Exponential Families
Discussion on exponential families.
FQF — Fully Parameterized Quantile Function
Discussion on the fully parameterized quantile function, which improves on IQN by also parameterizing the quantile fraction proposal process.
QR-DQN, IQN
Discussion on two distributional deep Q-networks, namely Quantile Regression Deep Q-Network (QR-DQN) and Implicit Quantile Networks (IQN).
ICM, RND
Discussion on two curiosity-based exploration methods, namely Intrinsic Curiosity Module (ICM) and Random Network Distillation (RND).
Some Exploration Algorithms: EX2, LSH, VIME etc.
Discussion on several exploration algorithms, including count-based methods, Thompson sampling, and information gain exploration.
DIAYN — Diversity Is All You Need
Discussion on an unsupervised learning method for learning useful skills without a reward function.
Transformer
Discussion on the Transformer, an architecture built on self-attention.