Rainbow
Discussion on Rainbow, an integration of multiple improvements on DQN.
c51 — Distributional Deep Q Network
Discussion on the distributional deep Q network (a.k.a. c51), an improvement to the deep Q network that replaces the action-value Q with a value distribution to capture the stochastic nature of the environment.
PER — Prioritized Experience Replay
Discussion on prioritized experience replay, an improvement to the uniform experience replay used in deep Q network.
PG — Stochastic & Deterministic Policy Gradient
Discussion on policy gradient methods and their derivatives.
IS — Importance Sampling
Discussion on importance sampling, the cornerstone of off-policy learning.
Basic Policies in Reinforcement Learning
We discuss in detail some widely used policies in reinforcement learning, including the epsilon-greedy policy, the stochastic policy with temperature, upper confidence bound (UCB), and the gradient bandit algorithm.
DQN — Deep Q Network
Discussion on the deep Q network (DQN), a successful algorithm that works in discrete-action environments.
Contrastive Predictive Coding
Discussion on a sequential representation learning model, contrastive predictive coding.
Beta-VAE and Its Variants
Discussion on beta-VAE and its variants, which attempt to learn disentangled representations by heavily penalizing the corresponding correlation term.
DIM — Deep INFOMAX
Discussion on Deep INFOMAX, a representation-learning method, based on MINE, that maximizes mutual information between the input and its representation.