Zero

This blog is no longer updated, but I’m still on my quest in RL. If you are interested in discussing recent advances in AI/RL, please contact me by email: 122134545@qq.com or o.xlnwel@gmail.com

Rainbow

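Discussion on Rainbow, which combines six extensions to DQN (double Q-learning, prioritized experience replay, dueling networks, multi-step learning, distributional RL, and noisy networks) into a single agent.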

c51 — Distributional Deep Q Network

Discussion on the distributional deep Q network (a.k.a. c51), an improvement to the deep Q network that replaces the action value Q with a value distribution to capture the stochastic nature of the environment.

PER — Prioritized Experience Replay

Discussion on prioritized experience replay, an improvement over the uniform experience replay used in the deep Q network that replays transitions with larger TD errors more frequently.

PG — Stochastic & Deterministic Policy Gradient

Discussion on stochastic and deterministic policy gradient methods and their derivatives.

IS — Importance Sampling

Discussion on importance sampling, a cornerstone of off-policy learning.

Basic Policies in Reinforcement Learning

We talk in detail about some widely used policies in reinforcement learning, including the epsilon-greedy policy, the stochastic policy with temperature, upper confidence bound (UCB), and the gradient bandit algorithm.

DQN — Deep Q Network

Discussion on the deep Q network (DQN), a successful algorithm for environments with discrete actions.

Contrastive Predictive Coding

Discussion on contrastive predictive coding (CPC), a sequential representation learning model.

Beta-VAE and Its Variants

Discussion on beta-VAE and its variants, which attempt to learn disentangled representations by heavily penalizing the KL divergence term of the ELBO (or, in some variants, its total-correlation component).

DIM — Deep INFOMAX

Discussion on Deep INFOMAX, a representation-learning method that maximizes the mutual information between the input and its representation, building on MINE.