Value-Based RL

Retrace(𝝀)

A theoretical analysis of the Retrace(𝝀) algorithm.

6 min read

Efficient Value-Based RL

Discussion on several recent works trying to improve sample efficiency of reinforcement learning algorithms.

3 min read

The Deadly Triad

We analyze how different components of DQN contribute to the emergence of the deadly triad.

2 min read

Agent57

Discussion on an agent, called Agent57, that outperforms the standard human benchmark on all Atari games.

4 min read

NGU – Never Give Up

Discussion on the Never-Give-Up (NGU) agent, which achieves state-of-the-art performance on hard-exploration Atari games without any prior knowledge while maintaining a very high score across the remaining games.

9 min read

QR-DQN, IQN

Discussion on two distributional deep Q networks, namely the Quantile Regression Deep Q Network (QR-DQN) and Implicit Quantile Networks (IQN).

8 min read

Rainbow

Discussion on Rainbow, an integration of multiple improvements on DQN.

8 min read

c51 – Distributional Deep Q Network

Discussion on the distributional deep Q network (a.k.a. c51), an improvement to the deep Q network that replaces the action value Q with a value distribution to capture the stochastic nature of the environment.

6 min read
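
As a minimal sketch of the idea (illustrative names, not code from the post): the value distribution is represented by probabilities over a fixed support of atoms, and the usual Q-value is recovered as its expectation.

import numpy as np

# Hypothetical setup: 51 fixed atoms spanning an assumed return range, as in c51.
num_atoms = 51
v_min, v_max = -10.0, 10.0
support = np.linspace(v_min, v_max, num_atoms)   # atom locations z_i

# Stand-in for the network's softmax output for one (state, action) pair.
logits = np.random.randn(num_atoms)
probs = np.exp(logits) / np.exp(logits).sum()    # p_i(x, a), sums to 1

# The scalar action value is the expectation of the distribution.
q_value = float(np.sum(support * probs))
print(q_value)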

DQN – Deep Q Network

Discussion on the Deep Q Network (DQN), a successful algorithm that works in discrete-action environments.

4 min read

Mathematics

V-trace

Theoretical analysis of the V-trace target.

6 min read

Retrace(𝝀)

A theoretical analysis of the Retrace(𝝀) algorithm.

6 min read

Spectral Norm

Discussion on the spectral norm and its usage in deep learning.

6 min read
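
A minimal sketch of the usual estimator, assuming the spectral norm here means the largest singular value of a weight matrix; power iteration gives a cheap approximation (variable names are illustrative).

import numpy as np

def spectral_norm(w, num_iters=30):
    # Power iteration: estimate the largest singular value of matrix w.
    u = np.random.randn(w.shape[0])
    for _ in range(num_iters):
        v = w.T @ u
        v /= np.linalg.norm(v)
        u = w @ v
        u /= np.linalg.norm(u)
    return float(u @ w @ v)

w = np.random.randn(64, 128)
# Should roughly match the exact largest singular value from SVD.
print(spectral_norm(w), np.linalg.svd(w, compute_uv=False)[0])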

Math

We summarize some mathematical concepts used in deep reinforcement learning

5 min read

SVI – Soft Value Iteration

We address the optimism problem of the probabilistic graphical model introduced in the previous post via variational inference.

5 min read

SL – Statistical Learning: A Connection to Neural Networks

We expand on the topic of latent variable models, in the sense that the latent variables model the underlying structure of the observed data, whereby the model is able to do statistical inference over these latent variables. Then we build a connection between statistical learning and neural networks.

3 min read

SCG – Stochastic Computational Graphs

Discussion on stochastic computational graphs, a type of directed acyclic computational graph that includes both deterministic functions and conditional probability distributions.

5 min read

TRPO, PPO

Discussion on two policy-based algorithms that restrict the step size to avoid destructively large policy updates: Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO).

9 min read

PCA and Whitening

Discussion on the dimensionality reduction technique PCA and its derivatives, whitening and ZCA whitening.

6 min read
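
A minimal numpy sketch under the usual assumptions (zero-centered data, eigendecomposition of the covariance): PCA whitening rotates the data onto the principal axes and rescales each component to unit variance, and ZCA whitening rotates back afterwards.

import numpy as np

# Toy correlated data, zero-centered.
x = np.random.randn(1000, 8) @ np.random.randn(8, 8)
x = x - x.mean(axis=0)

cov = x.T @ x / len(x)                       # covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)       # eigendecomposition

eps = 1e-5                                   # guards against near-zero eigenvalues
x_pca = x @ eigvecs                          # rotate onto principal axes (PCA)
x_white = x_pca / np.sqrt(eigvals + eps)     # PCA whitening: unit variance per axis
x_zca = x_white @ eigvecs.T                  # ZCA whitening: rotate back to input space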

Exploration in RL

Go-Explore

Discussion on Go-Explore, a family of algorithms designed for hard-exploration games

9 min read

Agent57

Discussion on an agent, called Agent57, that outperforms the standard human benchmark on all Atari games.

4 min read

NGU – Never Give Up

Discussion on the Never-Give-Up (NGU) agent, which achieves state-of-the-art performance on hard-exploration Atari games without any prior knowledge while maintaining a very high score across the remaining games.

9 min read

Ape-X DQfD

Discussion on several enhancements to Ape-X DQN.

6 min read

ICM, RND

Discussion on two curiosity-based exploration methods, namely the Intrinsic Curiosity Module (ICM) and Random Network Distillation (RND).

6 min read

Basic Policies in Reinforcement Learning

We discuss in detail some widely used policies in reinforcement learning, including the epsilon-greedy policy, the stochastic policy with temperature, upper confidence bound (UCB), and the gradient bandit algorithm.

5 min read
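
As a minimal illustration of the first of these, an epsilon-greedy policy over action-value estimates (the array q below is a toy stand-in for whatever estimates the agent keeps):

import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q, epsilon):
    # With probability epsilon pick a uniformly random action; otherwise act greedily.
    if rng.random() < epsilon:
        return int(rng.integers(len(q)))
    return int(np.argmax(q))

q = np.array([0.1, 0.5, 0.3])                # toy action-value estimates
print(epsilon_greedy(q, epsilon=0.1))        # usually 1, occasionally a random action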

Multi-Agent RL

AlphaStar

Discussion on AlphaStar, the first agent that achieves Grandmaster level in the full game of StarCraft II

17 min read

OpenAI Five

Discussion on OpenAI Five, an agent that achieves super-human performance in Dota 2

13 min read

FTW – For The Win

Discussion on For The Win (FTW), an agent that achieves human-level performance in a popular 3D team-based multiplayer first-person video game.

10 min read

MuZero

Discussion on MuZero, a successor of AlphaZero that not only masters board games such as chess and Go but also achieves state-of-the-art performance on Atari games.

13 min read

AlphaZero

Discussion on AlphaZero, an agent that achieves super-human performance in chess, shogi and Go

3 min read

MAPPO

Discussion on Multi-Agent PPO, which includes a few tricks when applying PPO to multi-agent environments

2 min read

Hide and Seek

Discussion on an agent developed by OpenAI that exhibits several emergent strategies in a hide-and-seek environment.

3 min read

SchedNet – Schedule Network

Discussion on a multi-agent reinforcement learning algorithm that schedules communication between cooperative agents.

6 min read

Policy-Gradient RL

PPG – Phasic Policy Gradient

Discussion on phasic policy gradient, which implements two disjoint networks for the policy and value function and optimizes them in two phases.

5 min read

TPPO – Truly PPO

We investigate the behavior of PPO and introduce new methods that enforce the trust region constraint.

3 min read

TRPO, PPO

Discussion on two policy-based algorithms that restrict the step size to avoid destructively large policy updates: Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO).

9 min read
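
As a rough sketch of the PPO half of this, assuming the standard clipped surrogate objective (all arrays here are illustrative placeholders rather than outputs of a real policy):

import numpy as np

def ppo_clipped_objective(logp_new, logp_old, advantages, clip_range=0.2):
    # Probability ratio between the updated policy and the behavior policy.
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_range, 1.0 + clip_range) * advantages
    # Pessimistic bound: elementwise minimum, averaged over the batch.
    return float(np.mean(np.minimum(unclipped, clipped)))

logp_old = np.log(np.array([0.20, 0.50, 0.30]))
logp_new = np.log(np.array([0.25, 0.45, 0.30]))
adv = np.array([1.0, -0.5, 0.2])
print(ppo_clipped_objective(logp_new, logp_old, adv))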

Model-Based RL

MuZero

Discussion on MuZero, a successor of AlphaZero that not only masters board games such as chess and Go but also achieves state-of-the-art performance on Atari games.

13 min read

AlphaZero

Discussion on AlphaZero, an agent that achieves super-human performance in chess, shogi and Go

3 min read

DreamerV2

Discussion on DreamerV2, a model-based algorithm that achieves promising results on Atari games.

6 min read

Dreamer

Discussion on a model-based reinforcement learning agent called Dreamer

6 min read

TDM – Temporal Difference Models

Discussion on temporal difference models, an algorithm that aims to attain the sample efficiency of model-based RL while matching the asymptotic performance of model-free RL.

5 min read

RL Application

AlphaStar

Discussion on AlphaStar, the first agent that achieves Grandmaster level in the full game of StarCraft II

17 min read

OpenAI Five

Discussion on OpenAI Five, an agent that achieves super-human performance in Dota 2

13 min read

FTW – For The Win

Discussion on For The Win (FTW), an agent that achieves human-level performance in a popular 3D team-based multiplayer first-person video game.

10 min read

MuZero

Discussion on MuZero, a successor of AlphaZero that not only masters board games such as chess and Go but also achieves state-of-the-art performance on Atari games.

13 min read

AlphaZero

Discussion on AlphaZero, an agent that achieves super-human performance in chess, shogi and Go

3 min read

Hide and Seek

Discussion on an agent developed by OpenAI that exhibits several emergent strategies in a hide-and-seek environment.

3 min read

QWeb

Discussion on how to solve the web navigation problem using DQN.

10 min read

Sequential Model

TransformerXL

Discussion on TransformerXL, a successor of the Transformer that can learn from sequences beyond a fixed length.

8 min read

Transformer

Discussion on a self-attention architecture named Transformer.

7 min read

Distributed RL

AlphaStar

Discussion on AlphaStar, the first agent that achieves Grandmaster level in the full game of StarCraft II

17 min read

OpenAI Five

Discussion on OpenAI Five, an agent that achieves super-human performance in Dota 2

13 min read

FTW – For The Win

Discussion on For The Win (FTW), an agent that achieves human-level performance in a popular 3D team-based multiplayer first-person video game.

10 min read

Ape-X DQfD

Discussion on several enhancements to Ape-X DQN.

6 min read

IMPALA

Discussion on a distributed reinforcement learning architecture for policy gradient methods.

8 min read

Ape-X

Discussion on a distributed reinforcement learning architecture for Q-learning methods.

1 min read

Hierarchical RL

FTW – For The Win

Discussion on For The Win (FTW), an agent that achieves human-level performance in a popular 3D team-based multiplayer first-person video game.

10 min read

Hierarchical Guidance

Discussion on an algorithmic framework called hierarchical guidance, which leverages hierarchical structure in imitation learning.

3 min read

Meta-Learning

ProMP – Proximal Meta-Policy Search

We address the credit assignment problem of two forms of MAML with an RL objective and discuss an efficient and stable meta reinforcement learning algorithm.

5 min read

Representation Learning

Contrastive Predictive Coding

Discussion on a sequential representation learning model, contrastive predictive coding.

5 min read

Beta-VAE and Its Variants

Discussion on beta-VAE and its variants, which attempt to learn disentangled representations by heavily penalizing the corresponding correlation term.

9 min read

DIM – Deep INFOMAX

Discussion on Deep INFOMAX, a representation-learning method maximizing mutual information between the input and its representation based on MINE

10 min read

VAE – Variational Autoencoder

Discussion on variational autoencoders, a kind of generative network that allows us to alter data in a desired, specific way.

4 min read

Computer Vision

Anti-Aliasing

Discussion on aliasing in modern convolutional neural networks and how to address it with low-pass filters.

2 min read

SENet: Squeeze-and-Excitation Network

Discussion on the Squeeze-and-Excitation Network, an architecture that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels.

2 min read
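
As a minimal sketch of that recalibration step (shapes and weights below are illustrative, not the paper's exact configuration): the block averages each channel to a single number, passes the result through a small bottleneck MLP, and scales every channel by the resulting gate.

import numpy as np

rng = np.random.default_rng(0)

def squeeze_excite(feature_map, w1, w2):
    # feature_map: (C, H, W). Squeeze: global average pool per channel.
    z = feature_map.mean(axis=(1, 2))                    # (C,)
    # Excitation: bottleneck MLP with ReLU then sigmoid gives per-channel gates in (0, 1).
    hidden = np.maximum(w1 @ z, 0.0)                     # (C // r,)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))         # (C,)
    # Recalibrate: scale each channel by its gate.
    return feature_map * gates[:, None, None]

channels, reduction = 8, 2
x = rng.standard_normal((channels, 16, 16))
w1 = rng.standard_normal((channels // reduction, channels))
w2 = rng.standard_normal((channels, channels // reduction))
print(squeeze_excite(x, w1, w2).shape)                   # (8, 16, 16)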

MobileNet

Discussion on the MobileNet family of architectures.

5 min read

GANs – Generative Adversarial Networks

Discussion on generative adversarial networks from two perspectives: data generation and semi-supervised learning. In the end, we'll also demonstrate some techniques that help improve GANs.

8 min read

Generative Network

Contrastive Predictive Coding

Discussion on a sequential representation learning model, contrastive predictive coding.

5 min read

Beta-VAE and Its Variants

Discussion on beta-VAE and its variants, which attempt to learn disentangled representations by heavily penalizing the corresponding correlation term.

9 min read

GANs – Generative Adversarial Networks

Discussion on generative adversarial networks from two perspectives: data generation and semi-supervised learning. In the end, we'll also demonstrate some techniques that help improve GANs.

8 min read

VAE – Variational Autoencoder

Discussion on variational autoencoders, a kind of generative network that allows us to alter data in a desired, specific way.

4 min read

Inverse RL

GAN-GCL

We build a connection between maximum entropy inverse reinforcement learning and generative adversarial networks

5 min read

GCL – Guided Cost Learning

We introduce a maximum entropy inverse reinforcement learning algorithm named guided cost learning.

10 min read

Machine Learning

t-SNE

Discussion on t-SNE, an unsupervised learning algorithm commonly used in data visualization.

4 min read

PCA and Whitening

Discussion on the dimensionality reduction technique PCA and its derivatives, whitening and ZCA whitening.

6 min read

Tricks

The Deadly Triad

We analyze how different components of DQN contribute to the emergence of the deadly triad.

2 min read

Generalization in RL

Generalization in RL

Discussion on several recent works trying to improve generalization in deep reinforcement learning.

4 min read

Network Randomization

Discussion on network randomization, a technique that improves generalization in reinforcement learning.

2 min read

Imitation Learning

Ape-X DQfD

Discussion on several enhancements to Ape-X DQN.

6 min read

Hierarchical Guidance

Discussion on an algorithmic framework called hierarchical guidance, which leverages hierarchical structure in imitation learning.

3 min read

Network Layer

Spectral Norm

Discussion on the spectral norm and its usage in deep learning.

6 min read

EvoNorm

Discussion on EvoNorm, a set of unified normalization-activation layers found by AutoML.

4 min read

AdaNorm

We analyze layer normalization and discuss its improvement AdaNorm.

1 min read

Evolutionary Algorithms

Combining EAs with RL

We summarize several recent works that combine evolutionary algorithms with reinforcement learning.

1 min read

Optimizer

Optimization

Discussion on first-order optimization algorithms in machine learning, which optimize the objective function based on gradients.

5 min read
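
A minimal sketch of the simplest member of this family, gradient descent with momentum on a toy quadratic objective (the objective and names are illustrative):

import numpy as np

def sgd_momentum(params, grad_fn, lr=0.1, momentum=0.9, steps=100):
    velocity = np.zeros_like(params)
    for _ in range(steps):
        grad = grad_fn(params)                      # gradient of the objective at params
        velocity = momentum * velocity - lr * grad  # exponentially decayed accumulation
        params = params + velocity
    return params

# f(w) = ||w||^2 / 2 has gradient w; the iterates should approach zero.
print(sgd_momentum(np.array([3.0, -2.0]), grad_fn=lambda w: w))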

Policy-Based RL

V-trace

Theoretical analysis of the V-trace target.

6 min read

Self-Imitation Learning

SIL - Self-Imitation Learning

Discussion on self-imitation learning, in which the agent exploits past transitions that received better returns than it expected.

1 min read

Visualization

t-SNE

Discussion on t-SNE, an unsupervised learning algorithm commonly used in data visualization.

4 min read