2021
AlphaStar
Discussion on AlphaStar, the first agent that achieves Grandmaster level in the full game of StarCraft II
OpenAI Five
Discussion on OpenAI Five, an agent that achieves super-human performance in Dota 2
Go-Explore
Discussion on Go-Explore, a family of algorithms designed for hard-exploration games
GAIL — Generative Adversarial Imitation Learning
A concise theoretical analysis of GAIL
FTW — For The Win
Discussion on an agent, namely For The Win (FTW), that achieves human-level performance in a popular 3D team-based multiplayer first-person video game.
MuZero
Discussion on MuZero, a successor of AlphaZero that not only masters board games such as chess and Go but also achieves state-of-the-art performance on Atari games
AlphaZero
Discussion on AlphaZero, an agent that achieves super-human performance in chess, shogi and Go
MAPPO
Discussion on Multi-Agent PPO, which introduces a few tricks for applying PPO to multi-agent environments
QMIX and Some Tricks
Discussion on QMIX and some tricks for improving it.
NCC — Neighborhood Cognition Consistent Multi-Agent Reinforcement Learning
Discussion on NCC, a cooperative MARL method that takes into account neighborhood cognitive consistency.
RODE — Learning Roles to Decompose Multi-Agent Tasks
Discussion on RODE, a hierarchical MARL method that decomposes the action space into role action subspaces according to their effects on the environment.
PWIL — Primal Wasserstein Imitation Learning
Discussion on Primal Wasserstein Imitation Learning.
Network Regularization in Policy Optimization
Discussion on the effect of network regularization in policy optimization.
HIDIO — Hierarchical RL by Discovering Intrinsic Options
Discussion on HIDIO, a hierarchical RL algorithm that discovers task-agnostic intrinsic options in a self-supervised manner.
IDAAC — Invariant Decoupled Advantage Actor-Critic
Discussion on IDAAC, which identifies and addresses the problem of using a shared representation for learning the policy and the value function.
DTSIL — Diverse Trajectory-conditioned Self-Imitation Learning
Discussion on Diverse Trajectory-conditioned Self-Imitation Learning.
TAC — Tsallis Actor Critic
Discussion on Tsallis Actor Critic
MARL — A Survey and Critique
We present an overview of multi-agent reinforcement learning
C++ Concurrency in Action — Chapter 9
Notes from Williams’ C++ Concurrency in Action
C++ Concurrency in Action — Chapter 8
Notes from Williams’ C++ Concurrency in Action
C++ Concurrency in Action — Chapter 7
Notes from Williams’ C++ Concurrency in Action
C++ Concurrency in Action — Chapter 6
Notes from Williams’ C++ Concurrency in Action
C++ Concurrency in Action — Chapter 5
Notes from Williams’ C++ Concurrency in Action
C++ Concurrency in Action — Chapter 4
Notes from Williams’ C++ Concurrency in Action
C++ Concurrency in Action — Chapter 3
Notes from Williams’ C++ Concurrency in Action
C++ Concurrency in Action — Chapter 2
Notes from Williams’ C++ Concurrency in Action
C++ Concurrency in Action — Chapter 10
Notes from Williams’ C++ Concurrency in Action
C++ Concurrency in Action — Chapter 1
Notes from Williams’ C++ Concurrency in Action
2020
PPG — Phasic Policy Gradient
Discussion on phasic policy gradient, which implements two disjoint networks for the policy and value function and optimizes them in two phases.
Deep Reinforcement Learning and its Neuroscientific Implications
Notes from Deep Reinforcement Learning and Its Neuroscientific Implications
Backward — Learning from a Single Demonstration
Discussion on a curriculum learning algorithm that trains a policy gradient agent on Montezuma’s Revenge by starting episodes progressively further back along a single demonstration
The Mirage of Action-Dependent Baselines
Analysis of action-dependent baselines
Self-Tuning Reinforcement Learning
A self-tuning reinforcement learning algorithm for IMPALA.
V-trace
Theoretical analysis of the V-trace target.
Retrace(𝝀)
A theoretical analysis of the Retrace(𝝀) algorithm.
M-RL — Munchausen Reinforcement Learning
Discussion on Munchausen Reinforcement Learning, which incorporates the log-policy into Bellman updates.
Behavior Priors for KL-Regularized Reinforcement Learning
Discussion on behavior priors for KL-regularized reinforcement learning
A Unified View of KL-Regularized RL
We present a unified view of policy gradient and soft Q-learning.
Hide and Seek
Discussion on an agent developed by OpenAI that exhibits several emergent strategies in a hide-and-seek environment.
P3O — Policy-on Policy-off Policy Optimization
Discussion on P3O, a policy gradient method that utilizes both on-policy and off-policy data.
Reactor — Retrace Actor
Discussion on Reactor and its 𝛽-LOO policy gradient estimator.
What Matters In On-Policy Reinforcement Learning?
Discussion on several design decisions in on-policy reinforcement learning
MPO — Maximum a Posteriori Policy Optimization
Discussion on maximum a posteriori policy optimization, a KL-regularized reinforcement learning method.
MERLIN — Memory, RL, and Inference Network
Discussion on a memory architecture that allows us to do temporal relational reasoning.
Spectral Norm
Discussion on Spectral norm and its usage in deep learning
TransformerXL
Discussion on a successor of Transformer, namely TransformerXL, that can learn from sequences beyond a fixed length
Generalization in RL
Discussion on several recent works trying to improve generalization in deep reinforcement learning.
Network Randomization
Discussion on network randomization, a technique for improving generalization in reinforcement learning.
Efficient Value-Based RL
Discussion on several recent works trying to improve sample efficiency of reinforcement learning algorithms.
The Deadly Triad
We analyze how different components of DQN play a role in the emergence of the deadly triad
TPPO — Truly PPO
We investigate the behavior of PPO and introduce new methods that enforce the trust region constraint.
3rd-place solution to MineRL 2019 Competition
Discussion on the 3rd-place solution to MineRL 2019 Competition.
Anti-Aliasing
Discussion on aliasing in modern convolutional neural networks and how to address it with low-pass filters.
SENet: Squeeze-and-Excitation Network
Discussion on the Squeeze-and-Excitation Network, an architecture that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels.
EvoNorm
Discussion on EvoNorm, a set of uniform normalization-activation layers found by AutoML.
MobileNet
Discussion on the MobileNet family of architectures
Math
We summarize some mathematical concepts used in deep reinforcement learning
Combining EAs with RL
We summarize several recent works that combine evolutionary algorithms with reinforcement learning.
CLEAR — Continual Learning with Experience And Replay
Discussion on continual learning with experience and replay, a simple method that prevents catastrophic forgetting and improves the stability of learning.
Agent57
Discussion on an agent, called Agent57, that outperforms the standard human benchmark on all Atari games.
NGU — Never Give Up
Discussion on the Never-Give-Up (NGU) agent, which achieves state-of-the-art performance in hard-exploration Atari games without any prior knowledge while maintaining a very high score across the remaining games.
From 1st Wasserstein to Kantorovich-Rubinstein Duality
An introduction to the dual of the 1st Wasserstein distance.
Duality in Linear Programming
An introduction to dual linear programs
SimCLR: A Simple Framework for Contrastive Learning of Visual Representations
Discussion on SimCLR and some useful results about how to learn a good representation
GCN, GLU — Gated Convolutional Network
Discussion on Gated Convolutional Network that applies 1D convolution to sequential data.
EC — Episodic Curiosity
Discussion on an exploration method based on episodic memory.
FiLM — Feature-wise Linear Modulation
Discussion on Feature-wise Linear Modulation
DreamerV2
Discussion on DreamerV2, a model-based algorithm that achieves promising results on Atari games
Dreamer
Discussion on a model-based reinforcement learning agent called Dreamer
PlaNet: Deep Planning Network
Discussion on a model-based reinforcement learning agent called PlaNet
SIL - Self-Imitation Learning
Discussion on self-imitation learning, in which the agent exploits past transitions that yield better returns than it expects
AdaNorm
We analyze layer normalization and discuss its improvement AdaNorm.
UNREAL — Unsupervised Reinforcement and Auxiliary Learning
Discussion on UNsupervised REinforcement and Auxiliary Learning (UNREAL), which aims to fully utilize training signals from environments to speed up the learning process and gain better performance.
Time Limits in Reinforcement Learning
Discussion on the impact of time limits in reinforcement learning
PtrNet: Pointer Network
Discussion on Pointer Network.
Ape-X DQfD
Discussion on several enhancements to Ape-X DQN.
2019
Solving Rubik’s Cube with a Robot Hand
Discussion on an agent, trained in simulation, that solves a Rubik’s Cube with a real robot hand.
Challenges of Real-World Reinforcement Learning
Discussion on several challenges of real-world reinforcement learning.
REM - Random Ensemble Mixture
Discussion on an RL algorithm that exploits off-policy data.
BCQ — Batch-Constrained Deep Q-Learning
Discussion on an RL algorithm that exploits off-policy data.
Diagnosing Bottlenecks in DQN
Discussion on several concerns in deep Q-learning.
SEED — Scalable Efficient Deep-RL
Discussion on a scalable reinforcement learning architecture that speeds up both data collection and learning.
R2D2: Recurrent Replay Distributed DQN
Discussion on a distributed reinforcement learning architecture that incorporates a recurrent network into Ape-X.
IMPALA
Discussion on a distributed reinforcement learning architecture for policy gradient methods.
Ape-X
Discussion on a distributed reinforcement learning architecture for Q-learning methods.
DNC — Improving Differentiable Neural Computer
Discussion on several improvements to the differentiable neural computer.
DNC — Differentiable Neural Computer
Discussion on Differentiable Neural Computer.
NTM — Neural Turing Machines
Discussion on Neural Turing Machines, an architecture able to utilize an external memory.
HPG — Hindsight Policy Gradients
Discussion on a policy-gradient method with hindsight experience
PopArt: Preserving Outputs Precisely, while Adaptively Rescaling Targets
Discussion on a method that can learn values across many orders of magnitudes.
SchedNet — Schedule Network
Discussion on a multi-agent reinforcement learning algorithm that schedules communication between cooperative agents.
PR2 — Probabilistic Recursive Reasoning
Discussion on a multi-agent reinforcement learning algorithm that recursively reasons about opponents’ behavior.
MADDPG — Multi-Agent Deep Deterministic Policy Gradient
Discussion on a multi-agent reinforcement learning algorithm that follows the framework of centralized training with decentralized execution.
EMI — Exploration with Mutual Information
Discussion on a novel exploration method based on representation learning
QWeb
Discussion on how to solve the web navigation problem using DQN.
MIRL — Mutual Information Reinforcement Learning
Discussion on a new regularization mechanism that leverages an optimal prior to explicitly penalize the mutual information between states and actions.
SAGAN: Techniques in Self-Attention Generative Adversarial Networks
Discussion on several techniques involved in SAGAN, including self-attention, spectral normalization, conditional batch normalization, etc.
MB-MRL — Model-Based Meta-Reinforcement Learning
Discussion on a model-based meta reinforcement learning algorithm that enables the agent to quickly adapt to changes in the environment.
PEARL — Probabilistic Embedding for Actor-critic RL
Discussion on an off-policy meta reinforcement learning algorithm that achieves state-of-the-art performance and sample efficiency.
MB-MPO — Model-Based Meta-Policy Optimization
Discussion on an algorithm that efficiently learns a robust policy by applying MAML to multiple dynamics models.
ProMP — Proximal Meta-Policy Search
We address the credit assignment problem of two forms of MAML with an RL objective and discuss an efficient and stable meta reinforcement learning algorithm.
Adaptive MAML — Applying MAML-RL to nonstationary environments
Discussion on a variant of MAML-RL for solving tasks that change dynamically due to non-stationarity of the environment.
MAML++: Improvements on MAML
Discussion on a series of improvements on MAML
MAML — Model-Agnostic Meta-Learning
Discussion on an optimization algorithm for meta-learning named Model-Agnostic Meta-Learning (MAML)
SNAIL — Simple Neural AttentIve meta-Learner
Discussion on a meta-learning architecture named Simple Neural AttentIve meta-Learner (SNAIL).
HAC — Learning Multi-Level Hierarchies with Hindsight
A novel hierarchical reinforcement learning framework that can efficiently learn multiple levels of policies in parallel.
NORL-HRL — Near-Optimal Representation Learning for Hierarchical Reinforcement Learning
Near-Optimal Representation Learning for Hierarchical Reinforcement Learning: An improvement to HIRO
HIRO — HIerarchical Reinforcement learning with Off-policy correction
Discussion on a hierarchical reinforcement learning algorithm for goal-directed tasks.
Hierarchical Guidance
Discussion on an algorithmic framework called hierarchical guidance, which leverages hierarchical structure in imitation learning.
SAC-X — Scheduled Auxiliary Control
Discussion on a new learning paradigm in RL that resorts to auxiliary policies to efficiently explore the environment.
TDM — Temporal Difference Models
Discussion on temporal difference models, an algorithm that aims for the sample efficiency of model-based RL while achieving the asymptotic performance of model-free RL
RMC — Relational Memory Core
Discussion on a recurrent architecture that allows us to do temporal relational reasoning.
Exponential Families
Discussion on Exponential Families
FQF — Fully Parameterized Quantile Function
Discussion on fully parameterized quantile function, which improves IQN by further parameterizing the quantile proposal process.
QR-DQN, IQN
Discussion on two distributional deep Q networks, namely Quantile Regression Deep Q Network (QR-DQN) and Implicit Quantile Networks (IQN)
ICM, RND
Discussion on two exploration methods based on curiosity, namely Intrinsic Curiosity Module (ICM) and Random Network Distillation (RND)
Some Exploration Algorithms: EX2, LSH, VIME etc.
Discussion on several exploration algorithms, including count-based methods, Thompson sampling, and information gain exploration.
DIAYN — Diversity Is All You Need
Discussion on an unsupervised learning method for learning useful skills without a reward function.
Transformer
Discussion on a self-attention architecture named Transformer.
AIRL — Adversarial Inverse Reinforcement Learning
We introduce a practical GAN-style IRL algorithm named adversarial inverse reinforcement learning (AIRL)
GAN-GCL
We build a connection between maximum entropy inverse reinforcement learning and generative adversarial networks
GCL — Guided Cost Learning
We introduce a maximum entropy inverse reinforcement learning algorithm, named guided cost learning.
PCL — Path Consistency Learning and More
Discussion on path consistency learning and its derivatives.
SAC — Soft Actor-Critic with Adaptive Temperature
We introduce adaptive temperature to soft actor-critic (SAC).
SAC — Soft Actor-Critic
Discussion on soft actor-critic, a maximum entropy algorithm.
SVI — Soft Value Iteration
We address the optimism problem of the probabilistic graphical model introduced in the previous post via variational inference.
PGM — Probabilistic Graphical Model
Discussion on statistical inference in a temporal probabilistic graphical model.
SL — Statistical Learning: A Connection to Neural Networks
We expand the topic of latent variable models in the sense that the latent variables model the underlying structure of the observed data, whereby the model is able to do statistical inference over these latent variables. We then build a connection between statistical learning and neural networks.
Probabilistic Latent Variable Models
An introduction to probabilistic latent variable models
2018
EM — Expectation-Maximization Algorithm
Discussion on the Expectation-Maximization (EM) algorithm, and its application to GMMs
GPS-iLQR — Guided Policy Search with iLQR
Discussion on iterative Linear Quadratic Regulator with a local linear-Gaussian model
LQR — Linear-Quadratic Regulator
Discussion on the Linear Quadratic Regulator and its derivatives
MB-MF — Model-Based Model-Free
Discussion on the model-based model-free algorithm
SCG — Stochastic Computational Graphs
Discussion on stochastic computational graphs, a type of directed acyclic computational graph that includes both deterministic functions and conditional probability distributions.
GAE — Generalized Advantage Estimation
Discussion on a multi-step advantage estimation for online reinforcement learning
TRPO, PPO
Discussion on two policy-based algorithms that restrict the step size to avoid destructively large policy updates: Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO).
CG — Conjugate Gradient Method
Discussion on the conjugate gradient method in chaos :-)
Planning and Learning in Model-Based Reinforcement Learning Methods
Discussion on a series of algorithms in model-based reinforcement learning where planning and learning are intermixed.
GQN — Generative Query Network
Discussion on the generative query network, a brand new unsupervised scene-based generative network.
Rainbow
Discussion on Rainbow, an integration of multiple improvements on DQN.
c51 — Distributional Deep Q Network
Discussion on the distributional deep Q network (a.k.a. c51), an improvement to deep Q network that replaces the action value Q with a value distribution to capture the stochastic nature of the environment.
PER — Prioritized Experience Replay
Discussion on prioritized experience replay, an improvement to the uniform experience replay used in deep Q network.
PG — Stochastic & Deterministic Policy Gradient
Discussion on policy gradient methods and their derivatives
IS — Importance Sampling
Discussion on importance sampling, the cornerstone of off-policy learning.
Basic Policies in Reinforcement Learning
We talk in detail about some widely used policies in reinforcement learning, including the epsilon-greedy policy, stochastic policy with temperature, upper confidence bound (UCB), and the gradient bandit algorithm
DQN — Deep Q Network
Discussion on Deep Q Network (DQN), a successful algorithm that works in discrete-action environments
Contrastive Predictive Coding
Discussion on a sequential representation learning model, contrastive predictive coding.
Beta-VAE and Its Variants
Discussion on beta-VAE and its variants, which attempt to learn disentangled representations by heavily penalizing the corresponding correlation term
DIM — Deep INFOMAX
Discussion on Deep INFOMAX, a representation-learning method maximizing mutual information between the input and its representation based on MINE
MINE — Mutual Information Neural Estimation
Discussion on a neural estimator for mutual information, and some of its applications
R-CNN — Region-based Methods for Object Detection
Discussion on a series of region-based methods for object detection, extending to Mask R-CNN for instance segmentation
GANs — Generative Adversarial Networks
Discussion on the generative adversarial network in two ways: one for data generation, and the other for semi-supervised learning. In the end, we’ll also demonstrate some techniques that help improve GANs
VAE — Variational Autoencoder
Discussion on variational autoencoders, a kind of generative network that allows us to alter data in a desired, specific way
t-SNE
Discussion on t-SNE, an unsupervised learning algorithm commonly used in data visualization.
YOLO — You Only Look Once
Discussion on YOLO, a state-of-the-art real-time object detection algorithm
PCA and Whitening
Discussion on the dimensionality reduction technique PCA, and its derivatives, whitening and ZCA whitening
Optimization
Discussion on first-order optimization algorithms in machine learning, which optimize the objective function based on gradients.
SVM — Support Vector Machines
An introduction to support vector machines