Category Archive

Reinforcement Learning 106
Deep Learning 30
Computer Science 11
Mathematics 7
Machine Learning 4

Reinforcement Learning

AlphaStar

Discussion on AlphaStar, the first agent that achieves Grandmaster level in the full game of StarCraft II

OpenAI Five

Discussion on OpenAI Five, an agent that achieves super-human performance in Dota 2

Go-Explore

Discussion on Go-Explore, a family of algorithms designed for hard-exploration games

GAIL — Generative Adversarial Imitation Learning

A concise theoretical analysis of GAIL

FTW — For The Win

Discussion on an agent, namely For The Win(FTW), that achieves human-level performance in a popular 3D team-based multiplayer first-person video game.

MuZero

Discussion on MuZero, a successor of AlphaZero, that not only masters chess games but also achieves state-of-the-art performance on Atari games

AlphaZero

Discussion on AlphaZero, an agent that achieves super-human performance in chess, shogi and Go

MAPPO

Discussion on Multi-Agent PPO, which includes a few tricks when applying PPO to multi-agent environments

QMIX and Some Tricks

Discussion on QMIX and some tricks on QMIX.

NCC — Neighborhood Cognition Consistent Multi-Agent Reinforcement Learning

Discussion on NCC, a cooperative MARL method that takes into account neighborhood cognitive consistency.

RODE — Learning Roles to Decompose Multi-Agent Tasks

Discussion on RODE, a hierarchical MARL method that decompose the action space into role action subspaces according to their effects on the environment.

PWIL — Primal Wasserstein Imitation Learning

Discussion on Primal Wasserstein Imitation Learning.

Network Regularization in Policy Optimization

Discussion on the effect of network regularization in policy optimization.

HIDIO — Hierarchical RL by Discovering Intrinsic Options

Discussion on HIDIO, which identifies and addresses the problem of using a shared representation for learning the policy and the value function.

IDAAC — Invariant Decoupled Advantage Actor-Critic

Discussion on IDAAC, which identifies and addresses the problem of using a shared representation for learning the policy and the value function.

DTSIL — Diverse Trajectory-conditioned Self-Imitation Learning

Discussion on Diverse Trajectory-conditioned Self-Imitation Learning,

MARL — A Survey and Critique

We present an overview of multi-agent reinforcement learning

PPG — Phasic Policy Gradient

Discussion on phasic policy gradient, which implements two disjoint networks for the policy and value function and optimizes them in two phases.

Deep Reinforcement Learning and its Neuroscientific Implications

Notes from Deep Reinforcement Learning and Its Neuroscientific Implications

Backward — Learning from a Single Demonstration

Discussion on a curriculum learning algorithm that gradually learns a policy gradient algorithm on Montezuma’s Revenge

The Mirage of Action-Dependent Baselines

Analysis on action-dependent baselines

Self-Tuning Reinforcement Learning

A self-tuning reinforcement learning algorithm for IMPALA.

V-trace

Theoretical analysis of the V-trace target.

Retrace(𝝀)

A theoretical analysis of the Retrace(𝝀) algorithm.

M-RL — Munchausen Reinforcement Learning

Discussion on Munchausen Reinforcement Learning, which considers policy in Bellman updates.

Behavior Priors for Kl regularized Reinforcement Learning

Discussion on behavior priors for KL regularized reinforcement learning

A Unified View of KL-Regularized RL

We present a unified view of policy gradient and soft Q-learning.

Hide and Seek

Discussion on an agent developed by OpenAI et al. that exhibits several emergent strategies in hide-and-seek environment.

P3O — Policy-on Policy-off Policy Optimization

Discussion on P3O, an policy gradient method that utilizes both on-policy and off-policy data.

What Matters In On-Policy Reinforcement Learning?

Discussion on several design decisions on on-policy reinforcement learning

MPO — Maximum a Posteriori Policy Optimization

Discussion on maximum a posteriori policy optimization, a KL-regularized reinforcement learning method.

Generalization in RL

Discussion on several recent works trying to improve generalization in deep reinforcement learning.

Network Randomization

Discussion on network randomization, a techinque improving generalization in reinforcement learning.

Efficient Value-Based RL

Discussion on several recent works trying to improve sample efficiency of reinforcement learning algorithms.

The Deadly Triad

We analyze how different components of DQN play a role in emergence of the deadly triad

TPPO — Truly PPO

We investigate the behavior of PPO and introduce new methods that forces the trust region constraint.

3rd-place solution to MineRL 2019 Competition

Discussion on the 3rd-place solution to MineRL 2019 Competition.

Combining EAs with RL

We summarize summarize several recent works that combine evolutionary algorithms with reinforcement learning.

CLEAR — Continual Learning with Experience And Replay

Discussion on continual learning with experience and replay, a simple method preventing catastrophic forgetting and improve stability of learning.

Agent57

Discussion on an agent, called Agent57, that outperforms the standard human benchmark on all Atari games.

Discussion on the Never-Give-Up(NGV) agent that achieves the state-of-the-art performance in hard exploration games in Atari without any prior knowledge while maintraining a very high score across the remaining games.

EC — Episodic Curiosity

Discussion on an exploration method based on episodic memory.

DreamerV2

Discussion on DreamerV2, a model-based algorithm reaching promising results on Atari games

Dreamer

Discussion on a model-based reinforcement learning agent called Dreamer

PlaNet: Deep Planning Network

Discussion on a model-based reinforcement learning agent called PlaNet

SIL - Self-Imitation Learning

Discussion on self-imitation learning, in which the agent exploits the previous transitions that receives better returnas than it expects

UNREAL — Unsupervised Reinforcement and Auxiliary Learning

Discussion on UNsupervised Reinforcement and Auxiliary Learning(UNREAL), which aims to fully utilize training signals from environments to speed up the learning process and gain better performance.

Time Limits in Reinforcement Learning

Discussion on the impact of time limits in reinforcement learning

Ape-X DQfD

Discussion on several enhancements on Ape-X DQN.

Solving Rubik’s Cube with a Robot Hand

Discussion on an agent, trained on simulation, can solve Rubik’s Cube with a real robot hand.

Challenges of Real-World Reinforcement Learning

Discussion on several challenges of real-world reinforcement learning.

REM - Random Ensemble Mixture

Discussion on a RL algorithm that exploit off-policy data.

BCQ — Batch-Constrained Deep Q-Learning

Discussion on a RL algorithm that exploit off-policy data.

Diagnosing Bottlenecks in DQN

Discussion on several concerns in deep (Q) learning.

SEED — Scalable Efficient Deep-RL

Discussion on a scalable reinforcement learning architecture that speeds up both data collection and learning process.

R2D2: Recurrent Replay Distributed DQN

Discussion on a distributed reinforcement learning architecture that incoporates a recurrent network into Ape-X.

IMPALA

Discussion on a distributed reinforcement learning architecture for policy gradient methods.

Ape-X

Discussion on a distributed reinforcement learning architecture for Q-learning methods.

HPG — Hindsight Policy Gradients

Discussion on a policy-gradient method with hindsight experience

PopArt: Preserving Outputs Precisely, while Adaptively Rescaling Targets

Discussion on a method that can learn values across many orders of magnitudes.

SchedNet — Schedule Network

Discussion on a multi-agent reinforcement learning algorithm that schedules communication between cooperative agents.

PR2 — Probabilistic Recursive Reasoning

Discussion on a multi-agent reinforcement learning algorithm that recursively reason the opponents’ behavior.

MADDPG — Multi-Agent-Deep deterministic Policy Gradient

Discussion on a multi-agent reinforcement learning algorithm that follows the framework of centralized training with decentralized execution.

EMI — Exploration with Mutual Information

Discussion on a novel exploration method based on representation learning

QWeb

Discussion on how to solve the web navigation problem using DQN.

MIRL — Mutual Information Reinforcement Learning

Discussion on a new regularization mechanism that leverage an optimal prior to explicitly penalize the mutual information between states and f.

MB-MRL — Model-Based Meta-Reinforcement Learning

Discussion on a model-based meta reinforcement learning algorithm that enables the agent to fast adapt to changes of environment.

PEARL — Probabilistic Embedding for Actor-critic RL

Discussion on an off-policy meta reinforcement learning algorithm that achieves state-of-the-art performance and sample efficiency.

MB-MPO — Model-Based Meta-Policy Optimization

Discussion on an algorithm that efficiently learns a robust policy by applying MAML to multiple dynamics model.

ProMP — Proximal MetaPolicy Search

We address the credit assignment problem of two forms of MAML with an RL objective and discuss an efficient and stable meta reinforcement learning algorithm.

Adaptive MAML — Applying MAML-RL to nonstationary environments

Discussion on a variant of MAML-RL for solving tasks that change dynamically due to non-stationary of the environment.

SNAIL — Simple Neural AttentIve meta-Learner

Discussion on a meta-learning architecture named Simple Neural AttentIve meta-Learner(SNAIL).

HAC — Learning Multi-Level Hierarchies with Hindsight

A norvel hierarchical reinforcement learning frame work that can efficiently learn multiple levels of policies in parallel.

NORL-HRL — Near-Optimal Representation Learning for Hierarchical Reinforcement Learning

Near-Optimal Representation Learning for Hierarchical Reinforcement Learning: An improvement to HIRO

HIRO — HIerarchical Reinforcement learning with Off-policy correction

Discussion on a hierarchical reinforcement learning algorithm for goal-directed tasks.

Hierarchical Guidance

Discussion on an algorithmic framework called hierarchical guidance, which leverages hierarchical structure in imitation learning.

SAC-X — Scheduled Auxiliary Control

Discussion on a new learning paradigm in RL that resorts to auxiliary policies to efficiently explore the environment.

TDM — Temporal Difference Models

Discussion on temporal difference models, an algorithm that tries to gain sample efficiency of model-based RL, while achieving asymptotic performance as model-free RL

FQF — Fully Parameterized Quantile Function

Discussion on fully parameterized quantile function, which improves IQN by further parameterizing the quantile proposal process.

QR-DQN, IQN

Discussion on two distributional deep Q networks, namely Quantile Regression Deep Q Network(QR-DQN) and Implicit Quantile Networks

ICM, RND

Discussion on two exploration methods based on curiosity, namely Intrinsic Curiosity Module (ICM) and Random Network Distillation(RND)

Some Exploration Algorithms: EX2, LSH, VIME etc.

Discussion on several exploration algorithms, including count-based methods, Thompson sampling, and information gain exploration.

DIAYN — Diversity Is All You Need

Discussion on an unsupervised learning method for learning useful skills without a reward function.

AIRL — Adversarial Inverse Reinforcement Learning

We introduce a practical GAN-style IRL algorithm named adversarial inverse reinforcement learning(AIRL)

GAN-GCL

We build a connection between maximum entropy inverse reinforcement learning and generative adversarial networks

GCL — Guided Cost Learning

We introduce a maximum entropy inverse reinforcement learning algorithm, named guided policy learning.

PCL — Path Consistency Learning and More

Discussion on path consistency learning and its derivatives.

SAC — Soft Actor-Critic with Adaptive Temperature

We introduce adaptive temperature to soft actor-critic(SAC).

SAC — Soft Actor-Critic

Discussion on soft actor-critic, a maximum entropy algorithm.

SVI — Soft Value Iteration

We address the optimism problem of the probabilistic graphical model introduced in the previous post via variational inference.

PGM — Probabilistic Graphic Model

Discussion on statistic inference in a temporal probabilistic graphical model.

GPS-iLQR — Guided Policy Search with iLQR

Discussion on iterative Linear Quadratic Regulator with a local linear-Gaussian model

LQR — Linear-Quadratic Regulator

Discussion on Linear Quadratic Regulator its derivatives

MB-MF — Model-Based Model-Free

Discussion on model-based model-free algorithm

GAE — Generalized Advantage Estimation

Discussion on a multi-step advantage estimation for online reinforcement learning

TRPO, PPO

Discussion on two policy-based algorithms which restrict the step size to help avoid big steps: Trust Region Policy Optimization(TRPO) and Proximal Policy Optimization(PPO).

Planning and Learning in Model-Based Reinforcement Learning Methods

Discussion on a series of algorithms in model-based reinforcement learning where planning and learning are intermixed.

Rainbow

Discussion on Rainbow, an integration of multiple improvements on DQN.

c51 — Distributional Deep Q Network

Discussion on the distributional deep Q network(a.k.a. c51), an improvement to deep Q network which replaces action-value Q with the value distribution to take on the stochastic nature of the environment.

PER — Prioritized Experience Replay

Discussion on prioritized experience replay, an improvement to the uniform experience replay used in deep Q network.

PG — Stochastic & Deterministic Policy Gradient

Discussion on policy gradient methods and its derivatives

IS — Importance Sampling

Discussion on importance sampling, the cornerstone of off-policy learning.

Basic Policies in Reinforcement Learning

We talk in detail about some wildly used policy in reinforcement learning, including epsilon-greedy policy, stochastic policy with temperature, upper confidence bound(UCB), and gradient bandit algorithm

DQN — Deep Q Network

Discussion on Deep Q network(DQN), a successful algorithm works in discrete-action environments

Deep Learning

MERLIN — Memory, RL, and Inference Network

Discussion on a memory architecture that allows us to do temporal relational reasoning.

Spectral Norm

Discussion on Spectral norm and its usage in deep learning

TransformerXL

Discussion on a successor of Transformer, namely TransformerXL, that can learn from sequences beyond a fixed length

Anti-Aliasing

Discussion on aliasing in modern convolutional neural networks and address it with low-pass filters.

SENet: Squeeze-and Excitation Network

Discussion on Squeeze-and Excitation Network, an architecture that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels.

EvoNorm

Discussion on EvoNorm, a set of uniform normalization-activation layers found by AutoML.

MobileNet

Discussion on MobileNet families

SimCLR: A Simple Framework for Contrastive Learning of Visual Representations

Discussion on some useful results about how to learn a good representation

GCN, GLU — Gated Convolutional Network

Discussion on Gated Convolutional Network that applies 1D convolution to sequential data.

FiLM — Feature-wise Linear Modulation

Discussion on Feature-wise Linear Modulation

AdaNorm

We analyze layer normalization and discuss its improvement AdaNorm.

PtrNet: Pointer Network

Discussion on Pointer Network.

DNC — Improving Differentiable Neural Computer

Discussion on several improvements on differentiable neural computer.

DNC — Differentiable Neural Computer

Discussion on Differentiable Neural Computer.

NTM — Neural Turing Machines

Discussion on Neural Turing Machines, an architecture able to utilize an external memory.

SAGAN: Techniques in Self-Attention Generative Adversarial Networks

Discussion on several techniques involved in SAGAN, including self-attention, spectral normalization, conditional batch normalization, etc

MAML++: Improvements on MAML

Discussion on a series of improvements on MAML

MAML — Model-Agnostic Meta-Learning

Discussion on an optimization algorithm for meta-learning named Model-Agnostic Meta-Learning(MAML)

RMC — Relational Memory Core

Discussion on a recurrent architecture that allows us to do temporal relational reasoning.

Transformer

Discussion on a self-attention architecture named Transformer.

GQN — Generative Query Network

Discussion on the generative query network, a brand new unsupervised scene-based generative network.

Contrastive Predicting Coding

Discussion on a sequential representation learning model, contrastive predicting coding.

Beta-VAE and Its Variants

Discussion on beta-VAE and its variants, which attempt to learn disentangled representation by heavily penalizing the corresponding correlation term

DIM — Deep INFOMAX

Discussion on Deep INFOMAX, a representation-learning method maximizing mutual information between the input and its representation based on MINE

MINE — Mutual Information Neural Estimation

Discussion on a neural estimator for mutual information, and some of its applications

R-CNN — Region-based Methods for Object Detection

Discussion on a series of region-based methods for object detection, and extend to Mask R-CNN for instance segmentation

GANs — Generative Adversarial Networks

Discussion on the generative adversarial network in two ways: one for data generation, and the other for semi-supervised learning. In the end, we’ll also demonstrate some techniques that help improve GANs

VAE — Variational Autoencoder

Discussion on variational autoencoders, a kind of generative networks which allows us to alter data in a desired, specific way

YOLO — You Only Look Once

Discussion on YOLO, a state-of-the-art real-time object detection algorithm

Optimization

Discussion on first-order optimization algorithms in machine learning, which optimize the objective function based on gradients.