- Value-Based RL 20
- Mathematics 17
- Exploration in RL 14
- Multi-Agent RL 14
- Policy-Gradient RL 13
- Model-Based RL 12
- RL Application 9
- Regularized RL 9
- Sequential Model 9
- Distributed RL 8
- Hierarchical RL 8
- Meta-Learning 8
- Representation Learning 8
- Computer Vision 7
- Generative Network 7
- Inverse RL 4
- Machine Learning 4
- Multitask RL 4
- Tricks 4
- Generalization in RL 3
- Imitation Learning 3
- Memory in RL 3
- Network Layer 3
- Offline RL 3
- Overviews 3
- Curriculum Learning 1
- Evolutionary Algorithms 1
- Lifelong Learning 1
- Meta-Gradient RL 1
- Methodology 1
- Optimizer 1
- Policy-Based RL 1
- Self-Imitation Learning 1
- Transfer Learning in RL 1
- Visualization 1
Value-Based RL
Retrace(λ)
A theoretical analysis of the Retrace(π) algorithm.
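For quick reference, the Retrace(λ) target analyzed there has the form (following Munos et al., 2016; notation may differ from the post):

$$\mathcal{R}Q(x,a) = Q(x,a) + \mathbb{E}_\mu\Big[\sum_{t\ge 0}\gamma^t\Big(\prod_{s=1}^{t}c_s\Big)\big(r_t + \gamma\,\mathbb{E}_\pi Q(x_{t+1},\cdot) - Q(x_t,a_t)\big)\Big],\qquad c_s=\lambda\min\Big(1,\frac{\pi(a_s|x_s)}{\mu(a_s|x_s)}\Big)$$

where $\mu$ is the behavior policy; the truncated ratios $c_s$ cut traces safely off-policy.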
M-RL - Munchausen Reinforcement Learning
Discussion on Munchausen Reinforcement Learning, which adds a scaled log-policy term to the reward in Bellman updates.
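As a rough sketch of the idea (the M-DQN form from the paper, target-network details omitted): with $\pi=\operatorname{softmax}(q/\tau)$, the regression target becomes

$$\hat{q}(s_t,a_t) = r_t + \alpha\tau\ln\pi(a_t|s_t) + \gamma\sum_{a'}\pi(a'|s_{t+1})\big(q(s_{t+1},a') - \tau\ln\pi(a'|s_{t+1})\big)$$

i.e., the scaled log-policy augments the immediate reward on top of a soft, entropy-regularized bootstrap.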
A Unified View of KL-Regularized RL
We present a unified view of policy gradient and soft Q-learning.
Reactor - Retrace Actor
Discussion on β-LOO, the leave-one-out policy-gradient estimator used in Reactor.
Efficient Value-Based RL
Discussion on several recent works trying to improve the sample efficiency of reinforcement learning algorithms.
The Deadly Triad
We analyze how different components of DQN play a role in the emergence of the deadly triad.
Agent57
Discussion on an agent, called Agent57, that outperforms the standard human benchmark on all Atari games.
NGU - Never Give Up
Discussion on the Never Give Up (NGU) agent, which achieves state-of-the-art performance in hard-exploration Atari games without any prior knowledge while maintaining a very high score across the remaining games.
Diagnosing Bottlenecks in DQN
Discussion on several concerns in deep Q-learning.
PopArt: Preserving Outputs Precisely, while Adaptively Rescaling Targets
Discussion on a method that can learn values across many orders of magnitude.
FQF - Fully Parameterized Quantile Function
Discussion on fully parameterized quantile function, which improves IQN by further parameterizing the quantile proposal process.
QR-DQN, IQN
Discussion on two distributional deep Q networks, namely the Quantile Regression Deep Q Network (QR-DQN) and Implicit Quantile Networks (IQN).
PCL - Path Consistency Learning and More
Discussion on path consistency learning and its derivatives.
SAC - Soft Actor-Critic with Adaptive Temperature
We introduce adaptive temperature to soft actor-critic (SAC).
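For reference, the temperature objective (as in the second SAC paper; notation may differ from the post) is

$$J(\alpha) = \mathbb{E}_{a_t\sim\pi}\big[-\alpha\log\pi(a_t|s_t) - \alpha\,\bar{\mathcal{H}}\big]$$

where $\bar{\mathcal{H}}$ is a target entropy: $\alpha$ grows when the policy entropy drops below the target and shrinks otherwise.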
SAC - Soft Actor-Critic
Discussion on soft actor-critic, a maximum entropy algorithm.
Rainbow
Discussion on Rainbow, an integration of multiple improvements on DQN.
C51 - Distributional Deep Q Network
Discussion on the distributional deep Q network (a.k.a. C51), an improvement to the deep Q network that replaces the action value Q with a value distribution to capture the stochastic nature of the environment.
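The core idea, roughly: model the return distribution $Z$ instead of its mean via the distributional Bellman equation

$$\mathcal{T}Z(x,a) \overset{D}{=} R(x,a) + \gamma Z(X',A'),\qquad Q(x,a)=\mathbb{E}\big[Z(x,a)\big]$$

C51 represents $Z$ as a categorical distribution over 51 fixed atoms and projects the target back onto that support.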
PER - Prioritized Experience Replay
Discussion on prioritized experience replay, an improvement to the uniform experience replay used in the deep Q network.
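A quick sketch of the proportional variant (per the PER paper; notation may differ from the post): transition $i$ is sampled with probability

$$P(i) = \frac{p_i^\alpha}{\sum_k p_k^\alpha},\qquad p_i = |\delta_i| + \epsilon,\qquad w_i = \Big(\frac{1}{N}\cdot\frac{1}{P(i)}\Big)^{\beta}$$

where $\delta_i$ is the TD error and the importance weights $w_i$ (normalized by their maximum in practice) correct the bias introduced by non-uniform sampling.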
PG - Stochastic & Deterministic Policy Gradient
Discussion on policy gradient methods and their derivatives.
DQN - Deep Q Network
Discussion on the Deep Q Network (DQN), a successful algorithm that works in discrete-action environments.
Mathematics
GAIL - Generative Adversarial Imitation Learning
A concise theoretical analysis of GAIL
V-trace
Theoretical analysis of the V-trace target.
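For reference, the $n$-step V-trace target (following the IMPALA paper; notation may differ from the post) is

$$v_s = V(x_s) + \sum_{t=s}^{s+n-1}\gamma^{t-s}\Big(\prod_{i=s}^{t-1}c_i\Big)\delta_t V,\qquad \delta_t V=\rho_t\big(r_t+\gamma V(x_{t+1})-V(x_t)\big)$$

with truncated importance ratios $\rho_t=\min\big(\bar\rho,\frac{\pi(a_t|x_t)}{\mu(a_t|x_t)}\big)$ and $c_i=\min\big(\bar c,\frac{\pi(a_i|x_i)}{\mu(a_i|x_i)}\big)$.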
Retrace(λ)
A theoretical analysis of the Retrace(π) algorithm.
Spectral Norm
Discussion on the spectral norm and its usage in deep learning.
Math
We summarize some mathematical concepts used in deep reinforcement learning
From 1st Wasserstein to Kantorovich-Rubinstein Duality
An introduction to the dual of the 1st Wasserstein distance.
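Stated compactly, the duality in question is

$$W_1(\mu,\nu) = \sup_{\|f\|_L\le 1}\ \mathbb{E}_{x\sim\mu}\big[f(x)\big] - \mathbb{E}_{y\sim\nu}\big[f(y)\big]$$

i.e., the 1st Wasserstein distance equals a supremum over 1-Lipschitz critics, which is what WGAN-style objectives approximate.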
Duality in Linear Programming
An introduction to dual linear programs
Exponential Families
Discussion on Exponential Families.
SVI - Soft Value Iteration
We address the optimism problem of the probabilistic graphical model introduced in the previous post via variational inference.
PGM - Probabilistic Graphical Model
Discussion on statistical inference in a temporal probabilistic graphical model.
SL - Statistical Learning: A Connection to Neural Networks
We expand on latent variable models, in which the latent variables capture the underlying structure of the observed data, allowing the model to perform statistical inference over them. Then we build a connection between statistical learning and neural networks.
EM - Expectation-Maximization Algorithm
Discussion on the Expectation-Maximization (EM) algorithm and its application to Gaussian mixture models (GMMs).
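For reference, EM alternates the two standard steps

$$\text{E-step: } q^{(t)}(z) = p\big(z\mid x;\theta^{(t)}\big),\qquad \text{M-step: } \theta^{(t+1)} = \arg\max_\theta\ \mathbb{E}_{q^{(t)}}\big[\log p(x,z;\theta)\big]$$

and each iteration monotonically increases a lower bound on the log-likelihood $\log p(x;\theta)$.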
SCG - Stochastic Computational Graphs
Discussion on stochastic computational graphs, a type of directed acyclic computational graph that includes both deterministic functions and conditional probability distributions.
TRPO, PPO
Discussion on two policy-based algorithms that restrict the policy update size: Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO).
CG - Conjugate Gradient Method
Discussion on the conjugate gradient method in chaos :-)
PCA and Whitening
Discussion on the dimensionality-reduction technique PCA and its derivatives, whitening and ZCA whitening.
SVM - Support Vector Machines
An introduction to support vector machines
Exploration in RL
Go-Explore
Discussion on Go-Explore, a family of algorithms designed for hard-exploration games
DTSIL - Diverse Trajectory-conditioned Self-Imitation Learning
Discussion on Diverse Trajectory-conditioned Self-Imitation Learning.
Backward - Learning from a Single Demonstration
Discussion on a curriculum learning algorithm that learns a policy on Montezuma's Revenge by gradually moving the starting state backward along a single demonstration.
Behavior Priors for KL-Regularized Reinforcement Learning
Discussion on behavior priors for KL-regularized reinforcement learning.
Agent57
Discussion on an agent, called Agent57, that outperforms the standard human benchmark on all Atari games.
NGU - Never Give Up
Discussion on the Never Give Up (NGU) agent, which achieves state-of-the-art performance in hard-exploration Atari games without any prior knowledge while maintaining a very high score across the remaining games.
EC - Episodic Curiosity
Discussion on an exploration method based on episodic memory.
Ape-X DQfD
Discussion on several enhancements to Ape-X DQN.
EMI - Exploration with Mutual Information
Discussion on a novel exploration method based on representation learning
SAC-X - Scheduled Auxiliary Control
Discussion on a new learning paradigm in RL that resorts to auxiliary policies to efficiently explore the environment.
ICM, RND
Discussion on two exploration methods based on curiosity, namely the Intrinsic Curiosity Module (ICM) and Random Network Distillation (RND).
Some Exploration Algorithms: EX2, LSH, VIME etc.
Discussion on several exploration algorithms, including count-based methods, Thompson sampling, and information gain exploration.
DIAYN - Diversity Is All You Need
Discussion on an unsupervised learning method for learning useful skills without a reward function.
Basic Policies in Reinforcement Learning
We talk in detail about some widely used policies in reinforcement learning, including the epsilon-greedy policy, stochastic policies with temperature, upper confidence bound (UCB), and gradient bandit algorithms.
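A minimal sketch of two of these policies (illustrative code, not taken from the post; the function names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, epsilon):
    # With probability epsilon explore uniformly; otherwise act greedily.
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def ucb(q_values, counts, t, c=2.0):
    # Upper confidence bound: add an optimism bonus that shrinks
    # as an action is selected more often (counts is a numpy array).
    bonus = c * np.sqrt(np.log(t + 1) / (counts + 1e-8))
    return int(np.argmax(q_values + bonus))
```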
Multi-Agent RL
AlphaStar
Discussion on AlphaStar, the first agent that achieves Grandmaster level in the full game of StarCraft II
OpenAI Five
Discussion on OpenAI Five, an agent that achieves super-human performance in Dota 2
FTW - For The Win
Discussion on an agent, namely For The Win (FTW), that achieves human-level performance in a popular 3D team-based multiplayer first-person video game.
MuZero
Discussion on MuZero, a successor of AlphaZero that not only masters board games but also achieves state-of-the-art performance on Atari games.
AlphaZero
Discussion on AlphaZero, an agent that achieves super-human performance in chess, shogi and Go
MAPPO
Discussion on Multi-Agent PPO, which includes a few tricks when applying PPO to multi-agent environments
QMIX and Some Tricks
Discussion on QMIX and some tricks on QMIX.
NCC - Neighborhood Cognition Consistent Multi-Agent Reinforcement Learning
Discussion on NCC, a cooperative MARL method that takes into account neighborhood cognitive consistency.
RODE - Learning Roles to Decompose Multi-Agent Tasks
Discussion on RODE, a hierarchical MARL method that decomposes the action space into role action subspaces according to their effects on the environment.
MARL - A Survey and Critique
We present an overview of multi-agent reinforcement learning
Hide and Seek
Discussion on an agent developed by OpenAI that exhibits several emergent strategies in a hide-and-seek environment.
SchedNet - Schedule Network
Discussion on a multi-agent reinforcement learning algorithm that schedules communication between cooperative agents.
PR2 - Probabilistic Recursive Reasoning
Discussion on a multi-agent reinforcement learning algorithm that recursively reasons about opponents' behavior.
MADDPG - Multi-Agent Deep Deterministic Policy Gradient
Discussion on a multi-agent reinforcement learning algorithm that follows the framework of centralized training with decentralized execution.
Policy-Gradient RL
IDAAC - Invariant Decoupled Advantage Actor-Critic
Discussion on IDAAC, which identifies and addresses the problem of using a shared representation for learning the policy and the value function.
PPG - Phasic Policy Gradient
Discussion on phasic policy gradient, which implements two disjoint networks for the policy and value function and optimizes them in two phases.
The Mirage of Action-Dependent Baselines
An analysis of action-dependent baselines.
Self-Tuning Reinforcement Learning
A self-tuning reinforcement learning algorithm for IMPALA.
A Unified View of KL-Regularized RL
We present a unified view of policy gradient and soft Q-learning.
P3O - Policy-on Policy-off Policy Optimization
Discussion on P3O, a policy gradient method that utilizes both on-policy and off-policy data.
What Matters In On-Policy Reinforcement Learning?
Discussion on several design decisions in on-policy reinforcement learning.
MPO - Maximum a Posteriori Policy Optimization
Discussion on maximum a posteriori policy optimization, a KL-regularized reinforcement learning method.
TPPO - Truly PPO
We investigate the behavior of PPO and introduce new methods that enforce the trust-region constraint.
HPG - Hindsight Policy Gradients
Discussion on a policy-gradient method with hindsight experience
GAE - Generalized Advantage Estimation
Discussion on multi-step advantage estimation for online reinforcement learning.
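For reference, GAE computes (per Schulman et al.; notation may differ from the post)

$$\hat{A}_t^{\mathrm{GAE}(\gamma,\lambda)} = \sum_{l=0}^{\infty}(\gamma\lambda)^{l}\,\delta_{t+l},\qquad \delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$$

where $\lambda$ trades off bias (small $\lambda$) against variance (large $\lambda$) in the advantage estimate.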
TRPO, PPO
Discussion on two policy-based algorithms that restrict the policy update size: Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO).
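As a quick reference, PPO maximizes the clipped surrogate objective (TRPO instead enforces an explicit KL trust region):

$$L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\Big[\min\big(r_t(\theta)\hat{A}_t,\ \operatorname{clip}(r_t(\theta),\,1-\epsilon,\,1+\epsilon)\,\hat{A}_t\big)\Big],\qquad r_t(\theta)=\frac{\pi_\theta(a_t|s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t|s_t)}$$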
PG - Stochastic & Deterministic Policy Gradient
Discussion on policy gradient methods and their derivatives.
Model-Based RL
MuZero
Discussion on MuZero, a successor of AlphaZero that not only masters board games but also achieves state-of-the-art performance on Atari games.
AlphaZero
Discussion on AlphaZero, an agent that achieves super-human performance in chess, shogi and Go
DreamerV2
Discussion on DreamerV2, a model-based algorithm reaching promising results on Atari games
Dreamer
Discussion on a model-based reinforcement learning agent called Dreamer
PlaNet: Deep Planning Network
Discussion on a model-based reinforcement learning agent called PlaNet
MB-MRL - Model-Based Meta-Reinforcement Learning
Discussion on a model-based meta reinforcement learning algorithm that enables the agent to adapt quickly to changes in the environment.
MB-MPO - Model-Based Meta-Policy Optimization
Discussion on an algorithm that efficiently learns a robust policy by applying MAML to multiple dynamics models.
TDM - Temporal Difference Models
Discussion on temporal difference models, an algorithm that aims for the sample efficiency of model-based RL while achieving the asymptotic performance of model-free RL.
GPS-iLQR - Guided Policy Search with iLQR
Discussion on the iterative Linear Quadratic Regulator (iLQR) with a local linear-Gaussian model.
LQR - Linear-Quadratic Regulator
Discussion on the Linear-Quadratic Regulator and its derivatives.
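For reference, the simplest LQR setting (standard formulation, not specific to the post) has linear dynamics, quadratic cost, and a linear optimal controller:

$$x_{t+1} = A x_t + B u_t,\qquad c(x_t,u_t) = x_t^\top Q x_t + u_t^\top R u_t,\qquad u_t^* = -K_t x_t$$

where the gains $K_t$ are obtained by solving the Riccati recursion backward in time.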
MB-MF - Model-Based Model-Free
Discussion on the model-based model-free algorithm.
Planning and Learning in Model-Based Reinforcement Learning Methods
Discussion on a series of algorithms in model-based reinforcement learning where planning and learning are intermixed.
RL Application
AlphaStar
Discussion on AlphaStar, the first agent that achieves Grandmaster level in the full game of StarCraft II
OpenAI Five
Discussion on OpenAI Five, an agent that achieves super-human performance in Dota 2
FTW - For The Win
Discussion on an agent, namely For The Win (FTW), that achieves human-level performance in a popular 3D team-based multiplayer first-person video game.
MuZero
Discussion on MuZero, a successor of AlphaZero that not only masters board games but also achieves state-of-the-art performance on Atari games.
AlphaZero
Discussion on AlphaZero, an agent that achieves super-human performance in chess, shogi and Go
Hide and Seek
Discussion on an agent developed by OpenAI that exhibits several emergent strategies in a hide-and-seek environment.
3rd-place solution to MineRL 2019 Competition
Discussion on the 3rd-place solution to MineRL 2019 Competition.
Solving Rubik's Cube with a Robot Hand
Discussion on an agent, trained in simulation, that can solve Rubik's Cube with a real robot hand.
QWeb
Discussion on how to solve the web navigation problem using DQN.
Regularized RL
TAC - Tsallis Actor Critic
Discussion on Tsallis Actor Critic
M-RL - Munchausen Reinforcement Learning
Discussion on Munchausen Reinforcement Learning, which adds a scaled log-policy term to the reward in Bellman updates.
Behavior Priors for KL-Regularized Reinforcement Learning
Discussion on behavior priors for KL-regularized reinforcement learning.
A Unified View of KL-Regularized RL
We present a unified view of policy gradient and soft Q-learning.
MPO - Maximum a Posteriori Policy Optimization
Discussion on maximum a posteriori policy optimization, a KL-regularized reinforcement learning method.
MIRL - Mutual Information Reinforcement Learning
Discussion on a new regularization mechanism that leverages an optimal prior to explicitly penalize the mutual information between states and actions.
PCL - Path Consistency Learning and More
Discussion on path consistency learning and its derivatives.
SAC - Soft Actor-Critic with Adaptive Temperature
We introduce adaptive temperature to soft actor-critic (SAC).
SAC - Soft Actor-Critic
Discussion on soft actor-critic, a maximum entropy algorithm.
Sequential Model
MERLIN - Memory, RL, and Inference Network
Discussion on a memory architecture that allows us to do temporal relational reasoning.
Transformer-XL
Discussion on a successor of the Transformer, namely Transformer-XL, that can learn from sequences beyond a fixed length.
GCN, GLU - Gated Convolutional Network
Discussion on Gated Convolutional Network that applies 1D convolution to sequential data.
PtrNet: Pointer Network
Discussion on Pointer Network.
DNC - Improving Differentiable Neural Computer
Discussion on several improvements to the differentiable neural computer.
DNC - Differentiable Neural Computer
Discussion on the Differentiable Neural Computer.
NTM - Neural Turing Machines
Discussion on Neural Turing Machines, an architecture able to utilize an external memory.
RMC - Relational Memory Core
Discussion on a recurrent architecture that allows us to do temporal relational reasoning.
Transformer
Discussion on a self-attention architecture named Transformer.
Distributed RL
AlphaStar
Discussion on AlphaStar, the first agent that achieves Grandmaster level in the full game of StarCraft II
OpenAI Five
Discussion on OpenAI Five, an agent that achieves super-human performance in Dota 2
FTW - For The Win
Discussion on an agent, namely For The Win (FTW), that achieves human-level performance in a popular 3D team-based multiplayer first-person video game.
Ape-X DQfD
Discussion on several enhancements to Ape-X DQN.
SEED - Scalable Efficient Deep-RL
Discussion on a scalable reinforcement learning architecture that speeds up both data collection and learning.
R2D2: Recurrent Replay Distributed DQN
Discussion on a distributed reinforcement learning architecture that incorporates a recurrent network into Ape-X.
IMPALA
Discussion on a distributed reinforcement learning architecture for policy gradient methods.
Ape-X
Discussion on a distributed reinforcement learning architecture for Q-learning methods.
Hierarchical RL
FTW - For The Win
Discussion on an agent, namely For The Win (FTW), that achieves human-level performance in a popular 3D team-based multiplayer first-person video game.
RODE - Learning Roles to Decompose Multi-Agent Tasks
Discussion on RODE, a hierarchical MARL method that decomposes the action space into role action subspaces according to their effects on the environment.
HIDIO - Hierarchical RL by Discovering Intrinsic Options
Discussion on HIDIO, a hierarchical RL method that learns task-agnostic intrinsic options in a self-supervised way.
HAC - Learning Multi-Level Hierarchies with Hindsight
A novel hierarchical reinforcement learning framework that can efficiently learn multiple levels of policies in parallel.
NORL-HRL - Near-Optimal Representation Learning for Hierarchical Reinforcement Learning
Near-Optimal Representation Learning for Hierarchical Reinforcement Learning: An improvement to HIRO
HIRO - HIerarchical Reinforcement learning with Off-policy correction
Discussion on a hierarchical reinforcement learning algorithm for goal-directed tasks.
Hierarchical Guidance
Discussion on an algorithmic framework called hierarchical guidance, which leverages hierarchical structure in imitation learning.
DIAYN - Diversity Is All You Need
Discussion on an unsupervised learning method for learning useful skills without a reward function.
Meta-Learning
MB-MRL - Model-Based Meta-Reinforcement Learning
Discussion on a model-based meta reinforcement learning algorithm that enables the agent to adapt quickly to changes in the environment.
PEARL - Probabilistic Embedding for Actor-critic RL
Discussion on an off-policy meta reinforcement learning algorithm that achieves state-of-the-art performance and sample efficiency.
MB-MPO - Model-Based Meta-Policy Optimization
Discussion on an algorithm that efficiently learns a robust policy by applying MAML to multiple dynamics models.
ProMP - Proximal Meta-Policy Search
We address the credit assignment problem of two forms of MAML with an RL objective and discuss an efficient and stable meta reinforcement learning algorithm.
Adaptive MAML - Applying MAML-RL to Nonstationary Environments
Discussion on a variant of MAML-RL for solving tasks that change dynamically due to non-stationarity of the environment.
MAML++: Improvements on MAML
Discussion on a series of improvements on MAML
MAML - Model-Agnostic Meta-Learning
Discussion on an optimization algorithm for meta-learning named Model-Agnostic Meta-Learning (MAML).
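For reference, the MAML objective (per Finn et al.; one inner gradient step shown) is

$$\min_\theta\ \sum_{\mathcal{T}_i}\mathcal{L}_{\mathcal{T}_i}\Big(\theta - \alpha\,\nabla_\theta\mathcal{L}_{\mathcal{T}_i}(\theta)\Big)$$

i.e., optimize the initialization $\theta$ so that a single task-specific gradient step already performs well.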
SNAIL - Simple Neural AttentIve meta-Learner
Discussion on a meta-learning architecture named Simple Neural AttentIve meta-Learner (SNAIL).
Representation Learning
SimCLR: A Simple Framework for Contrastive Learning of Visual Representations
Discussion on some useful results about how to learn a good representation
NORL-HRL - Near-Optimal Representation Learning for Hierarchical Reinforcement Learning
Near-Optimal Representation Learning for Hierarchical Reinforcement Learning: An improvement to HIRO
GQN - Generative Query Network
Discussion on the generative query network, a brand new unsupervised scene-based generative network.
Contrastive Predictive Coding
Discussion on a sequential representation learning model, contrastive predictive coding.
Beta-VAE and Its Variants
Discussion on beta-VAE and its variants, which attempt to learn disentangled representations by heavily penalizing the corresponding correlation term.
DIM - Deep INFOMAX
Discussion on Deep INFOMAX, a representation-learning method maximizing mutual information between the input and its representation based on MINE
MINE - Mutual Information Neural Estimation
Discussion on a neural estimator for mutual information, and some of its applications
VAE - Variational Autoencoder
Discussion on variational autoencoders, a kind of generative network that allows us to alter data in a desired, specific way.
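For reference, VAEs maximize the evidence lower bound (standard form; notation may differ from the post):

$$\log p_\theta(x) \ \ge\ \mathbb{E}_{q_\phi(z|x)}\big[\log p_\theta(x|z)\big] - D_{\mathrm{KL}}\big(q_\phi(z|x)\,\|\,p(z)\big)$$

a reconstruction term plus a regularizer that keeps the approximate posterior close to the prior.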
Computer Vision
Anti-Aliasing
Discussion on aliasing in modern convolutional neural networks and how to address it with low-pass filters.
SENet: Squeeze-and-Excitation Network
Discussion on the Squeeze-and-Excitation Network, an architecture that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels.
MobileNet
Discussion on the MobileNet family.
FiLM - Feature-wise Linear Modulation
Discussion on Feature-wise Linear Modulation
R-CNN - Region-based Methods for Object Detection
Discussion on a series of region-based methods for object detection, extending to Mask R-CNN for instance segmentation.
GANs - Generative Adversarial Networks
Discussion on generative adversarial networks in two settings: data generation and semi-supervised learning. In the end, we also demonstrate some techniques that help improve GANs.
YOLO - You Only Look Once
Discussion on YOLO, a state-of-the-art real-time object detection algorithm
Generative Network
SAGAN: Techniques in Self-Attention Generative Adversarial Networks
Discussion on several techniques involved in SAGAN, including self-attention, spectral normalization, conditional batch normalization, etc
GQN - Generative Query Network
Discussion on the generative query network, a brand new unsupervised scene-based generative network.
Contrastive Predictive Coding
Discussion on a sequential representation learning model, contrastive predictive coding.
Beta-VAE and Its Variants
Discussion on beta-VAE and its variants, which attempt to learn disentangled representations by heavily penalizing the corresponding correlation term.
MINE - Mutual Information Neural Estimation
Discussion on a neural estimator for mutual information, and some of its applications
GANs - Generative Adversarial Networks
Discussion on generative adversarial networks in two settings: data generation and semi-supervised learning. In the end, we also demonstrate some techniques that help improve GANs.
VAE - Variational Autoencoder
Discussion on variational autoencoders, a kind of generative network that allows us to alter data in a desired, specific way.
Inverse RL
GAIL - Generative Adversarial Imitation Learning
A concise theoretical analysis of GAIL
AIRL - Adversarial Inverse Reinforcement Learning
We introduce a practical GAN-style IRL algorithm named adversarial inverse reinforcement learning (AIRL).
GAN-GCL
We build a connection between maximum entropy inverse reinforcement learning and generative adversarial networks
GCL - Guided Cost Learning
We introduce a maximum entropy inverse reinforcement learning algorithm named guided cost learning.
Machine Learning
EM - Expectation-Maximization Algorithm
Discussion on the Expectation-Maximization (EM) algorithm and its application to Gaussian mixture models (GMMs).
t-SNE
Discussion on t-SNE, an unsupervised learning algorithm commonly used in data visualization.
PCA and Whitening
Discussion on the dimensionality-reduction technique PCA and its derivatives, whitening and ZCA whitening.
SVM - Support Vector Machines
An introduction to support vector machines
Multitask RL
Behavior Priors for KL-Regularized Reinforcement Learning
Discussion on behavior priors for KL-regularized reinforcement learning.
UNREAL - Unsupervised Reinforcement and Auxiliary Learning
Discussion on UNsupervised REinforcement and Auxiliary Learning (UNREAL), which aims to fully utilize training signals from environments to speed up the learning process and gain better performance.
SAC-X - Scheduled Auxiliary Control
Discussion on a new learning paradigm in RL that resorts to auxiliary policies to efficiently explore the environment.
DIAYN - Diversity Is All You Need
Discussion on an unsupervised learning method for learning useful skills without a reward function.
Tricks
Network Regularization in Policy Optimization
Discussion on the effect of network regularization in policy optimization.
What Matters In On-Policy Reinforcement Learning?
Discussion on several design decisions in on-policy reinforcement learning.
The Deadly Triad
We analyze how different components of DQN play a role in the emergence of the deadly triad.
Diagnosing Bottlenecks in DQN
Discussion on several concerns in deep Q-learning.
Generalization in RL
IDAAC - Invariant Decoupled Advantage Actor-Critic
Discussion on IDAAC, which identifies and addresses the problem of using a shared representation for learning the policy and the value function.
Generalization in RL
Discussion on several recent works trying to improve generalization in deep reinforcement learning.
Network Randomization
Discussion on network randomization, a technique improving generalization in reinforcement learning.
Imitation Learning
PWIL - Primal Wasserstein Imitation Learning
Discussion on Primal Wasserstein Imitation Learning.
Ape-X DQfD
Discussion on several enhancements to Ape-X DQN.
Hierarchical Guidance
Discussion on an algorithmic framework called hierarchical guidance, which leverages hierarchical structure in imitation learning.
Memory in RL
MERLIN - Memory, RL, and Inference Network
Discussion on a memory architecture that allows us to do temporal relational reasoning.
EC - Episodic Curiosity
Discussion on an exploration method based on episodic memory.
RMC - Relational Memory Core
Discussion on a recurrent architecture that allows us to do temporal relational reasoning.
Network Layer
Spectral Norm
Discussion on the spectral norm and its usage in deep learning.
EvoNorm
Discussion on EvoNorm, a set of uniform normalization-activation layers found by AutoML.
AdaNorm
We analyze layer normalization and discuss its improvement AdaNorm.
Offline RL
3rd-place solution to MineRL 2019 Competition
Discussion on the 3rd-place solution to MineRL 2019 Competition.
REM - Random Ensemble Mixture
Discussion on an RL algorithm that exploits off-policy data.
BCQ - Batch-Constrained Deep Q-Learning
Discussion on an RL algorithm that exploits off-policy data.
Overviews
MARL - A Survey and Critique
We present an overview of multi-agent reinforcement learning
Deep Reinforcement Learning and its Neuroscientific Implications
Notes from Deep Reinforcement Learning and Its Neuroscientific Implications
Challenges of Real-World Reinforcement Learning
Discussion on several challenges of real-world reinforcement learning.
Curriculum Learning
Backward - Learning from a Single Demonstration
Discussion on a curriculum learning algorithm that learns a policy on Montezuma's Revenge by gradually moving the starting state backward along a single demonstration.
Evolutionary Algorithms
Combining EAs with RL
We summarize several recent works that combine evolutionary algorithms with reinforcement learning.
Lifelong Learning
CLEAR - Continual Learning with Experience And Replay
Discussion on continual learning with experience and replay, a simple method that prevents catastrophic forgetting and improves the stability of learning.
Meta-Gradient RL
Self-Tuning Reinforcement Learning
A self-tuning reinforcement learning algorithm for IMPALA.
Methodology
The Mirage of Action-Dependent Baselines
An analysis of action-dependent baselines.
Optimizer
Optimization
Discussion on first-order optimization algorithms in machine learning, which optimize the objective function based on gradients.
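As a representative example of such a first-order method (Adam, in its standard form; not necessarily the notation used in the post):

$$m_t = \beta_1 m_{t-1} + (1-\beta_1)g_t,\qquad v_t = \beta_2 v_{t-1} + (1-\beta_2)g_t^2,\qquad \theta_t = \theta_{t-1} - \eta\,\frac{\hat{m}_t}{\sqrt{\hat{v}_t}+\epsilon}$$

with bias corrections $\hat{m}_t = m_t/(1-\beta_1^t)$ and $\hat{v}_t = v_t/(1-\beta_2^t)$.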
Policy-Based RL
V-trace
Theoretical analysis of the V-trace target.
Self-Imitation Learning
SIL - Self-Imitation Learning
Discussion on self-imitation learning, in which the agent exploits past transitions that yielded better returns than expected.
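Roughly, the SIL losses (per Oh et al., 2018; $(\cdot)_+=\max(\cdot,0)$ and $R$ is the stored return) are

$$L^{\mathrm{sil}}_{\mathrm{policy}} = -\log\pi_\theta(a|s)\,\big(R - V_\theta(s)\big)_+,\qquad L^{\mathrm{sil}}_{\mathrm{value}} = \tfrac{1}{2}\,\big(R - V_\theta(s)\big)_+^2$$

so only transitions whose returns exceeded the current value estimate are imitated.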
Transfer Learning in RL
Solving Rubik's Cube with a Robot Hand
Discussion on an agent, trained in simulation, that can solve Rubik's Cube with a real robot hand.
Visualization
t-SNE
Discussion on t-SNE, an unsupervised learning algorithm commonly used in data visualization.