BCQ — Batch-Constrained Deep Q-Learning
Discussion on an RL algorithm that exploits off-policy data.
Diagnosing Bottlenecks in DQN
Discussion on several concerns in deep (Q) learning.
SEED — Scalable Efficient Deep-RL
Discussion on a scalable reinforcement learning architecture that speeds up both data collection and the learning process.
R2D2: Recurrent Replay Distributed DQN
Discussion on a distributed reinforcement learning architecture that incorporates a recurrent network into Ape-X.
IMPALA
Discussion on a distributed reinforcement learning architecture for policy gradient methods.
Ape-X
Discussion on a distributed reinforcement learning architecture for Q-learning methods.
DNC — Improving Differentiable Neural Computer
Discussion on several improvements to the Differentiable Neural Computer.
DNC — Differentiable Neural Computer
Discussion on the Differentiable Neural Computer.
NTM — Neural Turing Machines
Discussion on Neural Turing Machines, an architecture able to utilize an external memory.
HPG — Hindsight Policy Gradients
Discussion on a policy-gradient method with hindsight experience.