by Sherwin Chen


Introduction

We briefly summarize several recent works that combine evolutionary algorithms (EAs) with reinforcement learning (RL).

Evolutionary Reinforcement Learning

Evolutionary Reinforcement Learning (ERL) runs a population of EA actors concurrently with RL actors to enrich a shared replay buffer. Periodically, the RL actor network is copied into the evolving population, replacing the EA actor with the lowest fitness.

This method is effective for sparse-reward problems because EAs ignore intermediate rewards and are biased toward episode-wise returns.
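To make the interplay concrete, below is a minimal, self-contained sketch of the ERL loop. The `Actor`, `evaluate`, `mutate`, and `rl_update` helpers are toy stand-ins (a scalar "policy" and a fake gradient step), not the paper's DDPG-based implementation; only the overall structure (shared buffer, selection and mutation, periodic copy of the RL actor) mirrors ERL.

```python
import random

# --- Toy stand-ins (not the paper's code) -----------------------------------
class Actor:
    """Toy policy represented by a single scalar parameter."""
    def __init__(self, param=0.0):
        self.param = param

def evaluate(actor, replay_buffer):
    """Run one 'episode', store experience in the shared buffer, return fitness."""
    fitness = -(actor.param - 3.0) ** 2
    replay_buffer.append((actor.param, fitness))  # stand-in for transitions
    return fitness

def mutate(actor, sigma=0.1):
    return Actor(actor.param + random.gauss(0.0, sigma))

def rl_update(rl_actor, replay_buffer):
    """Placeholder for an off-policy gradient step trained on the buffer."""
    if replay_buffer:
        best_param = max(p for p, _ in replay_buffer)
        rl_actor.param += 0.05 * (best_param - rl_actor.param)

# --- ERL main loop -----------------------------------------------------------
population = [Actor(random.uniform(-5, 5)) for _ in range(10)]
rl_actor = Actor()
replay_buffer = []
SYNC_PERIOD = 5

for generation in range(50):
    # EA actors and the RL actor all add experience to the same buffer.
    fitnesses = [evaluate(a, replay_buffer) for a in population]
    evaluate(rl_actor, replay_buffer)
    rl_update(rl_actor, replay_buffer)

    # Selection + mutation for the EA population (simple truncation here).
    ranked = [a for _, a in sorted(zip(fitnesses, population),
                                   key=lambda x: x[0], reverse=True)]
    elites = ranked[: len(ranked) // 2]
    population = elites + [mutate(random.choice(elites)) for _ in elites]

    # Periodically copy the RL actor into the population,
    # replacing the lowest-fitness EA actor.
    if generation % SYNC_PERIOD == 0:
        worst = min(range(len(population)),
                    key=lambda i: evaluate(population[i], replay_buffer))
        population[worst] = Actor(rl_actor.param)
```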

Collaborative Evolutionary Reinforcement Learning

Collaborative Evolutionary Reinforcement Learning (CERL) extends ERL by using multiple learners, each with its own discount factor \(\gamma\). Furthermore, each learner is associated with several workers that run rollouts; the number of workers assigned to each learner depends on its UCB score:

\[\begin{align} U_i=v_i+c\sqrt{\log\left(\sum_{j=1}^b y_j\right)\over y_i} \end{align}\]

where \(v_i\) is the average return received from the \(i\)'th learner's rollouts, \(y_i\) is the number of rollouts learner \(i\) has run, and \(b\) is the number of learners.
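Below is a sketch of how these UCB scores could drive worker allocation. The greedy one-worker-at-a-time scheme and the helper names (`ucb_scores`, `allocate_workers`) are illustrative assumptions, not CERL's exact mechanism.

```python
import math

def ucb_scores(avg_returns, rollout_counts, c=1.0):
    """UCB score per learner: exploitation (average return) plus an
    exploration bonus that shrinks as a learner accumulates rollouts.
    Assumes every learner already has at least one rollout."""
    total = sum(rollout_counts)
    return [v + c * math.sqrt(math.log(total) / y)
            for v, y in zip(avg_returns, rollout_counts)]

def allocate_workers(avg_returns, rollout_counts, num_workers, c=1.0):
    """Greedy allocation: each worker goes to the learner with the highest
    current UCB score, updating the provisional rollout counts as we go."""
    counts = list(rollout_counts)
    allocation = [0] * len(counts)
    for _ in range(num_workers):
        scores = ucb_scores(avg_returns, counts, c)
        best = max(range(len(scores)), key=lambda i: scores[i])
        allocation[best] += 1
        counts[best] += 1
    return allocation

# Example: learner 1 has the best average return but also the most rollouts,
# so some workers still go to under-explored learners.
print(allocate_workers([0.2, 0.9, 0.5], [10, 40, 5], num_workers=8))
```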

For The Win

The For The Win (FTW) agent uses a two-tier optimization process: population-based training (PBT) in the outer loop evolves hyperparameters, while reinforcement learning in the inner loop trains the neural networks.
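The outer/inner-loop structure can be illustrated with a toy population-based training sketch; `Member`, `train_step`, and the 20% exploit cutoff are assumptions for illustration, not FTW's actual setup.

```python
import random

# Toy PBT sketch: inner-loop RL training is abstracted by train_step, and the
# outer loop periodically exploits (copy a better member's weights and
# hyperparameters) and explores (perturb the copied hyperparameters).
class Member:
    def __init__(self):
        self.hyper = {"lr": 10 ** random.uniform(-4, -2),
                      "entropy_cost": random.uniform(0.001, 0.01)}
        self.weights = 0.0   # stand-in for network parameters
        self.score = 0.0

def train_step(m):
    """Placeholder for inner-loop RL updates; here score depends only on lr."""
    m.weights += m.hyper["lr"]
    m.score = -abs(m.hyper["lr"] - 3e-3) + random.gauss(0, 1e-4)

def exploit_and_explore(population):
    population.sort(key=lambda m: m.score)
    cutoff = len(population) // 5 or 1
    for loser in population[:cutoff]:
        winner = random.choice(population[-cutoff:])
        loser.weights = winner.weights                        # copy weights
        loser.hyper = {k: v * random.choice((0.8, 1.2))        # perturb hypers
                       for k, v in winner.hyper.items()}

population = [Member() for _ in range(10)]
for step in range(100):
    for m in population:
        train_step(m)
    if step % 10 == 0:
        exploit_and_explore(population)

print(max(m.score for m in population))
```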

POET

POET applies evolution strategies (ES) to both environments and agents. We summarize it as follows:

  1. POET trains a set of agent-environment pairs simultaneously via ES.
  2. After the agents achieve satisfactory performance on their environments, the environments are randomly mutated.
  3. To keep evolved environments from being too hard or too easy, only environments that meet a minimal criterion are kept (e.g., the environment is neither unsolvable by nor trivially easy for the existing agents).
  4. Evolved environments are further ranked by novelty, computed from the Euclidean distance to their \(k\)-nearest neighbors.
  5. From time to time, agents are tested on their peers' environments, possibly after a few evolutionary fine-tuning steps; an agent replaces a peer if it surpasses that peer's performance. A toy sketch of the whole loop follows the list.
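Below is that sketch, with environments and agents reduced to scalars; `es_step`, `minimal_criterion`, and `novelty` are simplified stand-ins for POET's actual components.

```python
import random

# Toy POET-style loop: each environment is a difficulty scalar and each agent
# a skill scalar, so "solving" an environment means matching its scalar.
def es_step(agent, env):
    """One (toy) ES optimization step moving the agent toward its env."""
    return agent + 0.1 * (env - agent)

def score(agent, env):
    return -abs(agent - env)

def minimal_criterion(env, agents, lo=-2.0, hi=-0.1):
    """Keep an env only if some existing agent scores within [lo, hi]:
    neither trivially easy nor hopelessly hard."""
    return any(lo <= score(a, env) <= hi for a in agents)

def novelty(env, envs, k=3):
    dists = sorted(abs(env - e) for e in envs)
    return sum(dists[:k]) / k

pairs = [(0.0, 0.0)]              # list of (environment, agent) pairs
for iteration in range(100):
    # 1. Optimize every agent on its paired environment with ES.
    pairs = [(env, es_step(agent, env)) for env, agent in pairs]

    if iteration % 10 == 0 and iteration > 0:
        agents = [a for _, a in pairs]
        # 2-4. Mutate envs, filter by the minimal criterion, rank by novelty.
        children = [env + random.gauss(0, 1.0) for env, _ in pairs]
        children = [e for e in children if minimal_criterion(e, agents)]
        children.sort(key=lambda e: novelty(e, [env for env, _ in pairs]),
                      reverse=True)
        for child in children[:2]:                 # admit the most novel
            parent_agent = min(agents, key=lambda a: abs(score(a, child)))
            pairs.append((child, parent_agent))

        # 5. Transfer: replace an agent if a peer does better on its env.
        pairs = [(env, max(agents + [agent], key=lambda a: score(a, env)))
                 for env, agent in pairs]
```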

References

Shauharda Khadka, Somdeb Majumdar, Tarek Nassar, Zach Dwiel, Evren Tumer, Santiago Miret, Yinyin Liu, Kagan Tumer. Collaborative Evolutionary Reinforcement Learning.

Shauharda Khadka, Kagan Tumer. Evolution-Guided Policy Gradient in Reinforcement Learning.

Max Jaderberg, Wojciech M. Czarnecki, Iain Dunning, Luke Marris, Guy Lever, Antonio Garcia Castañeda, Charles Beattie, et al. 2019. “Human-Level Performance in 3D Multiplayer Games with Population-Based Reinforcement Learning.” Science 364 (6443): 859–65. https://doi.org/10.1126/science.aau6249.

Rui Wang, Joel Lehman, Jeff Clune, Kenneth O. Stanley. Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions.