Improved Exploration in Reinforcement Learning Environments with Low-Discrepancy Action Selection

Stephen W. Carden, Jedidiah O. Lindborg, Zheni Utic

Research output: Contribution to journal › Article › peer-review

Abstract

Reinforcement learning (RL) is a subdomain of machine learning concerned with achieving optimal behavior by interacting with an unknown and potentially stochastic environment. The exploration strategy used to choose actions is an important component in enabling the decision agent to discover how to obtain high rewards, and a well-constructed strategy can reduce the agent's learning time. Exploration in discrete problems has been well studied, but fewer strategies are applicable to continuous dynamics. In this paper, we propose Low-Discrepancy Action Selection (LDAS), a novel exploration strategy for environments with continuous states and actions. The algorithm prioritizes unexplored regions of the state-action space with the aim of finding good actions faster than pseudo-random action selection. Experiments on three benchmark environments clarify the situations in which LDAS is superior, and we introduce a metric for quantifying the quality of exploration.
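
The LDAS algorithm itself is not reproduced on this page, but the core idea the abstract describes, replacing pseudo-random exploratory draws with a low-discrepancy sequence so that successive actions cover the continuous action space evenly, can be sketched. Below is a minimal illustration using a Halton sequence; the choice of sequence, the class and function names, and the example action bounds are assumptions for illustration, not details taken from the paper (the paper's method targets the full state-action space, while this sketch covers only the action dimensions).

```python
import numpy as np


def halton(index: int, base: int) -> float:
    """Return the index-th element of the van der Corput sequence in the given base."""
    result, f = 0.0, 1.0
    while index > 0:
        f /= base
        result += f * (index % base)
        index //= base
    return result


class LowDiscrepancyActionSampler:
    """Draw exploratory actions from a Halton sequence scaled to the action bounds.

    Successive draws fill a continuous action space far more evenly than
    independent uniform draws, which is the intuition behind LDAS.
    """

    PRIMES = (2, 3, 5, 7, 11, 13)  # coprime bases, one per action dimension

    def __init__(self, low, high):
        self.low = np.asarray(low, dtype=float)
        self.high = np.asarray(high, dtype=float)
        assert self.low.size <= len(self.PRIMES), "add more prime bases for higher dims"
        self.index = 0

    def sample(self) -> np.ndarray:
        self.index += 1  # Halton indices start at 1 (index 0 maps to the origin)
        unit = np.array([halton(self.index, b) for b in self.PRIMES[: self.low.size]])
        return self.low + unit * (self.high - self.low)


if __name__ == "__main__":
    # Hypothetical 1-D continuous action space, e.g. a Pendulum-style torque in [-2, 2].
    sampler = LowDiscrepancyActionSampler(low=[-2.0], high=[2.0])
    print(np.round([sampler.sample()[0] for _ in range(8)], 3))
    # -> [ 0.   -1.    1.   -1.5   0.5  -0.5   1.5  -1.75]: evenly spread over the range
```

Substituting a draw like sampler.sample() for a uniform np.random.uniform(low, high) draw during exploration is the kind of drop-in change the abstract suggests, with the even spacing of the sequence standing in for the "prioritize unexplored regions" behavior.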

Original language: English
Pages (from-to): 234-246
Number of pages: 13
Journal: AppliedMath
Volume: 2
Issue number: 2
DOIs
State: Published - Jun 2022

Keywords

  • low-discrepancy sequence
  • Markov decision process
  • reinforcement learning
