TY - JOUR
T1 - Improved Exploration in Reinforcement Learning Environments with Low-Discrepancy Action Selection
AU - Carden, Stephen W.
AU - Lindborg, Jedidiah O.
AU - Utic, Zheni
N1 - Publisher Copyright: © 2022 by the authors.
PY - 2022/6
Y1 - 2022/6
N2 - Reinforcement learning (RL) is a subdomain of machine learning concerned with achieving optimal behavior by interacting with an unknown and potentially stochastic environment. The exploration strategy for choosing actions is an important component for enabling the decision agent to discover how to obtain high rewards. If constructed well, it may reduce the learning time of the decision agent. Exploration in discrete problems has been well studied, but there are fewer strategies applicable to continuous dynamics. In this paper, we propose a Low-Discrepancy Action Selection (LDAS) process, a novel exploration strategy for environments with continuous states and actions. This algorithm focuses on prioritizing unknown regions of the state-action space with the intention of finding ideal actions faster than pseudo-random action selection. Results of experimentation with three benchmark environments elucidate the situations in which LDAS is superior and introduce a metric for quantifying the quality of exploration.
KW - low-discrepancy sequence
KW - Markov decision process
KW - reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85201620576&partnerID=8YFLogxK
U2 - 10.3390/appliedmath2020014
DO - 10.3390/appliedmath2020014
M3 - Article
AN - SCOPUS:85201620576
SN - 2673-9909
VL - 2
SP - 234
EP - 246
JO - AppliedMath
JF - AppliedMath
IS - 2
ER -