Abstract
Reinforcement learning (RL) is a subdomain of machine learning concerned with achieving optimal behavior by interacting with an unknown and potentially stochastic environment. The exploration strategy for choosing actions is an important component for enabling the decision agent to discover how to obtain high rewards; if constructed well, it can reduce the agent's learning time. Exploration in discrete problems has been well studied, but there are fewer strategies applicable to continuous dynamics. In this paper, we propose a Low-Discrepancy Action Selection (LDAS) process, a novel exploration strategy for environments with continuous states and actions. This algorithm prioritizes unknown regions of the state-action space with the intention of finding ideal actions faster than pseudo-random action selection. Experiments on three benchmark environments elucidate the situations in which LDAS is superior, and we introduce a metric for quantifying the quality of exploration.
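The abstract contrasts low-discrepancy action selection with pseudo-random action selection over a continuous action space. The sketch below is not the paper's LDAS algorithm; it only illustrates the underlying idea by drawing candidate actions from a scrambled Sobol sequence (via SciPy's `qmc` module) instead of a uniform pseudo-random generator, and it uses `qmc.discrepancy` as a rough proxy for how evenly the actions cover the space. The action bounds and dimensionality are hypothetical.

```python
# Illustrative sketch only: this is NOT the authors' LDAS method.
# It contrasts pseudo-random action selection with quasi-random
# (low-discrepancy) Sobol-sequence selection in a continuous action space.

import numpy as np
from scipy.stats import qmc

# Hypothetical 2-D continuous action bounds.
action_low = np.array([-1.0, -1.0])
action_high = np.array([1.0, 1.0])

def pseudo_random_actions(n, rng=np.random.default_rng(0)):
    """Baseline: i.i.d. uniform pseudo-random actions."""
    return rng.uniform(action_low, action_high, size=(n, len(action_low)))

def low_discrepancy_actions(n):
    """Quasi-random actions from a scrambled Sobol sequence, which
    cover the action space more evenly than pseudo-random draws."""
    sampler = qmc.Sobol(d=len(action_low), scramble=True, seed=0)
    unit_samples = sampler.random(n)  # points in [0, 1)^d
    return qmc.scale(unit_samples, action_low, action_high)

if __name__ == "__main__":
    # Discrepancy (lower = more uniform coverage) as a crude
    # stand-in for an exploration-quality metric.
    n = 256  # power of 2 keeps the Sobol sequence balanced
    to_unit = lambda a: (a - action_low) / (action_high - action_low)
    print("pseudo-random discrepancy :", qmc.discrepancy(to_unit(pseudo_random_actions(n))))
    print("low-discrepancy discrepancy:", qmc.discrepancy(to_unit(low_discrepancy_actions(n))))
```

In practice the low-discrepancy sample typically reports a noticeably smaller discrepancy than the pseudo-random one, which is the intuition behind preferring such sequences for covering unexplored regions of a continuous state-action space.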
Original language | English |
---|---|
Pages (from-to) | 234-246 |
Number of pages | 13 |
Journal | AppliedMath |
Volume | 2 |
Issue number | 2 |
DOIs | |
State | Published - Jun 2022 |
Scopus Subject Areas
- Mathematics (miscellaneous)
- Applied Mathematics
Keywords
- low-discrepancy sequence
- Markov decision process
- reinforcement learning