Exploration Using Without-Replacement Sampling of Actions Is Sometimes Inferior

Stephen W. Carden, S. Dalton Walker

Research output: Contribution to journal › Article › peer-review

Abstract

In many statistical and machine learning applications, without-replacement sampling is considered superior to with-replacement sampling. In some cases this superiority has been proven; in others the heuristic is so intuitively attractive that it is taken for granted. In reinforcement learning, many count-based exploration strategies are justified by an appeal to this heuristic. This paper details the counterintuitive discovery that, when the quality of an exploration strategy is measured by the stochastic shortest path to a goal state, there is a class of processes for which action selection based on without-replacement sampling of actions can be worse than with-replacement sampling. Specifically, the expected time until a specified goal state is first reached can be provably larger under without-replacement sampling. Numerical experiments describe the frequency and severity of this inferiority.
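The comparison in the abstract can be made concrete by simulation. The sketch below is a minimal illustration, not the paper's construction: the toy MDP, its transition probabilities, and all function names are assumptions introduced here purely to show how one might estimate the expected time to first reach a goal state under with-replacement versus per-state without-replacement action selection.

```python
import random

# Toy illustration (not the class of processes analyzed in the paper):
# a small MDP with two actions per state; we estimate the expected number
# of steps to first reach the goal under two action-selection schemes.

# Hypothetical transition table: state -> action -> list of (next_state, prob).
TRANSITIONS = {
    0: {0: [(0, 0.5), (1, 0.5)], 1: [(0, 0.9), (1, 0.1)]},
    1: {0: [(0, 0.5), (2, 0.5)], 1: [(1, 0.9), (2, 0.1)]},
}
GOAL = 2

def step(state, action, rng):
    """Sample a next state from the transition distribution."""
    outcomes, probs = zip(*TRANSITIONS[state][action])
    return rng.choices(outcomes, weights=probs, k=1)[0]

def episode(policy, rng, max_steps=10_000):
    """Run one episode; return the number of steps until the goal is first reached."""
    state = 0
    remaining = {s: [] for s in TRANSITIONS}  # unused actions per state
    for t in range(1, max_steps + 1):
        if policy == "with":
            # With-replacement: pick uniformly at random every visit.
            action = rng.choice(list(TRANSITIONS[state]))
        else:
            # Without-replacement: exhaust every action in a state before reusing any.
            if not remaining[state]:
                remaining[state] = list(TRANSITIONS[state])
                rng.shuffle(remaining[state])
            action = remaining[state].pop()
        state = step(state, action, rng)
        if state == GOAL:
            return t
    return max_steps

def mean_hitting_time(policy, n=20_000, seed=0):
    """Monte Carlo estimate of the expected first-hitting time of the goal."""
    rng = random.Random(seed)
    return sum(episode(policy, rng) for _ in range(n)) / n

if __name__ == "__main__":
    print("with-replacement   :", mean_hitting_time("with"))
    print("without-replacement:", mean_hitting_time("without"))
```

Running both estimators on a candidate process makes it possible to check, empirically, which scheme reaches the goal faster; the paper's contribution is exhibiting processes where the without-replacement estimate is provably the larger one.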

Original language: American English
Journal: Machine Learning and Knowledge Extraction
Volume: 1
DOIs
State: Published - May 24, 2019

Disciplines

  • Mathematics

Keywords

  • Markov decision processes
  • count-based exploration
  • reinforcement learning
  • stochastic shortest path
  • without-replacement sampling
