Exploration Using Without-Replacement Sampling of Actions is Sometimes Inferior

Research output: Contribution to conference › Presentation

Abstract

Presentation given at the 100th Meeting of the Southeastern Section of the Mathematical Association of America.

Abstract booklet: https://maasoutheastern.org/wp-content/uploads/2021/03/MAA-SE-2021-Abstracts.pdf

In many statistical and machine learning applications, without-replacement sampling is considered superior to with-replacement sampling. In some cases this has been proven, and in others the heuristic is so intuitively attractive that it is taken for granted. In reinforcement learning, many count-based exploration strategies are justified by reliance on this heuristic. This paper details the non-intuitive discovery that, when the goodness of an exploration strategy is measured by the stochastic shortest path to a goal state, there is a class of processes for which an action selection strategy based on without-replacement sampling of actions can be worse than with-replacement sampling. Specifically, the expected time until a specified goal state is first reached can be provably larger under without-replacement sampling. Numerical experiments describe the frequency and severity of this inferiority.
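The comparison described above can be sketched with a small Monte Carlo experiment. The following is an illustrative sketch only, not the paper's construction: it assumes a toy reflecting chain with two actions per state, models without-replacement sampling as a per-state shuffled "deck" of actions that is refilled once exhausted (a simple count-based scheme), and estimates the expected time to first reach the goal under each strategy. All names and the environment are hypothetical; the class of processes exhibiting the inferiority in the paper need not coincide with this toy chain.

```python
import random

def run_episode(n_states, goal, policy, rng, max_steps=10_000):
    """Walk a reflecting chain from state 0 until `goal` is first reached;
    return the number of steps taken (capped at max_steps)."""
    s = 0
    for t in range(1, max_steps + 1):
        s = min(max(s + policy(s, rng), 0), n_states - 1)
        if s == goal:
            return t
    return max_steps

def with_replacement_policy():
    # With-replacement exploration: each step draws an action uniformly,
    # independently of what was tried before.
    def policy(s, rng):
        return rng.choice((-1, 1))
    return policy

def without_replacement_policy():
    # Without-replacement exploration: each state keeps a shuffled deck of
    # the two actions; actions are popped until the deck empties, then the
    # deck is reshuffled. This mimics a count-based "try untried actions
    # first" heuristic.
    decks = {}
    def policy(s, rng):
        if not decks.get(s):
            deck = [-1, 1]
            rng.shuffle(deck)
            decks[s] = deck
        return decks[s].pop()
    return policy

def mean_hitting_time(policy_factory, trials=2_000, n_states=6, seed=0):
    """Monte Carlo estimate of the expected time to first reach the goal."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # Build a fresh policy per episode so per-state decks start empty.
        total += run_episode(n_states, n_states - 1, policy_factory(), rng)
    return total / trials
```

Running `mean_hitting_time` with each factory gives the two estimated expected hitting times; which one is smaller depends on the process, which is exactly the point of the abstract, so no ordering is asserted here.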
Original language: American English
State: Published - Mar 1 2021
Event: Southeast Section of the Mathematical Association of America Annual Meeting - Virtual
Duration: Mar 6 2021 – Mar 13 2021
Conference number: 100
https://maasoutheastern.org/2021-conference/ (Link to conference site)

Conference

Conference: Southeast Section of the Mathematical Association of America Annual Meeting
Abbreviated title: MAASE
Period: 03/6/21 – 03/13/21
Internet address: https://maasoutheastern.org/2021-conference/

DC Disciplines

  • Mathematics
