Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bandits #35

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from
Draft

Bandits #35

wants to merge 2 commits into from

Conversation

kasia-kobalczyk
Copy link
Collaborator

This refers to issue #26.

I implemented two simple bandits:

  • Explore First N strategy
  • Epsilon-greedy strategy

The idea is that all bandits will inherit the basic properties from the MultiArmedBandit class. This class itself will never be used on its own, so it should be a sort of "abstract" class which I don't know how to go about in R. I would appreciate your review and alternative ideas.

@codecov-commenter
Copy link

Codecov Report

Merging #35 (d0ee252) into master (f16c1f1) will decrease coverage by 16.13%.
The diff coverage is 0.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master      #35       +/-   ##
===========================================
- Coverage   96.55%   80.41%   -16.14%     
===========================================
  Files           3        3               
  Lines         319      383       +64     
===========================================
  Hits          308      308               
- Misses         11       75       +64     
Impacted Files Coverage Δ
R/statistics.R 72.94% <0.00%> (-24.45%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f16c1f1...d0ee252. Read the comment docs.

Copy link
Owner

@THargreaves THargreaves left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, nice code with a few suggested changes for clarity. Can you also write some test cases please?

#' @param T number of rounds / draws
#'
#' @return The new \code{MultiArmedBandit} (invisibly)
initialize = function(K, T, N = NULL) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make more sense to have the initialize method raise an error saying that it's abstract and then use a private method to run the code currently in initialize that gets called by the child?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense.

#'
#'
#' @return The updated code{MultiArmedBandit} (invisibly)
update = function(x, k) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If k comes from the previous value, would it be cleaner to remember that internally so that k can be NULL be default but specified if the user ignore the streamer's suggestion?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this suggestion.

#'
#'
#' @return list with summary of the state of the Bandit.
value = function() {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we said that value would just return the next arm to pull. summary can be used for the full state.

#' @return The new \code{ExploreFristNBandit} (invisibly)
initialize = function(K, T, N = NULL) {
super$initialize(K, T)
private$state <- "exploration"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be a Boolean?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So instead of state = "exploration" or state = "exploitation" you are suggesting a variable exploration taking values TRUE or FALSE ?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. Less potential for typos.

}
if (T < N * K) {
stop(sprintf(
"More draws are required for the values of
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this error could be clearer. Something like "More exploration draws are specified than the total number of rounds"

super$update(x, k)
if (runif(1) < private$epsilon[private$n_observed]) {
private$state <- "exploration"
k <- floor(runif(1, 1, K + 1))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use sample

#'
#' @param K the number of arms
#' @param T number of rounds / draws
#' @param epsilon vector of length T specifying the exploration
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typically, epsilon greedy has a scalar epsilon held constant over the game (Sutton, R. S. & Barto, A. G. 1998 Reinforcement learning). I think something this general doesn't really need to exist. The closest thing is epsilon decreasing where epsilons form a geometric series.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I relied on this: https://arxiv.org/pdf/1904.07272.pdf, see Algorithm 1.2, epsilon can change in time. Perhaps we can leave it as it is and just add the possibility for epsilon to be a constant value.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough about the source but for the purpose of online algorithms, specifying the whole epsilon doesn't make much sense. I think sticking with a fixed epsilon of a geometric series is more in-line with the packages vision.

Related to this, does it make sense that these games have finite time horizons? Shouldn't the point be that you can steam indefinitely?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree + was also not convinced by the fine time T appearing in those algorithms. I will adjust them for infinite sampling.

#'
#' @export
#' @format An \code{\link{R6Class}} generator object
ExploreFristNBandit <- R6::R6Class(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo + this is typically called epsilon-first

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I relied on the Introduction to Multi-Armed Bandits by Aleksandrs Slivkin, where they call it an Explore-First algorithm with a parameter N.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, perhaps just ExploreFirst then. I think the N is evident from the init function.

@kasia-kobalczyk
Copy link
Collaborator Author

Thanks for the review. It was only a draft pull request to see if I am heading in the right direction. The tests will follow soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants