-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bandits #35
base: master
Are you sure you want to change the base?
Bandits #35
Conversation
Codecov Report
@@ Coverage Diff @@
## master #35 +/- ##
===========================================
- Coverage 96.55% 80.41% -16.14%
===========================================
Files 3 3
Lines 319 383 +64
===========================================
Hits 308 308
- Misses 11 75 +64
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, nice code with a few suggested changes for clarity. Can you also write some test cases please?
#' @param T number of rounds / draws | ||
#' | ||
#' @return The new \code{MultiArmedBandit} (invisibly) | ||
initialize = function(K, T, N = NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make more sense to have the initialize
method raise an error saying that it's abstract and then use a private method to run the code currently in initialize
that gets called by the child?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense.
#' | ||
#' | ||
#' @return The updated code{MultiArmedBandit} (invisibly) | ||
update = function(x, k) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If k
comes from the previous value
, would it be cleaner to remember that internally so that k
can be NULL
be default but specified if the user ignore the streamer's suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this suggestion.
#' | ||
#' | ||
#' @return list with summary of the state of the Bandit. | ||
value = function() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we said that value
would just return the next arm to pull. summary
can be used for the full state.
#' @return The new \code{ExploreFristNBandit} (invisibly) | ||
initialize = function(K, T, N = NULL) { | ||
super$initialize(K, T) | ||
private$state <- "exploration" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be a Boolean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So instead of state = "exploration"
or state = "exploitation"
you are suggesting a variable exploration
taking values TRUE
or FALSE
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly. Less potential for typos.
} | ||
if (T < N * K) { | ||
stop(sprintf( | ||
"More draws are required for the values of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this error could be clearer. Something like "More exploration draws are specified than the total number of rounds"
super$update(x, k) | ||
if (runif(1) < private$epsilon[private$n_observed]) { | ||
private$state <- "exploration" | ||
k <- floor(runif(1, 1, K + 1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use sample
#' | ||
#' @param K the number of arms | ||
#' @param T number of rounds / draws | ||
#' @param epsilon vector of length T specifying the exploration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically, epsilon greedy has a scalar epsilon held constant over the game (Sutton, R. S. & Barto, A. G. 1998 Reinforcement learning). I think something this general doesn't really need to exist. The closest thing is epsilon decreasing where epsilons form a geometric series.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I relied on this: https://arxiv.org/pdf/1904.07272.pdf, see Algorithm 1.2, epsilon can change in time. Perhaps we can leave it as it is and just add the possibility for epsilon to be a constant value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough about the source but for the purpose of online algorithms, specifying the whole epsilon doesn't make much sense. I think sticking with a fixed epsilon of a geometric series is more in-line with the packages vision.
Related to this, does it make sense that these games have finite time horizons? Shouldn't the point be that you can steam indefinitely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree + was also not convinced by the fine time T appearing in those algorithms. I will adjust them for infinite sampling.
#' | ||
#' @export | ||
#' @format An \code{\link{R6Class}} generator object | ||
ExploreFristNBandit <- R6::R6Class( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo + this is typically called epsilon-first
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I relied on the Introduction to Multi-Armed Bandits by Aleksandrs Slivkin, where they call it an Explore-First algorithm with a parameter N.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, perhaps just ExploreFirst
then. I think the N
is evident from the init function.
Thanks for the review. It was only a draft pull request to see if I am heading in the right direction. The tests will follow soon. |
This refers to issue #26.
I implemented two simple bandits:
The idea is that all bandits will inherit the basic properties from the
MultiArmedBandit
class. This class itself will never be used on its own, so it should be a sort of "abstract" class which I don't know how to go about in R. I would appreciate your review and alternative ideas.