Reinforcement Learning Interface #126
Comments
I really think we should work with Reinforce.jl rather than making anything parallel (assuming that their interface makes sense). It would be a huge tragedy if we ended up with something slightly different and things that could have been compatible are not. (We may need to register all of our solvers after all if we want to engage with the community better.) I am not sure I completely understand their interface. Is the state part of the environment or not? This seems very important to me.
The biggest issue in working with Reinforce.jl is that it's a pretty hefty package with a bunch of deps. The package is not just an interface, but also implements algorithms, has support tools, etc. It also has a number of abstract types that would conflict with types in POMDPs.jl. It also seems like Reinforce.jl currently implements only the cross-entropy method, and it's not clear to me when and if deep RL algs will be implemented, which I think should be the main focus of our efforts with RL. In the Reinforce.jl interface, the state doesn't have to be a part of the environment.
Guys, I can't believe you've completely forgotten about me. You can look at my two half-baked attempts at making a reinforcement learning package for inspiration: IIRC it's not too crazy to implement RL into the POMDPs.jl framework. You just have to add a bunch of annoying parameters relating to simulation into your RL type.
@cho3, thanks! We haven't forgotten! I don't think we would need to make any changes to POMDPs.jl to use our problems with Reinforce.jl - we would just need to write some lightweight wrappers, and I think those wrappers could conform to the Reinforce.jl type hierarchy. I don't think any code would need to be changed in either of the core packages (I may be missing something - let me know if I am). If we write our own separate package, then the work will be duplicated, and, in the long run, people will have to choose between them or spend a lot of time writing glue code. If we combine forces, even though it's more difficult now, I think it will ultimately be a better experience. If we do write our own, I think we should do it like gym with the state in the environment - IMO it will be too hard to communicate expectations to users if the state can be in both the environment and a separate variable. GenerativeModels covers the case where the environment and state are separated. Also, I think only one-way compatibility is feasible: any RL solver should be able to solve a generative model problem, a transition distribution problem, or an RL problem, but MCTS should only be able to solve generative model or transition distribution problems, not problems specified only using the RL interface.
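To make the lightweight-wrapper idea concrete, here is a rough sketch of what such a wrapper might look like. The type name is made up, and it assumes GenerativeModels-style generate_sr(mdp, s, a, rng) and initial_state(mdp, rng) functions; the actual Reinforce.jl type and function names may differ:

```julia
# Rough sketch (untested) of a lightweight wrapper exposing a POMDPs.jl MDP
# through a gym/Reinforce-style environment interface. MDPEnvironment is a
# made-up name; the generate_sr and initial_state signatures are the
# GenerativeModels-style ones assumed above.
import POMDPs
import GenerativeModels: generate_sr, initial_state
using Random: AbstractRNG, MersenneTwister

mutable struct MDPEnvironment{M<:POMDPs.MDP}   # hypothetical wrapper type
    mdp::M
    state::Any       # gym-style: the current state lives inside the environment
    rng::AbstractRNG
end
MDPEnvironment(mdp) = MDPEnvironment(mdp, nothing, MersenneTwister(0))

function reset!(env::MDPEnvironment)
    env.state = initial_state(env.mdp, env.rng)
    return env.state
end

actions(env::MDPEnvironment, s) = POMDPs.actions(env.mdp, s)

function step!(env::MDPEnvironment, s, a)
    sp, r = generate_sr(env.mdp, s, a, env.rng)   # sample next state and reward
    env.state = sp
    return r, sp
end

finished(env::MDPEnvironment, sp) = POMDPs.isterminal(env.mdp, sp)
```

The point is just that the wrapper only forwards calls, so no core code in either package would have to change.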
OK, ideally there should be one big package for all types of sequential decision problems with three tiers of problem specification: full distribution modeling, generative models, and gym-style environments.
I think you might be right - Reinforce.jl might be so far from what we are envisioning that we should just create our own package - we may want to register it in METADATA separately from POMDPs.jl, though.
@etotheipluspi, is the goal of this to
If it is 1, I think GenerativeModels should work fine; |
Or is the goal to do in-place updates on the environment rather than creating a new state every time?
@zsunberg I think we should aim for something along the lines of 1 and 2. The problem with only using GenerativeModels.jl for 1 is that it is so different from the typical RL interface (i.e. gym) that it might be confusing. I would be happy if we added more functions to
Ok, cool. When you say "it might be confusing", do you mean for a solver writer or a problem writer or someone else? Or do you just think it will not be adopted in general because it is different from the mainstream? Can you outline the differences that seem important to you? Is it just a vocabulary problem? Would it be possible to code up a simple example solver to illustrate why the current GenerativeModels interface would be confusing? (Or a problem, if that is what you think is confusing.) Sorry for being pedantic about this - it is just really hard to get right and I want to get it right. Introducing more functions could potentially make universal usability more of a challenge. Btw I am completely OK with revising GenerativeModels or completely nuking it if we come up with something better.
We could have a "Where is the step function?" FAQ that points to generative models and explains their relationship. |
We should concretely define the use case. Is this a package that allows POMDPs solvers to be used with more problems (e.g. environments that are coded to include the state in them), or one that allows POMDPs problems to be solved by more solvers (yes the latter is definitely true), or both?
It will be easier from the problem writer's standpoint, in particular for those who are used to interfaces like gym's.
I think the simplest way to look at this is that gym is now the de facto benchmark suite in RL. Not following their interface has a number of potential drawbacks, like turning potential users away and making it more difficult to wrap problems and solvers that do rely on this interface.
This depends entirely on the user. One thing that could trip users up is that there are six generate functions.
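For reference, the six generate functions (signatures from memory, so double-check against GenerativeModels.jl before relying on them) are listed in the sketch below:

```julia
# The six generate functions in GenerativeModels.jl, as I remember them
# (verify against the package -- these signatures are from memory):
#
#   generate_s(p, s, a, rng)       -> sp           # next state only
#   generate_sr(p, s, a, rng)      -> (sp, r)      # next state and reward
#   generate_o(p, s, a, sp, rng)   -> o            # observation only
#   generate_so(p, s, a, rng)      -> (sp, o)      # next state and observation
#   generate_or(p, s, a, rng)      -> (o, r)       # observation and reward
#   generate_sor(p, s, a, rng)     -> (sp, o, r)   # all three at once
```

The last one is essentially gym's step with the state passed explicitly, which may be a useful way to explain the relationship in the proposed "Where is the step function?" FAQ.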
I don't think adding the step interface will make usability a challenge. Do you mean this in the context of the interfaces that a problem writer might need to choose from? Currently there are two - the full-blown POMDPs.jl interface and the GenerativeModels interface - the step interface would be a third one. I think the step interface can be advertised as something for people with RL interests only. Also, @cho3, we definitely haven't forgotten. Your implementations look really useful. I think they'll serve as a great starting point. Looks like we should be able to
Ok great, so this is mostly about fitting more problems into our solvers like POMCP. It seems like, if the user implements the gym interface plus an extra
Another thing to note is that, in my experience, when students understood the GenerativeModels interface, they reacted with "Oh, that's exactly what I need!", so maybe we just need to focus on getting people to understand and use that interface.
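To make that direction concrete, here is a rough sketch of how a step-style environment might be exposed as a generative model that a solver like MCTS could consume. The setstate!(env, s) function is hypothetical - the environment would have to provide some way to be restarted from an arbitrary state for tree search to branch - and step! and finished are the functions from the environment interface discussed in this issue:

```julia
# Rough sketch: exposing a step-style environment as a generative model so that
# solvers like MCTS/POMCP could use it. EnvMDP is a made-up wrapper type, and
# setstate!(env, s) is hypothetical (some way to force the environment into an
# arbitrary state).
import GenerativeModels
import POMDPs
using Random: AbstractRNG

struct EnvMDP{E} <: POMDPs.MDP{Any, Any}   # hypothetical wrapper type
    env::E
end

function GenerativeModels.generate_sr(p::EnvMDP, s, a, rng::AbstractRNG)
    setstate!(p.env, s)          # hypothetical: put the environment into state s
    r, sp = step!(p.env, s, a)   # advance one step using the env's own dynamics
    return sp, r
end

POMDPs.isterminal(p::EnvMDP, s) = finished(p.env, s)
```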
Yeah, definitely! Can you outline which distinctive things about gym you think we need to keep, and which ones we can/should change slightly? Is the lack of environment/state distinction important? Is using the exact same vocabulary important?
What do you think about JuliaPOMDP/GenerativeModels.jl#11? Would that be easier to understand or harder? (this question is sort of orthogonal to this current issue)
I mean that if we introduce something parallel to generative models, then it might be difficult to make solvers (e.g. MCTS/POMCP) work on both interfaces, and if some solvers work on some problems and not others, the ecosystem will be frustrating to use.
I guess let's write it up and see how it goes. I think we just need to answer the question of whether the state is part of the environment first (I think it should be).
The more I think about this, the more I like it. Are you planning on working on it, @etotheipluspi?
Yep, putting something together.
This is implemented in https://github.com/sisl/DeepRL.jl
There is nothing stopping us from wrapping our POMDP models to work with RL algorithms. This would allow us to include deep reinforcement learning algorithms in our solver suite, and all the good things that come with that.
I think this interface should look like a simplified version of GenerativeModels.jl. I like what has been done in Reinforce.jl. They even mention JuliaPOMDP in one of their issues on what their API should look like. Their API uses an AbstractEnvironment type which implements the following methods:

reset!(env)
actions(env, s) --> A
step!(env, s, a) --> r, s′
finished(env, s′)

and optional overrides:

state(env) --> s
reward(env) --> r
I like this interface. We can include it either in GenerativeModels.jl or in a new package that would live in the JuliaPOMDP package directory (maybe RL.jl or something of the sort). What are everyone's thoughts on the interface itself, and how it should be included into POMDPs.jl?
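For concreteness, here is a minimal sketch of what a toy problem might look like written directly against this interface (the type name, dynamics, and rewards are all made up for illustration):

```julia
# Minimal sketch of a toy 1-D random-walk problem written against the proposed
# environment interface. Everything here (type name, dynamics, reward, terminal
# condition) is illustrative only.
using Random: AbstractRNG, MersenneTwister

mutable struct RandomWalkEnv
    state::Int
    rng::AbstractRNG
end
RandomWalkEnv() = RandomWalkEnv(0, MersenneTwister(1))

reset!(env::RandomWalkEnv) = (env.state = 0)

actions(env::RandomWalkEnv, s) = [-1, 1]             # step left or step right

function step!(env::RandomWalkEnv, s, a)
    sp = s + a + rand(env.rng, -1:1)                 # noisy transition
    r = -abs(sp)                                     # reward for staying near 0
    env.state = sp
    return r, sp
end

finished(env::RandomWalkEnv, sp) = abs(sp) >= 10     # terminate far from the origin

# optional overrides
state(env::RandomWalkEnv) = env.state
reward(env::RandomWalkEnv) = -abs(env.state)
```

A rollout is then just reset! followed by repeated step! calls until finished returns true.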