This repository has been archived by the owner on May 21, 2022. It is now read-only.
I know that Reinforce.jl is not trying to emulate OpenAI gym exactly, but I'm curious about the reasoning behind a couple of interface decisions that seem inconsistent with gym's.
First, why doesn't reset!(env) return a state or observation for convenience? From personal experience, when I was using OpenAIGym.jl, reset!(env) was always returning false. This was happening because Julia returns the value of the last expression in a function by default, which happened to be the assignment env.done = false. I had to look through the source code to figure out what was happening. Returning a state/observation would be consistent with gym and would avoid confusion for new users.
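To illustrate the implicit-return behaviour described above, here is a minimal sketch. `ToyEnv`, `reset_implicit!`, and `reset_explicit!` are hypothetical names made up for this example; they are not the Reinforce.jl or OpenAIGym.jl definitions.

```julia
# Hypothetical toy environment, not a Reinforce.jl or OpenAIGym.jl type.
mutable struct ToyEnv
    state::Float64
    done::Bool
end

# No explicit `return`: the function evaluates to its last expression.
# An assignment evaluates to its right-hand side, so this returns `false`.
function reset_implicit!(env::ToyEnv)
    env.state = 0.0
    env.done = false
end

# Same reset, but the initial state is handed back explicitly.
function reset_explicit!(env::ToyEnv)
    env.state = 0.0
    env.done = false
    return env.state
end

env = ToyEnv(1.5, true)
r_implicit = reset_implicit!(env)   # the value of `env.done = false`
r_explicit = reset_explicit!(env)   # the freshly reset state
```

Under these assumptions, `r_implicit` is the Bool from the final assignment rather than anything meaningful to the caller, which is the surprise described above.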
Second, why does step!(env, s, a) return r, s' instead of s', r? This is a minor difference in ordering, but once again, gym had given me an expectation for what step! should return.
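For anyone who prefers the state-first ordering, a thin wrapper can reorder the tuple. This is only a sketch: `CounterEnv`, `step_reward_first!`, and `step_state_first!` are hypothetical stand-ins invented for illustration, and the reward rule is arbitrary.

```julia
# Hypothetical environment; the names here are illustrative only.
mutable struct CounterEnv
    state::Int
end

# Mimics the ordering discussed above: reward first, then the next state.
# The `s` argument is accepted but unused, since the state lives in `env`.
function step_reward_first!(env::CounterEnv, s, a)
    env.state += a
    r = -abs(env.state)          # arbitrary reward, for demonstration only
    return r, env.state
end

# Thin wrapper that reorders the result to (next state, reward).
function step_state_first!(env::CounterEnv, s, a)
    r, s_next = step_reward_first!(env, s, a)
    return s_next, r
end

env = CounterEnv(0)
s_next, r = step_state_first!(env, env.state, 2)
```

A wrapper like this leaves the underlying method untouched, so code that already relies on the reward-first ordering keeps working.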
And why does step! take a state? Shouldn't that be stored in the env? In CartPole one of the first things the method does is overwrite the state which was handed in with the state from the environment...