[question] Cartpole PPO1 example and alternate policies #35

Closed
iandanforth opened this issue Sep 23, 2018 · 5 comments
Labels
question Further information is requested


iandanforth commented Sep 23, 2018

From the provided example it appears as if you should be able to swap in different policy implementations for MlpPolicy and have the example code run. This does not appear to be the case, so I suspect I'm misunderstanding something. To use something other than MlpPolicy what should a user know? I haven't read all the docs thoroughly so I apologize if this is clearly spelled out somewhere!

System Info

  • Installation method: pip
  • GPU models and configuration: CPU only
  • Python version: 3.6.4
  • TensorFlow version: 1.8

Additional context
An example traceback of trying to use one of the other policies

/Users/iandanforth/.pyenv/versions/3.6.4/lib/python3.6/site-packages/stable_baselines/common/input.py:30: RuntimeWarning: overflow encountered in subtract
  np.any((ob_space.high - ob_space.low) != 0)):
/Users/iandanforth/.pyenv/versions/3.6.4/lib/python3.6/site-packages/stable_baselines/common/input.py:33: RuntimeWarning: overflow encountered in subtract
  processed_x = ((processed_x - ob_space.low) / (ob_space.high - ob_space.low))
Traceback (most recent call last):
  File "agents/ppo.py", line 9, in <module>
    model = PPO1(CnnLnLstmPolicy, env, verbose=1)
  File "/Users/iandanforth/.pyenv/versions/3.6.4/lib/python3.6/site-packages/stable_baselines/ppo1/pposgd_simple.py", line 77, in __init__
    self.setup_model()
  File "/Users/iandanforth/.pyenv/versions/3.6.4/lib/python3.6/site-packages/stable_baselines/ppo1/pposgd_simple.py", line 88, in setup_model
    None, reuse=False)
  File "/Users/iandanforth/.pyenv/versions/3.6.4/lib/python3.6/site-packages/stable_baselines/common/policies.py", line 349, in __init__
    layer_norm=True, feature_extraction="cnn", **_kwargs)
  File "/Users/iandanforth/.pyenv/versions/3.6.4/lib/python3.6/site-packages/stable_baselines/common/policies.py", line 192, in __init__
    extracted_features = cnn_extractor(self.processed_x, **kwargs)
  File "/Users/iandanforth/.pyenv/versions/3.6.4/lib/python3.6/site-packages/stable_baselines/common/policies.py", line 21, in nature_cnn
    layer_1 = activ(conv(scaled_images, 'c1', n_filters=32, filter_size=8, stride=4, init_scale=np.sqrt(2), **kwargs))
  File "/Users/iandanforth/.pyenv/versions/3.6.4/lib/python3.6/site-packages/stable_baselines/a2c/utils.py", line 122, in conv
    n_input = input_tensor.get_shape()[channel_ax].value
  File "/Users/iandanforth/.pyenv/versions/3.6.4/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 612, in __getitem__
    return self._dims[key]
IndexError: list index out of range

@araffin araffin added the question Further information is requested label Sep 23, 2018
Collaborator

araffin commented Sep 23, 2018

Hello,
I think the doc is a bit misleading.
In your case, I assume that you tried to use a CNN policy on something that was not an image. This fails because the CNN expects each observation to be a 3-dimensional tensor (width, height, color channels).
However, for Cartpole, the observation is only a vector, so you can use MlpPolicy and its variants (e.g. MlpLstmPolicy) but not a CnnPolicy.

Did I answer your question?

@hill-a we should maybe update the doc to prevent this type of error, no?
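
For readers hitting the same IndexError, here is a minimal sketch of why the lookup fails. It is plain Python, not the library's own code. It assumes the convolution helper reads the channel count from axis 3 of a batched (batch, height, width, channels) shape. The 84x84x4 image shape and the function name are hypothetical; a batched CartPole observation has only two axes, (batch, 4).

```python
def channel_dim(dims, channel_ax=3):
    """Read the channel count from a shape list, as a conv layer would.

    channel_ax=3 is an assumption: the last axis of a
    (batch, height, width, channels) layout.
    """
    return dims[channel_ax]


# Hypothetical image batch: unknown batch size, 84x84 pixels, 4 channels.
image_batch = [None, 84, 84, 4]
# Batched CartPole observations: unknown batch size, 4 state values.
vector_batch = [None, 4]

print(channel_dim(image_batch))  # the image shape has an axis 3

try:
    channel_dim(vector_batch)
except IndexError as err:
    # The vector shape has only two axes, so axis 3 does not exist.
    print("IndexError:", err)
```

The second call fails the same way the traceback does: the shape has no fourth axis to index.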

Owner

hill-a commented Sep 23, 2018

Hey,

Probably best to add an update to the documentation, and to add a check to the models to make sure the input is valid for the policy.
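
One possible shape for such a check, as a rough sketch only and not what the library implements. The function name and error text are hypothetical. The "cnn" and "mlp" values mirror the feature_extraction argument visible in the traceback, and the observation shape is assumed to be an unbatched tuple such as (4,) for CartPole.

```python
def check_policy_input(feature_extraction, obs_shape):
    """Fail early with a readable message when a CNN policy gets non-image input.

    Hypothetical helper: 'feature_extraction' is "cnn" or "mlp", and
    'obs_shape' is the unbatched observation shape.
    """
    if feature_extraction == "cnn" and len(obs_shape) != 3:
        raise ValueError(
            "CNN policies expect image observations with 3 dimensions, "
            "but the observation shape is {}. "
            "Consider an MLP-based policy instead.".format(obs_shape)
        )


# A vector observation with an MLP extractor passes silently.
check_policy_input("mlp", (4,))

# The same observation with a CNN extractor is rejected up front.
try:
    check_policy_input("cnn", (4,))
except ValueError as err:
    print(err)
```

Raising before the network is built would replace the opaque IndexError with a message that names the mismatch.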

@araffin araffin mentioned this issue Sep 29, 2018
@brendenpetersen

@araffin , I'm not OP, but the PPO1 (and TRPO) example CartPole script doesn't work for me when using any recurrent policies, e.g. MlpLstmPolicy. I get an error when making the lstm object, though it differs slightly based on TensorFlow 1.8 vs 1.12. It seems like it has to do with the n_env, n_step, and n_batch arguments, which (for example) A2C handles in its setup_model function but PPO1/TRPO don't seem to do anything with.

Traceback (most recent call last):
  File "ppo1_test.py", line 13, in <module>
    model = PPO1(MlpLstmPolicy, env, verbose=1)
  File "/Users/petersen33/repositories/stable-baselines/stable_baselines/ppo1/pposgd_simple.py", line 77, in __init__
    self.setup_model()
  File "/Users/petersen33/repositories/stable-baselines/stable_baselines/ppo1/pposgd_simple.py", line 88, in setup_model
    None, reuse=False)
  File "/Users/petersen33/repositories/stable-baselines/stable_baselines/common/policies.py", line 392, in __init__
    layer_norm=False, feature_extraction="mlp", **_kwargs)
  File "/Users/petersen33/repositories/stable-baselines/stable_baselines/common/policies.py", line 206, in __init__
    layer_norm=layer_norm)
  File "/Users/petersen33/repositories/stable-baselines/stable_baselines/a2c/utils.py", line 199, in lstm
    weight_x = tf.get_variable("wx", [n_input, n_hidden * 4], initializer=ortho_init(init_scale))
  File "/Users/petersen33/repositories/venv_sb/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1317, in get_variable
    constraint=constraint)
  File "/Users/petersen33/repositories/venv_sb/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1079, in get_variable
    constraint=constraint)
  File "/Users/petersen33/repositories/venv_sb/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 425, in get_variable
    constraint=constraint)
  File "/Users/petersen33/repositories/venv_sb/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 394, in _true_getter
    use_resource=use_resource, constraint=constraint)
  File "/Users/petersen33/repositories/venv_sb/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 754, in _get_single_variable
    "but instead was %s." % (name, shape))
ValueError: Shape of a new variable (model/lstm1/wx) must be fully defined, but instead was (?, 1024).
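
The error message can be reproduced in miniature. The sketch below is plain Python, not TensorFlow; it only mirrors the shape expression visible in the traceback, [n_input, n_hidden * 4]. The value n_hidden = 256 is an assumption inferred from the 1024 in the message, 64 is an arbitrary known input size, and both helper names are hypothetical.

```python
def lstm_input_weight_shape(n_input, n_hidden):
    """Shape of the 'wx' matrix as written in the traceback."""
    return (n_input, n_hidden * 4)


def is_fully_defined(shape):
    """A shape is fully defined only if no dimension is unknown (None)."""
    return all(dim is not None for dim in shape)


# Input size unknown, as when the batch layout is never passed to the policy.
unknown = lstm_input_weight_shape(None, 256)
# Input size known, as when the layout is supplied.
known = lstm_input_weight_shape(64, 256)

print(unknown, is_fully_defined(unknown))
print(known, is_fully_defined(known))
```

A weight variable cannot be allocated while one of its dimensions is unknown, which is consistent with the (?, 1024) shape reported above.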

Collaborator

araffin commented Oct 2, 2018

@hill-a looks like a bug... Can you check that?

@araffin araffin added the bug Something isn't working label Oct 7, 2018
@araffin araffin removed the bug Something isn't working label Oct 14, 2018
Collaborator

araffin commented Oct 14, 2018

Closing this issue (which is now a duplicate of #60)

@araffin araffin closed this as completed Oct 14, 2018