[question] Cartpole PPO1 example and alternate policies #35

iandanforth · 2018-09-23T18:32:20Z

From the provided example it appears as if you should be able to swap in different policy implementations for MlpPolicy and have the example code run. This does not appear to be the case, so I suspect I'm misunderstanding something. To use something other than MlpPolicy what should a user know? I haven't read all the docs thoroughly so I apologize if this is clearly spelled out somewhere!

System Info

Installation method: pip
GPU models and configuration: CPU only
Python version: 3.6.4
Tensorflow version: 1.8

Additional context
An example traceback of trying to use one of the other policies

/Users/iandanforth/.pyenv/versions/3.6.4/lib/python3.6/site-packages/stable_baselines/common/input.py:30: RuntimeWarning: overflow encountered in subtract
  np.any((ob_space.high - ob_space.low) != 0)):
/Users/iandanforth/.pyenv/versions/3.6.4/lib/python3.6/site-packages/stable_baselines/common/input.py:33: RuntimeWarning: overflow encountered in subtract
  processed_x = ((processed_x - ob_space.low) / (ob_space.high - ob_space.low))
Traceback (most recent call last):
  File "agents/ppo.py", line 9, in <module>
    model = PPO1(CnnLnLstmPolicy, env, verbose=1)
  File "/Users/iandanforth/.pyenv/versions/3.6.4/lib/python3.6/site-packages/stable_baselines/ppo1/pposgd_simple.py", line 77, in __init__
    self.setup_model()
  File "/Users/iandanforth/.pyenv/versions/3.6.4/lib/python3.6/site-packages/stable_baselines/ppo1/pposgd_simple.py", line 88, in setup_model
    None, reuse=False)
  File "/Users/iandanforth/.pyenv/versions/3.6.4/lib/python3.6/site-packages/stable_baselines/common/policies.py", line 349, in __init__
    layer_norm=True, feature_extraction="cnn", **_kwargs)
  File "/Users/iandanforth/.pyenv/versions/3.6.4/lib/python3.6/site-packages/stable_baselines/common/policies.py", line 192, in __init__
    extracted_features = cnn_extractor(self.processed_x, **kwargs)
  File "/Users/iandanforth/.pyenv/versions/3.6.4/lib/python3.6/site-packages/stable_baselines/common/policies.py", line 21, in nature_cnn
    layer_1 = activ(conv(scaled_images, 'c1', n_filters=32, filter_size=8, stride=4, init_scale=np.sqrt(2), **kwargs))
  File "/Users/iandanforth/.pyenv/versions/3.6.4/lib/python3.6/site-packages/stable_baselines/a2c/utils.py", line 122, in conv
    n_input = input_tensor.get_shape()[channel_ax].value
  File "/Users/iandanforth/.pyenv/versions/3.6.4/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 612, in __getitem__
    return self._dims[key]
IndexError: list index out of range


araffin · 2018-09-23T18:41:10Z

Hello,
I think the doc is a bit misleading.
In your case, I assume that you tried to use a CNN policy on something that was not an image. This fails because a CNN policy expects each input observation to be a tensor of dimension 3 (width, height, color channels).
For CartPole, however, the observation is only a vector, so you can use MlpPolicy and its variants (e.g. MlpLstmPolicy) but not a CNN policy.
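
To make the failure in the first traceback concrete, here is a minimal, library-free sketch of the lookup that raises the IndexError. It assumes the conv helper reads the channel count at axis 3 of the batched input (NHWC layout); `channel_count` and the 84x84x3 image shape are hypothetical names and values chosen for illustration, not taken from stable-baselines.

```python
# Assumption: the channel count is read at axis 3 of a batched input
# laid out as (batch, height, width, channels).
CHANNEL_AX = 3


def channel_count(batched_shape):
    """Return the channel dimension of a batched observation shape."""
    return batched_shape[CHANNEL_AX]


# CartPole observations are 4-element vectors, so a batch has only 2 axes.
cartpole_batch = [None, 4]
# Hypothetical image input: a batch has 4 axes, so axis 3 exists.
image_batch = [None, 84, 84, 3]

print(channel_count(image_batch))

try:
    channel_count(cartpole_batch)
except IndexError as err:
    # Same error class as in the traceback: there is no axis 3 to index.
    print("IndexError:", err)
```

The vector case has no channel axis at all, which is why the error surfaces as an out-of-range index rather than a message about the policy choice.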

Did I answer your question?

@hill-a we should maybe update the doc to prevent this type of error, no?

hill-a · 2018-09-23T22:12:45Z

Hey,

Probably best to add an update to the documentation, and to add a check to the models to make sure the input for the policy
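
As a rough idea of what such a check might look like, here is a small sketch. The function name `check_policy_input` and the error message are hypothetical and not part of stable-baselines; the `"cnn"` / `"mlp"` values mirror the `feature_extraction` keyword visible in the tracebacks in this thread.

```python
def check_policy_input(feature_extraction, obs_shape):
    """Raise a descriptive error when a policy cannot consume the observations.

    feature_extraction: "cnn" or "mlp".
    obs_shape: per-observation shape without the batch axis, e.g. (4,) for CartPole.
    """
    if feature_extraction == "cnn" and len(obs_shape) != 3:
        raise ValueError(
            "CNN policies expect image observations with 3 dimensions, "
            "got shape {}; use an MLP-based policy instead.".format(tuple(obs_shape))
        )


check_policy_input("mlp", (4,))          # passes: vector input with an MLP policy
check_policy_input("cnn", (84, 84, 3))   # passes: hypothetical image input with a CNN policy

try:
    check_policy_input("cnn", (4,))      # CartPole-style vector with a CNN policy
except ValueError as err:
    print(err)
```

Failing early with a message like this would point users at the policy choice instead of an IndexError deep inside the conv helper.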

brendenpetersen · 2018-10-02T21:35:59Z

@araffin , I'm not OP, but the PPO1 (and TRPO) example CartPole script doesn't work for me when using any recurrent policies, e.g. MlpLstmPolicy. I get an error when making the lstm object, though it differs slightly based on TensorFlow 1.8 vs 1.12. It seems like it has to do with the n_env, n_step, and n_batch arguments, which (for example) A2C handles in its setup_model function but PPO1/TRPO don't seem to do anything with.

Traceback (most recent call last):
  File "ppo1_test.py", line 13, in <module>
    model = PPO1(MlpLstmPolicy, env, verbose=1)
  File "/Users/petersen33/repositories/stable-baselines/stable_baselines/ppo1/pposgd_simple.py", line 77, in __init__
    self.setup_model()
  File "/Users/petersen33/repositories/stable-baselines/stable_baselines/ppo1/pposgd_simple.py", line 88, in setup_model
    None, reuse=False)
  File "/Users/petersen33/repositories/stable-baselines/stable_baselines/common/policies.py", line 392, in __init__
    layer_norm=False, feature_extraction="mlp", **_kwargs)
  File "/Users/petersen33/repositories/stable-baselines/stable_baselines/common/policies.py", line 206, in __init__
    layer_norm=layer_norm)
  File "/Users/petersen33/repositories/stable-baselines/stable_baselines/a2c/utils.py", line 199, in lstm
    weight_x = tf.get_variable("wx", [n_input, n_hidden * 4], initializer=ortho_init(init_scale))
  File "/Users/petersen33/repositories/venv_sb/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1317, in get_variable
    constraint=constraint)
  File "/Users/petersen33/repositories/venv_sb/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1079, in get_variable
    constraint=constraint)
  File "/Users/petersen33/repositories/venv_sb/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 425, in get_variable
    constraint=constraint)
  File "/Users/petersen33/repositories/venv_sb/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 394, in _true_getter
    use_resource=use_resource, constraint=constraint)
  File "/Users/petersen33/repositories/venv_sb/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 754, in _get_single_variable
    "but instead was %s." % (name, shape))
ValueError: Shape of a new variable (model/lstm1/wx) must be fully defined, but instead was (?, 1024).
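
One plausible reading of this traceback, offered as an assumption rather than a confirmed diagnosis: if the policy is built with an unknown batch size and then regroups its features per environment and step with a reshape ending in -1, the feature dimension can no longer be resolved statically, so the LSTM weight shape comes out as (?, 1024). The sketch below models that inference in plain Python. `static_reshape` and `lstm_weight_shape` are hypothetical helpers, `n_features=64` is an illustrative value, and `n_hidden=256` is inferred from `n_hidden * 4` in the traceback matching 1024.

```python
def static_reshape(shape, target):
    """Model static shape inference for a reshape with a single -1 entry.

    shape: input shape, with None standing for an unknown dimension.
    target: requested shape containing exactly one -1.
    """
    if any(dim is None for dim in shape):
        # The total element count is unknown, so the -1 entry stays unknown.
        return tuple(None if dim == -1 else dim for dim in target)
    total = 1
    for dim in shape:
        total *= dim
    known = 1
    for dim in target:
        if dim != -1:
            known *= dim
    return tuple(total // known if dim == -1 else dim for dim in target)


def lstm_weight_shape(n_batch, n_env=1, n_steps=1, n_features=64, n_hidden=256):
    """Shape of the input-to-hidden weight, as in [n_input, n_hidden * 4]."""
    # Features arrive as (n_batch, n_features) and are regrouped per env and step.
    seq_shape = static_reshape((n_batch, n_features), [n_env, n_steps, -1])
    n_input = seq_shape[-1]
    return (n_input, n_hidden * 4)


print(lstm_weight_shape(n_batch=None))  # unknown batch size: first dimension is None
print(lstm_weight_shape(n_batch=1))     # known batch size: fully defined shape
```

Under this model, passing a concrete batch size (as A2C appears to do in its setup_model) is what makes the weight shape fully defined.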

araffin · 2018-10-02T22:04:01Z

@hill-a looks like a bug... Can you check that?

araffin · 2018-10-14T19:37:42Z

Closing this issue (which is now a duplicate of #60).

araffin added the "question" (Further information is requested) label on Sep 23, 2018

araffin mentioned this issue on Sep 29, 2018 in "DQN fixes #39" (Merged)

araffin added the "bug" (Something isn't working) label on Oct 7, 2018

brendenpetersen mentioned this issue on Oct 11, 2018 in "Masks for LstmPolicy in PPO1 #60" (Closed)

araffin removed the "bug" (Something isn't working) label on Oct 14, 2018

araffin closed this as completed on Oct 14, 2018
