[question]TRPO can't use LstmPolicy or MlpLstmPolicy? #816

GIS-PuppetMaster · 2020-04-20T11:18:18Z

from stable_baselines.common.policies import MlpLstmPolicy
from stable_baselines import TRPO
from stable_baselines.common.env_checker import check_env
from stable_baselines.common.vec_env.dummy_vec_env import DummyVecEnv
from TradeEnv import TradeEnv
from Util.Util import *
from Util.Callback import CustomCallback

episode = 5000
EP_LEN = 250 * 3
FILE_TAG = "TRPO"
mode = "train"
n_training_envs = 1


def post_processor(state):
    return log10plus1R(state) / 10


def make_env():
    env = TradeEnv(obs_time_size='60 day', obs_delta_frequency='1 day', sim_delta_time='1 day',
                   start_episode=0, episode_len=EP_LEN, stock_code='000938_XSHE',
                   # "运行结果" means "run results"
                   result_path="E:/运行结果/TRPO/" + FILE_TAG + "/" + mode + "/",
                   # raw string so the Windows backslashes are not read as escape sequences
                   stock_data_path=r'E:\PycharmProjects\DPPO\myDPPO\Data/train/',
                   poundage_rate=1.5e-3, reward_verbose=1, post_processor=post_processor,
                   max_episode_days=EP_LEN)
    env.seed(0)
    check_env(env)
    return env


env = DummyVecEnv([make_env for _ in range(n_training_envs)])
callback = CustomCallback()
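# MlpLstmPolicy is a recurrent policy; this TRPO + MlpLstmPolicy combination raises the error below
model = TRPO(MlpLstmPolicy, env, verbose=1, tensorboard_log="./log/", seed=0)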
model.learn(total_timesteps=episode * EP_LEN, callback=callback)
model.save("./model")

Describe the bug
The observation space of my env is:

self.observation_space = spaces.Box(
    low=np.array([[float('-inf') for _ in range(26)]
                  for _ in range(self.obs_time // self.obs_delta_frequency)]),
    high=np.array([[float('inf') for _ in range(26)]
                   for _ in range(self.obs_time // self.obs_delta_frequency)]))

Its shape is (timesteps=60, features=26).
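The same unbounded (60, 26) bounds can be built more compactly with `np.full`. This is a minimal sketch that assumes `obs_time // obs_delta_frequency == 60` (a 60-day window sampled daily) and 26 features, as stated above; the constants `N_TIMESTEPS` and `N_FEATURES` are illustrative names, not part of the original env.

```python
import numpy as np

# Assumed values: 60 daily observations ('60 day' window, '1 day' step) x 26 features.
N_TIMESTEPS = 60
N_FEATURES = 26

# Same bounds as the nested list comprehensions, built with one call each.
low = np.full((N_TIMESTEPS, N_FEATURES), -np.inf)
high = np.full((N_TIMESTEPS, N_FEATURES), np.inf)

# In the env these would be passed on as spaces.Box(low=low, high=high).
print(low.shape)  # (60, 26)
```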

When I run this code, I get the following error:
ValueError: The initializer passed is not valid. It should be a callable with no arguments and the shape should not be provided or an instance of `tf.keras.initializers.*' and `shape` should be fully defined.
But if I use PPO2, it runs normally.

Code example
See above.

Instructions for updating:
Colocations handled automatically by placer.
Traceback (most recent call last):
  File "C:\Users\zkx74\Anaconda3\lib\site-packages\stable_baselines\trpo_mpi\trpo_mpi.py", line 138, in setup_model
    None, reuse=False, **self.policy_kwargs)
  File "C:\Users\zkx74\Anaconda3\lib\site-packages\stable_baselines\common\policies.py", line 681, in __init__
    layer_norm=False, feature_extraction="mlp", **_kwargs)
  File "C:\Users\zkx74\Anaconda3\lib\site-packages\stable_baselines\common\policies.py", line 427, in __init__
    layer_norm=layer_norm)
  File "C:\Users\zkx74\Anaconda3\lib\site-packages\stable_baselines\common\tf_layers.py", line 143, in lstm
    weight_x = tf.get_variable("wx", [n_input, n_hidden * 4], initializer=ortho_init(init_scale))
  File "C:\Users\zkx74\Anaconda3\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 1479, in get_variable
    aggregation=aggregation)
  File "C:\Users\zkx74\Anaconda3\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 1220, in get_variable
    aggregation=aggregation)
  File "C:\Users\zkx74\Anaconda3\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 547, in get_variable
    aggregation=aggregation)
  File "C:\Users\zkx74\Anaconda3\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 499, in _true_getter
    aggregation=aggregation)
  File "C:\Users\zkx74\Anaconda3\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 890, in _get_single_variable
    raise ValueError("The initializer passed is not valid. It should "
ValueError: The initializer passed is not valid. It should be a callable with no arguments and the shape should not be provided or an instance of `tf.keras.initializers.*' and `shape` should be fully defined.

System Info
Describe the characteristics of your environment:

Describe how the library was installed: pip in an Anaconda env
GPU models and configuration: default tf-gpu
Python version: 3.7
TensorFlow version: 1.13.2



Miffyli · 2020-04-20T11:23:18Z

TRPO does not support recurrent architectures (such as LstmPolicy or MlpLstmPolicy), as shown in the implementation grid in the documentation.

Duplicate of #140.
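Because the mismatch only surfaces deep inside graph construction as an opaque initializer error, a small check before creating the model can make it explicit. This is a minimal sketch: `check_policy_supported` and its support table are hypothetical, not part of stable-baselines, and record only what this thread reports (TRPO rejects recurrent policies, PPO2 ran with MlpLstmPolicy). Other algorithms should be looked up in the implementation grid rather than assumed.

```python
# Hypothetical pre-flight check; not a stable-baselines API.
# Recurrent policy class names from stable_baselines.common.policies.
RECURRENT_POLICIES = {
    "LstmPolicy", "MlpLstmPolicy", "MlpLnLstmPolicy",
    "CnnLstmPolicy", "CnnLnLstmPolicy",
}
# Only what this thread reports; extend from the documentation's implementation grid.
RECURRENT_SUPPORT = {"TRPO": False, "PPO2": True}


def check_policy_supported(algo_name: str, policy_name: str) -> None:
    """Raise a readable error before model construction instead of deep inside it."""
    if policy_name not in RECURRENT_POLICIES:
        return  # feed-forward policies are outside the scope of this check
    supported = RECURRENT_SUPPORT.get(algo_name)
    if supported is None:
        raise LookupError(
            f"No entry for {algo_name}; check the implementation grid in the docs")
    if not supported:
        raise ValueError(
            f"{algo_name} does not support recurrent policies such as {policy_name}; "
            "use a feed-forward policy (e.g. MlpPolicy) or a recurrent-capable "
            "algorithm such as PPO2")


check_policy_supported("PPO2", "MlpLstmPolicy")  # passes silently
check_policy_supported("TRPO", "MlpPolicy")      # passes silently
try:
    check_policy_supported("TRPO", "MlpLstmPolicy")
except ValueError as err:
    print(err)
```

Keeping the table explicit, with unknown algorithms raising instead of passing, avoids silently approving a combination nobody has checked.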

Miffyli added the duplicate label (This issue or pull request already exists) Apr 20, 2020

Miffyli closed this as completed Apr 20, 2020

araffin added the RTFM label (Answer is the documentation) Apr 20, 2020
