[question]TRPO can't use LstmPolicy or MlpLstmPolicy? #816

Closed
GIS-PuppetMaster opened this issue Apr 20, 2020 · 1 comment
Labels
duplicate This issue or pull request already exists RTFM Answer is the documentation

Comments

@GIS-PuppetMaster

from stable_baselines.common.policies import *
from stable_baselines import *
from stable_baselines.common.env_checker import check_env
from stable_baselines.common.vec_env.dummy_vec_env import DummyVecEnv
from TradeEnv import TradeEnv
from Util.Util import *
from Util.Callback import CustomCallback

episode = 5000
EP_LEN = 250 * 3
FILE_TAG = "TRPO"
mode = "train"
n_training_envs = 1


def post_processor(state):
    return log10plus1R(state) / 10


def make_env():
    env = TradeEnv(obs_time_size='60 day', obs_delta_frequency='1 day', sim_delta_time='1 day',
                   start_episode=0, episode_len=EP_LEN, stock_code='000938_XSHE',
                   result_path="E:/运行结果/TRPO/" + FILE_TAG + "/" + mode + "/",
                   stock_data_path=r'E:\PycharmProjects\DPPO\myDPPO\Data/train/',
                   poundage_rate=1.5e-3, reward_verbose=1, post_processor=post_processor,
                   max_episode_days=EP_LEN)
    env.seed(0)
    check_env(env)
    return env


env = DummyVecEnv([make_env for _ in range(n_training_envs)])
callback = CustomCallback()
model = TRPO(MlpLstmPolicy, env, verbose=1, tensorboard_log="./log/", seed=0)
model.learn(total_timesteps=episode * EP_LEN, callback=callback)
model.save("./model")

Describe the bug
The observation space of my environment is:

self.observation_space = spaces.Box(
    low=np.array([[float('-inf') for _ in range(26)]
                  for _ in range(self.obs_time // self.obs_delta_frequency)]),
    high=np.array([[float('inf') for _ in range(26)]
                   for _ in range(0, self.obs_time // self.obs_delta_frequency)]))

Its shape is (timesteps=60, features=26).

When I run this code, I get the following error:
ValueError: The initializer passed is not valid. It should be a callable with no arguments and the shape should not be provided or an instance of `tf.keras.initializers.*' and `shape` should be fully defined.
If I use PPO2 instead, it runs normally.

Code example
see above

Traceback (most recent call last): 
Instructions for updating:
Colocations handled automatically by placer.
Traceback (most recent call last):
  File "C:\Users\zkx74\Anaconda3\lib\site-packages\stable_baselines\trpo_mpi\trpo_mpi.py", line 138, in setup_model
    None, reuse=False, **self.policy_kwargs)
  File "C:\Users\zkx74\Anaconda3\lib\site-packages\stable_baselines\common\policies.py", line 681, in __init__
    layer_norm=False, feature_extraction="mlp", **_kwargs)
  File "C:\Users\zkx74\Anaconda3\lib\site-packages\stable_baselines\common\policies.py", line 427, in __init__
    layer_norm=layer_norm)
  File "C:\Users\zkx74\Anaconda3\lib\site-packages\stable_baselines\common\tf_layers.py", line 143, in lstm
    weight_x = tf.get_variable("wx", [n_input, n_hidden * 4], initializer=ortho_init(init_scale))
  File "C:\Users\zkx74\Anaconda3\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 1479, in get_variable
    aggregation=aggregation)
  File "C:\Users\zkx74\Anaconda3\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 1220, in get_variable
    aggregation=aggregation)
  File "C:\Users\zkx74\Anaconda3\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 547, in get_variable
    aggregation=aggregation)
  File "C:\Users\zkx74\Anaconda3\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 499, in _true_getter
    aggregation=aggregation)
  File "C:\Users\zkx74\Anaconda3\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 890, in _get_single_variable
    raise ValueError("The initializer passed is not valid. It should "
ValueError: The initializer passed is not valid. It should be a callable with no arguments and the shape should not be provided or an instance of `tf.keras.initializers.*' and `shape` should be fully defined.

System Info

  • Installation: pip in an Anaconda environment
  • GPU models and configuration: default tf-gpu
  • Python version: 3.7
  • TensorFlow version: 1.13.2


@Miffyli
Collaborator

Miffyli commented Apr 20, 2020

TRPO does not support recurrent architectures, as shown in the implementation grid.

Duplicate of #140.
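
For readers who hit the same TensorFlow initializer error, a small guard can fail earlier with a clearer message. The following is only a minimal sketch: `check_policy_supported` and the `RECURRENT_SUPPORT` table are hypothetical names, not part of stable_baselines, and the table holds only the two algorithms this thread confirms (TRPO: no, PPO2: yes). The implementation grid in the documentation remains the authoritative source for other algorithms.

```python
# Recurrent-policy support per algorithm. Only the two entries below are
# confirmed by this thread; consult the implementation grid before adding
# others. This table is an illustration, not a stable_baselines API.
RECURRENT_SUPPORT = {"TRPO": False, "PPO2": True}

# Class names of the LSTM-based policies in stable_baselines.common.policies.
RECURRENT_POLICIES = {
    "LstmPolicy",
    "MlpLstmPolicy",
    "MlpLnLstmPolicy",
    "CnnLstmPolicy",
    "CnnLnLstmPolicy",
}


def check_policy_supported(algo_name, policy_name):
    """Hypothetical helper: raise if a recurrent policy is paired with an
    algorithm that is not known to support it.

    Algorithms missing from RECURRENT_SUPPORT are conservatively treated
    as not supporting recurrent policies.
    """
    is_recurrent = policy_name in RECURRENT_POLICIES
    if is_recurrent and not RECURRENT_SUPPORT.get(algo_name, False):
        raise ValueError(
            "{} is not known to support recurrent policies such as {}; "
            "use a feed-forward policy or a different algorithm.".format(
                algo_name, policy_name
            )
        )


# These two combinations pass silently.
check_policy_supported("PPO2", "MlpLstmPolicy")
check_policy_supported("TRPO", "MlpPolicy")

# The combination from this issue is rejected before any model is built.
try:
    check_policy_supported("TRPO", "MlpLstmPolicy")
except ValueError as err:
    print(err)
```

In the script above, this check would be called with the algorithm and policy class names (for example `TRPO.__name__` and `MlpLstmPolicy.__name__`) just before the model is constructed.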

@Miffyli Miffyli added the duplicate This issue or pull request already exists label Apr 20, 2020
@Miffyli Miffyli closed this as completed Apr 20, 2020
@araffin araffin added the RTFM Answer is the documentation label Apr 20, 2020