[feature request] LSTM policies with custom feature extractors #160
Comments
Hello,
How would recurrent policies be approached here? Any recommendations?
I think the implementation style of SB2 would be a good starting point, and there is already some support for that in SB3 (e.g. `predict` returns the recurrent state along with the action). Edit: One thing we might want to consider is updating buffers to always store trajectory information and/or even keep different trajectories separate to make this easier.
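For illustration, a minimal prediction loop that carries the state returned by `predict` between steps. This is a sketch, not part of SB3 itself; the model and environment are just examples, and for a non-recurrent policy the returned state simply stays `None`:

```python
import gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env).learn(1_000)

obs = env.reset()
state = None  # recurrent hidden state; stays None for feed-forward policies
done = False
while not done:
    # predict() returns the action and the (possibly updated) recurrent state
    action, state = model.predict(obs, state=state, deterministic=True)
    obs, reward, done, info = env.step(action)
```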
@Miffyli ... or ...?
I do not dare to give any guides just yet. PyTorch makes things much easier when it comes to RNN-type networks, and we can save ourselves a ton of headache compared to TF if we do things right. On a quick ponder, I think we might be able to implement RNN support at a very low level in the policies (e.g. as arguments to ...). My point being: the first step for RNN support would be to think of clean and functionally satisfactory ("RNN agents that work") solutions before starting to think of code-level things.
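One way to read "low level in the policies" is hidden states passed into and out of the policy's forward pass. A purely hypothetical PyTorch sketch (not SB3 code; all names are made up):

```python
import torch as th
import torch.nn as nn

class RecurrentActor(nn.Module):
    """Hypothetical policy head whose forward() takes and returns the LSTM state."""

    def __init__(self, feature_dim: int, hidden_dim: int, action_dim: int):
        super().__init__()
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.action_net = nn.Linear(hidden_dim, action_dim)

    def forward(self, features: th.Tensor, lstm_state=None):
        # features: (batch, seq_len, feature_dim); lstm_state: (h, c) tuple or None
        out, new_state = self.lstm(features, lstm_state)
        return self.action_net(out), new_state

# Usage: thread the state through successive calls.
actor = RecurrentActor(feature_dim=8, hidden_dim=32, action_dim=2)
logits, state = actor(th.zeros(1, 1, 8))          # start of episode: state defaults to zeros
logits, state = actor(th.zeros(1, 1, 8), state)   # later steps reuse the returned state
```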
Shouldn't this be part of the breaking changes in the migration guide? At least in the meantime.
@juancroldan do you mean something like this: "..."?
How does a recurrent policy differ from implementing a custom policy with an LSTM/GRU network and just preparing your state in a way that it already contains the previous experience/timeline?
@glebarez Storing hidden states during training, and the whole backprop-through-time, is the largest headache here. We need to update the code to store the hidden states between rollouts, load them up correctly, and then do the training. Some of this infrastructure is there, so it is not that big of a deal (you can get plenty of hints from SB2), but implementing it only at the policy level is not enough (training needs to be updated as well).
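To make the "store hidden states between rollouts" point concrete, here is an illustrative-only sketch (not SB3 code; the toy LSTM "policy" and the plain list standing in for a rollout buffer are made up) of collecting a rollout while recording the per-step hidden state so training can later re-initialize backprop-through-time from it:

```python
import gym
import torch as th
import torch.nn as nn

# Toy recurrent "policy": an LSTM over observations plus a linear action head.
env = gym.make("CartPole-v1")
lstm = nn.LSTM(input_size=env.observation_space.shape[0], hidden_size=32)
action_head = nn.Linear(32, env.action_space.n)

buffer = []            # stands in for a rollout buffer that also stores hidden states
lstm_state = None      # (h, c); None means "zero state", i.e. start of an episode
obs = env.reset()

for step in range(128):
    obs_t = th.as_tensor(obs, dtype=th.float32).view(1, 1, -1)  # (seq, batch, features)
    initial_state = lstm_state                     # state the policy saw before acting
    with th.no_grad():
        out, lstm_state = lstm(obs_t, lstm_state)
        action = action_head(out).argmax().item()
    next_obs, reward, done, info = env.step(action)
    # Store the per-step hidden state so training can re-initialize BPTT from it.
    buffer.append((obs, action, reward, done, initial_state))
    if done:
        lstm_state = None                          # reset hidden state between episodes
        next_obs = env.reset()
    obs = next_obs
```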
@Miffyli thank you for taking the time to explain.
Yup, the plan is to have a much simpler/clearer RNN implementation this time around, using the things we learned during SB2 :). PS: If you feel like working towards anything like this, PRs are always welcome!
* Remove recurrent policies from A2C docs: recurrent policies are not supported yet as of #160 (comment), but the docs say that A2C supports them. Changing it to avoid misleading users.
* Update changelog
Co-authored-by: benjaminjsteenhoek@gmail.com <benjis@iastate.edu>
Any updates on recurrent policies?
No active development as of yet. You could try using the original stable-baselines algorithms if those are suitable for you.
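For reference, a minimal sketch of that SB2 route (assumes the TF1-based stable-baselines package is installed; the environment and hyperparameters are illustrative only):

```python
# Recurrent policy in the original stable-baselines (SB2, TensorFlow 1).
# nminibatches=1 because the LSTM implementation requires the number of
# environments to be divisible by the number of minibatches.
from stable_baselines import PPO2

model = PPO2("MlpLstmPolicy", "CartPole-v1", nminibatches=1, verbose=1)
model.learn(total_timesteps=10_000)
```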
I have a very experimental version of recurrent PPO in an SB3 contrib branch, based on the SB2/CleanRL implementations: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/compare/feat/ppo-lstm Use it at your own risk :p Closing in favor of #18.
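If you try that branch, usage would presumably look like the sketch below (assuming it exposes a `RecurrentPPO` class with an `"MlpLstmPolicy"`, as the released sb3-contrib package does):

```python
from sb3_contrib import RecurrentPPO

# Train a PPO agent with an LSTM policy; the environment is just an example.
model = RecurrentPPO("MlpLstmPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=10_000)
```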
Hi! It would be awesome to be able to implement LSTM policies in this library, like in the former version. Is there a straightforward way to accomplish this with the current version?
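For the custom-feature-extractor half of the request, SB3 already supports plugging in your own extractor via `policy_kwargs`; only the recurrent half is missing, per the discussion above. A minimal sketch (the extractor architecture, dimensions, and environment here are illustrative only):

```python
import torch.nn as nn
from stable_baselines3 import PPO
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor

class CustomExtractor(BaseFeaturesExtractor):
    """Toy MLP extractor; the architecture is purely illustrative."""

    def __init__(self, observation_space, features_dim: int = 64):
        super().__init__(observation_space, features_dim)
        self.net = nn.Sequential(
            nn.Linear(observation_space.shape[0], features_dim),
            nn.ReLU(),
        )

    def forward(self, observations):
        return self.net(observations)

model = PPO(
    "MlpPolicy",
    "CartPole-v1",
    policy_kwargs=dict(features_extractor_class=CustomExtractor),
    verbose=1,
)
model.learn(total_timesteps=10_000)
```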