How to train and test ilql model? #48

timercrack · 2023-08-08T00:16:10Z

timercrack
Aug 8, 2023

I can train the bert model using bert.phase1 and test it without problems.
I don't know much about the ilql model. It looks like an advanced version of iql. What is the purpose of extract_policy.py in the ilql directory, and how do I use it?
What steps are needed to train and test the iql model? I have prepared iql training data and a reward plugin, but I don't know how to start.

Answered by Cryolite

Cryolite · 2023-08-12T12:07:19Z

Cryolite
Aug 12, 2023
Maintainer

Please use train.py for training. extract_policy.py implements the AWR algorithm, which is used in the original IQL paper. However, extract_policy.py is for extracting a policy model from a trained Q-function model. If you simply want to make predictions, you can use the Q-function model directly, and I think there is no need to run extract_policy.py.
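To illustrate the two options mentioned above, here is a minimal pure-Python sketch, not the code in extract_policy.py. It assumes the AWR-style objective from the IQL paper, where each dataset action's log-probability is weighted by the exponentiated advantage exp(beta * (Q - V)). The function names, the `beta` default, and the clipping value `max_weight` are hypothetical choices for illustration only.

```python
import math


def greedy_action(q_values, legal_actions):
    """Use the Q-function directly: pick the legal action with the highest Q-value."""
    return max(legal_actions, key=lambda a: q_values[a])


def awr_weight(q, v, beta=3.0, max_weight=100.0):
    """Exponentiated advantage exp(beta * (Q - V)), clipped to keep it bounded.

    beta and max_weight are assumed values, not taken from the repository.
    """
    return min(math.exp(beta * (q - v)), max_weight)


def awr_policy_loss(batch, beta=3.0, max_weight=100.0):
    """Advantage-weighted negative log-likelihood.

    `batch` is a list of (q, v, log_prob) triples, where log_prob is the
    policy's log-probability of the action actually taken in the dataset.
    """
    total = 0.0
    for q, v, log_prob in batch:
        total += -awr_weight(q, v, beta, max_weight) * log_prob
    return total / len(batch)
```

In this sketch, `greedy_action` needs only the trained Q-function, which matches the remark that policy extraction is optional if predictions are all you need. `awr_policy_loss` is what a separate policy network would minimize if you did want an explicit policy.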

1 reply

timercrack Aug 13, 2023
Author

Thank you for your answer. I have another question. I saw that your get_reward implementation is based on game_rank and game_score, and that the reward is set to 0.0 when it is called before the end of a game, because those values do not exist yet.
In that case, will all data other than the data at the end of the game be useless?
Can I use the round rank and round delta score instead when the game has not ended?
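As general background on this question, and not a statement about how this repository behaves, value-based methods typically learn from terminal-only rewards because each earlier step's target depends on the values of later steps. The sketch below computes the discounted return-to-go for a single episode whose only nonzero reward arrives at the end. The function name and the example rewards are hypothetical.

```python
def returns_to_go(rewards, gamma=1.0):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    values = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        values[t] = running
    return values


# Reward is 0.0 at every step except the last one (end of the game).
episode_rewards = [0.0, 0.0, 0.0, 1.0]
print(returns_to_go(episode_rewards))             # [1.0, 1.0, 1.0, 1.0]
print(returns_to_go(episode_rewards, gamma=0.5))  # [0.125, 0.25, 0.5, 1.0]
```

Every step ends up with a nonzero target even though its immediate reward is 0.0. Whether substituting round rank or round delta score as an intermediate reward would help is a separate design question that this sketch does not settle.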
