Config Setting on Bi-DexHands domain #28

Open
JensenLZX opened this issue Feb 5, 2024 · 2 comments

Comments

@JensenLZX

Problem: The reproduced result is much lower than the one reported in the paper

Details:
I want to reproduce the results in the Bi-DexHands domain, using the script you provide directly.

#!/bin/sh
env="hands"
task="ShadowHandCatchOver2Underarm"
#ShadowHandDoorCloseOutward
#ShadowHandDoorOpenInward
#ShadowHandCatchOver2Underarm
algo="mat"
exp="single"
seed=1

echo "env is ${env}, task is ${task}, algo is ${algo}, exp is ${exp}, seed is ${seed}"
CUDA_VISIBLE_DEVICES=0 python train/train_hands.py --env_name ${env} --seed ${seed} --algorithm_name ${algo} --experiment_name ${exp} --task ${task} --n_rollout_threads 80 --lr 5e-5 --entropy_coef 0.001 --max_grad_norm 0.5 --eval_episodes 5 --log_interval 25 --n_training_threads 16 --num_mini_batch 1 --num_env_steps 50000000 --gamma 0.96 --ppo_epoch 5 --clip_param 0.2 --use_value_active_masks --add_center_xy --use_state_agent --use_policy_active_masks

However, it fails with the following error:

usage: train_hands.py [-h] [--sim_device SIM_DEVICE] [--pipeline PIPELINE]
                      [--graphics_device_id GRAPHICS_DEVICE_ID]
                      [--flex | --physx] [--num_threads NUM_THREADS]
                      [--subscenes SUBSCENES] [--slices SLICES]
                      [--env_name ENV_NAME] [--algorithm_name ALGORITHM_NAME]
                      [--experiment_name EXPERIMENT_NAME] [--n_block N_BLOCK]
                      [--n_embd N_EMBD] [--lr LR]
                      [--value_loss_coef VALUE_LOSS_COEF]
                      [--entropy_coef ENTROPY_COEF]
                      [--max_grad_norm MAX_GRAD_NORM]
                      [--eval_episodes EVAL_EPISODES]
                      [--n_training_threads N_TRAINING_THREADS]
                      [--n_rollout_threads N_ROLLOUT_THREADS]
                      [--num_mini_batch NUM_MINI_BATCH]
                      [--num_env_steps NUM_ENV_STEPS] [--ppo_epoch PPO_EPOCH]
                      [--log_interval LOG_INTERVAL] [--clip_param CLIP_PARAM]
                      [--use_value_active_masks] [--use_eval]
                      [--add_center_xy] [--use_state_agent]
                      [--use_policy_active_masks] [--dec_actor]
                      [--share_actor] [--test] [--play] [--resume RESUME]
                      [--checkpoint CHECKPOINT] [--headless] [--horovod]
                      [--task TASK] [--task_type TASK_TYPE]
                      [--rl_device RL_DEVICE] [--logdir LOGDIR]
                      [--experiment EXPERIMENT] [--metadata]
                      [--cfg_train CFG_TRAIN] [--cfg_env CFG_ENV]
                      [--num_envs NUM_ENVS] [--episode_length EPISODE_LENGTH]
                      [--seed SEED] [--max_iterations MAX_ITERATIONS]
                      [--steps_num STEPS_NUM]
                      [--minibatch_size MINIBATCH_SIZE] [--randomize]
                      [--torch_deterministic] [--algo ALGO]
                      [--model_dir MODEL_DIR]
train_hands.py: error: unrecognized arguments: --gamma 0.96
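(For reference, this is the kind of situation that produces such a message, sketched here with plain argparse. The two parsers below are hypothetical and only illustrate the failure mode; they are not the actual code of either repo.)

import argparse

# Hypothetical reproduction: if a second parser that never registered --gamma is
# handed the full command line, argparse aborts with "unrecognized arguments".
mat_parser = argparse.ArgumentParser("mat_config")
mat_parser.add_argument("--gamma", type=float, default=0.99)   # this parser knows --gamma

env_parser = argparse.ArgumentParser("hands_config")
env_parser.add_argument("--task", type=str, default="ShadowHandCatchOver2Underarm")

argv = ["--task", "ShadowHandCatchOver2Underarm", "--gamma", "0.96"]
# env_parser.parse_args(argv)  # would abort with: error: unrecognized arguments: --gamma 0.96
known, unknown = env_parser.parse_known_args(argv)
print(unknown)  # ['--gamma', '0.96'] -> the flag was never registered on this parser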

So I deleted the --gamma argument and instead modified config.py directly, setting the default to 0.96:

    parser.add_argument("--gamma", type=float, default=0.96,
                        help='discount factor for rewards (default: 0.99)')

However, I get the following result:

 Task ShadowHandCatchOver2Underarm Algo mat_dec Exp single updates 8250/8333 episodes, total num timesteps 49506000/50000000, FPS 1021.

average_step_rewards is 0.330600768327713.
some episodes done, average rewards:  19.574572331772863

 Task ShadowHandCatchOver2Underarm Algo mat_dec Exp single updates 8275/8333 episodes, total num timesteps 49656000/50000000, FPS 1022.
average_step_rewards is 0.3444286584854126.
some episodes done, average rewards:  20.018084016291084

 Task ShadowHandCatchOver2Underarm Algo mat_dec Exp single updates 8300/8333 episodes, total num timesteps 49806000/50000000, FPS 1023.
average_step_rewards is 0.3596132695674896.
some episodes done, average rewards:  20.760233263901018

 Task ShadowHandCatchOver2Underarm Algo mat_dec Exp single updates 8325/8333 episodes, total num timesteps 49956000/50000000, FPS 1024.
average_step_rewards is 0.3465554118156433.
some episodes done, average rewards:  20.917307748507582

This is far below the result in your paper (about 25). I guess some config may be set wrongly. Could you provide an up-to-date script, or any instructions on what I might be doing wrong?

@morning9393
Collaborator

morning9393 commented Feb 5, 2024

Hiya, thank you so much for your attention. I noticed that your error message contains a lot of hyperparameters that are not in this repo, e.g. num_envs, cfg_train, steps_num... It seems that your config conflicts with something else in your local Python environment/workspace.

Thus, I recommend first finding out the cause of this strange error before modifying the config file directly~~ hoping it might help you~~
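For example, one quick check could be to print where the config module is actually loaded from, to rule out a separately installed package shadowing the repo's own config.py. The module paths below are only guesses for a typical setup, so adjust them to yours:

import importlib

# Hypothetical check: show which file each candidate config module resolves to
# (module paths are guesses, not necessarily what your environment contains).
for name in ("bidexhands.utils.config", "mat.config"):
    try:
        mod = importlib.import_module(name)
        print(name, "->", mod.__file__)
    except ImportError:
        print(name, "-> not importable in this environment")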

@JensenLZX
Author

JensenLZX commented Feb 5, 2024

Those hyperparameters were not introduced by me; I just used the original code. It seems the Bi-DexHands benchmark itself introduces these config options. I haven't modified the source code: I just cloned the current version locally and ran the script ./mat/scripts/train_hands.sh.

I have just checked this again: the original, unmodified code reproduces the bug and gives the same error message.

Those hyperparameters may come from here:
https://github.com/PKU-MARL/DexterousHands/blob/99c1e2a399fb084df5c02dbb5f6182d394fcd2e8/bidexhands/utils/config.py#L244
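If that is the case, here is a plain-argparse illustration of that pattern (the parameter list below is illustrative, not the actual DexterousHands code): a parser built from a fixed list of custom parameters rejects any flag that is not in the list, which would explain the --gamma error.

import argparse

# Illustrative only: configs in this style register arguments from a fixed list of
# custom parameters, so any flag missing from that list (here --gamma) is rejected.
custom_parameters = [
    {"name": "--task", "type": str, "default": "ShadowHandCatchOver2Underarm"},
    {"name": "--num_envs", "type": int, "default": 0},
    # ... no entry for --gamma ...
]

parser = argparse.ArgumentParser()
for p in custom_parameters:
    parser.add_argument(p["name"], type=p["type"], default=p["default"])

args, unknown = parser.parse_known_args(
    ["--task", "ShadowHandCatchOver2Underarm", "--gamma", "0.96"])
print(unknown)  # ['--gamma', '0.96'] -> --gamma was never registered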
Thanks for your help in advance!
