docs: change mpirun with msrun and add other notices (merge into main) #805

Merged
merged 1 commit into from
Sep 12, 2024

Conversation

ChongWei905
Contributor

Thank you for your contribution to the MindCV repo.
Before submitting this PR, please make sure:

Motivation

(Write your motivation for proposed changes here.)

Test Plan

(How should this PR be tested? Do you require special setup to run the test or repro the fixed bug?)

Related Issues and PRs

(Is this PR part of a group of changes? Link the other relevant PRs and Issues here. Use https://help.github.com/en/articles/closing-issues-using-keywords for help on GitHub syntax)

@ChongWei905 changed the title from "docs: change mpirun with msrun and add other notices" to "docs: change mpirun with msrun and add other notices (merge into main)" on Sep 5, 2024
README.md Outdated

```shell
# distributed training
# assume you have 4 GPUs/NPUs
mpirun -n 4 python train.py --distribute \
msrun --worker_num 4 python train.py --distribute \
Collaborator

Please add core binding to the msrun command as well.

README.md Outdated
--model=densenet121 --dataset=imagenet --data_dir=/path/to/imagenet
```
> Notes: If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.
Notice that if you are using mpirun startup with 2 devices, please add `--bind-to numa` to avoid known performance error. For example:
Collaborator

@Ash-Lee233 Ash-Lee233 Sep 5, 2024

using mpirun xxxx --bind_core=True to improve performance.
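
As a rough illustration of the suggestion above, the following dry-run sketch only prints a two-device `msrun` launch line with core binding enabled; it does not start training. The helper name `print_msrun_cmd` is hypothetical, and the port, log directory, and timeout values are the ones proposed later in this review, not required defaults.

```shell
# Hypothetical helper (not part of MindCV or MindSpore): print, but do not run,
# an msrun launch line for a single node with the given number of devices.
print_msrun_cmd() {
    workers="$1"   # number of local devices, e.g. 2
    shift          # remaining arguments form the training command
    printf '%s\n' "msrun --bind_core=True --worker_num=${workers} --local_worker_num=${workers} --master_port=8118 --log_dir=msrun_log --join=True --cluster_time_out=300 $*"
}

# Print the launch line for the two-device case discussed in this thread.
print_msrun_cmd 2 python train.py --distribute \
    --model=densenet121 --dataset=imagenet --data_dir=/path/to/imagenet
```

Printing the command first makes it easy to check the flags before launching a real job.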

README.md Outdated
Notice that if you are using mpirun startup with 2 devices, please add `--bind-to numa` to avoid known performance error. For example:

```shell
mpirun --allow-run-as-root --merge-stderr-to-stdout --output-filename ./output_bind --bind-to numa -n 2 \
Collaborator

msrun --bind_core=True --worker_num=2 --local_worker_num=2 --master_port=8118 \
--log_dir=msrun_log --join=True --cluster_time_out=300 \

README_CN.md Outdated
--model densenet121 --dataset imagenet --data_dir ./datasets/imagenet
```
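Note: if you launch with mpirun in a two-device environment, add the `--bind-to numa` option to enable core binding and avoid the performance problem seen in the two-device scenario. Example code: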
Collaborator

@Ash-Lee233 Ash-Lee233 Sep 3, 2024

If you launch with msrun in a two-device environment, add `--bind_core=True` to enable core binding and improve two-device performance.

README_CN.md Outdated
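Note: if you launch with mpirun in a two-device environment, add the `--bind-to numa` option to enable core binding and avoid the performance problem seen in the two-device scenario. Example code: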

```shell
mpirun --allow-run-as-root --merge-stderr-to-stdout --output-filename ./output_bind --bind-to numa -n 2 \
Collaborator

Same as above.


```shell
# standalone training on a gpu or ascend device
python train.py --config configs/densenet/densenet_121_gpu.yaml --data_dir /path/to/dataset --distribute False

# distributed training on gpu or ascend devices
mpirun -n 8 python train.py --config configs/densenet/densenet_121_ascend.yaml --data_dir /path/to/imagenet
msrun --worker_num 8 python train.py --config configs/densenet/densenet_121_ascend.yaml --data_dir /path/to/imagenet
Collaborator

@Ash-Lee233 Ash-Lee233 Sep 3, 2024

msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./device

docs/zh/index.md Outdated
--model densenet121 --dataset imagenet --data_dir ./datasets/imagenet
```

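Note: if you launch with mpirun in a two-device environment, add the `--bind-to numa` option to enable core binding and avoid the performance problem seen in the two-device scenario. Example code: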
Collaborator

Same as above.

docs/zh/index.md Outdated
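Note: if you launch with mpirun in a two-device environment, add the `--bind-to numa` option to enable core binding and avoid the performance problem seen in the two-device scenario. Example code: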

```shell
mpirun --allow-run-as-root --merge-stderr-to-stdout --output-filename ./output_bind --bind-to numa -n 2 \
Collaborator

Same as above.

README.md Outdated
--model=densenet121 --dataset=imagenet --data_dir=/path/to/imagenet
```
> Notes: If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.

Notice that if you are using msrun startup with 2 devices, please add `--bind_core=True` to avoid known performance error. For example:
Collaborator

to improve performance.
The same applies below.
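
The wording discussed in this thread pairs each launcher with its own binding option: `--bind_core=True` for `msrun` and `--bind-to numa` for `mpirun`. A minimal sketch of choosing between them follows. The helper name `launcher_prefix` is hypothetical, and the fallback to `mpirun` when `msrun` is not on the `PATH` is an assumption for illustration; the PR itself does not propose a fallback.

```shell
# Hypothetical helper: print the launcher prefix with its core-binding option.
# Assumption: fall back to mpirun only when msrun is not installed.
launcher_prefix() {
    devices="$1"   # number of devices to launch on
    if command -v msrun >/dev/null 2>&1; then
        printf '%s\n' "msrun --bind_core=True --worker_num=${devices}"
    else
        printf '%s\n' "mpirun --bind-to numa -n ${devices}"
    fi
}

# Print the prefix for the two-device case.
launcher_prefix 2
```

The printed prefix would be followed by `python train.py --distribute` and the model and dataset arguments shown in the README snippets.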

@Ash-Lee233 Ash-Lee233 merged commit 009aaeb into mindspore-lab:main Sep 12, 2024
2 checks passed
3 participants