adjust to msrun command #354

Merged · 1 commit · Sep 13, 2024
19 changes: 12 additions & 7 deletions GETTING_STARTED.md
@@ -48,27 +48,32 @@ to understand their behavior. Some common arguments are:
```
</details>

* To train a model on 8 NPUs/GPUs:
```
mpirun --allow-run-as-root -n 8 python train.py --config ./configs/yolov7/yolov7.yaml --is_parallel True
```

* To train a model on 1 NPU/GPU/CPU:
```
python train.py --config ./configs/yolov7/yolov7.yaml
```

* To train a model on 8 NPUs/GPUs:
```
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolov7_log python train.py --config ./configs/yolov7/yolov7.yaml --is_parallel True
```
* To evaluate a model's performance on 1 NPU/GPU/CPU:
```
python test.py --config ./configs/yolov7/yolov7.yaml --weight /path_to_ckpt/WEIGHT.ckpt
```
* To evaluate a model's performance on 8 NPUs/GPUs:
```
mpirun --allow-run-as-root -n 8 python test.py --config ./configs/yolov7/yolov7.yaml --weight /path_to_ckpt/WEIGHT.ckpt --is_parallel True
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolov7_log python test.py --config ./configs/yolov7/yolov7.yaml --weight /path_to_ckpt/WEIGHT.ckpt --is_parallel True
```
*Notes: (1) The default hyper-parameters are tuned for 8-card training; some of them need to be adjusted for single-card training. (2) The default device is Ascend; you can change it by setting 'device_target' to Ascend, GPU, or CPU (the currently supported targets).*
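For example, note (2) can be applied from the command line; a minimal single-card sketch (assuming the default YOLOv7 config, and keeping in mind that some hyper-parameters may still need tuning per note (1)) is:
```
# single-card run; 'device_target' accepts Ascend/GPU/CPU
python train.py --config ./configs/yolov7/yolov7.yaml --device_target GPU
```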
* For more options, see `train/test.py -h`.

* Note that if you use the `msrun` startup method with 2 devices, add `--bind_core=True` to improve performance. For example:
```
msrun --bind_core=True --worker_num=2 --local_worker_num=2 --master_port=8118 \
--log_dir=msrun_log --join=True --cluster_time_out=300 \
python train.py --config ./configs/yolov7/yolov7.yaml --is_parallel True
```
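Multi-node launches follow the same pattern. Below is a rough two-node sketch (an illustration only, assuming 8 devices per node; `--master_addr` and `--node_rank` are the multi-node flags from the startup tutorial linked below):
```
# run this on every node, setting --node_rank to the node's index (0 on the master node)
# and <MASTER_IP> to the address of node 0
msrun --worker_num=16 --local_worker_num=8 --bind_core=True \
    --master_addr=<MASTER_IP> --master_port=8118 --node_rank=0 \
    --log_dir=msrun_log --join=True --cluster_time_out=300 \
    python train.py --config ./configs/yolov7/yolov7.yaml --is_parallel True
```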
> For more information, please refer to [here](https://www.mindspore.cn/tutorials/experts/en/r2.3.1/parallel/startup_method.html).

### Deployment

23 changes: 14 additions & 9 deletions GETTING_STARTED_CN.md
@@ -45,18 +45,15 @@ python demo/predict.py --config ./configs/yolov7/yolov7.yaml --weight=/path_to_c
```
</details>

* To run distributed model training on multiple NPUs/GPUs, using 8 devices as an example:

```shell
mpirun --allow-run-as-root -n 8 python train.py --config ./configs/yolov7/yolov7.yaml --is_parallel True
```

* To train a model on a single NPU/GPU/CPU:

```shell
python train.py --config ./configs/yolov7/yolov7.yaml
```

* To run distributed model training on multiple NPUs/GPUs, using 8 devices as an example:
```shell
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolov7_log python train.py --config ./configs/yolov7/yolov7.yaml --is_parallel True
```
* To evaluate a model's accuracy on a single NPU/GPU/CPU:

```shell
@@ -65,12 +62,20 @@ python demo/predict.py --config ./configs/yolov7/yolov7.yaml --weight=/path_to_c
* To evaluate a model's accuracy with distributed evaluation on multiple NPUs/GPUs:

```shell
mpirun --allow-run-as-root -n 8 python test.py --config ./configs/yolov7/yolov7.yaml --weight /path_to_ckpt/WEIGHT.ckpt --is_parallel True
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolov7_log python test.py --config ./configs/yolov7/yolov7.yaml --weight /path_to_ckpt/WEIGHT.ckpt --is_parallel True
```

*Note: the default hyper-parameters are for 8-card training, and some parameters need to be adjusted for single-card training. The default device is Ascend; you can set 'device_target' to Ascend/GPU/CPU.*
* For more options, see `train/test.py -h`.
* For training on cloud platforms, see [here](./tutorials/cloud/modelarts_CN.md).

* Note: if you launch with the `msrun` command on 2 devices, add `--bind_core=True` to improve performance. For example:
```
msrun --bind_core=True --worker_num=2 --local_worker_num=2 --master_port=8118 \
--log_dir=msrun_log --join=True --cluster_time_out=300 \
python train.py --config ./configs/yolov7/yolov7.yaml --is_parallel True
```
> For more information, please refer to [here](https://www.mindspore.cn/tutorials/experts/zh-CN/r2.3.1/parallel/startup_method.html).

### Deployment

6 changes: 3 additions & 3 deletions configs/yolov3/README.md
@@ -56,11 +56,11 @@ python mindyolo/utils/convert_weight_darknet53.py
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config ./configs/yolov3/yolov3.yaml --device_target Ascend --is_parallel True
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolov3_log python train.py --config ./configs/yolov3/yolov3.yaml --device_target Ascend --is_parallel True
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.

Similarly, you can train the model on multiple GPU devices with the above mpirun command.
Similarly, you can train the model on multiple GPU devices with the above msrun command.
**Note:** For more information about msrun configuration, please refer to [here](https://www.mindspore.cn/tutorials/experts/zh-CN/r2.3.1/parallel/msrun_launcher.html)
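For instance, a GPU launch of the same recipe might look like the sketch below (only `--device_target` changes; adjust the worker counts to match the number of available GPUs):
```shell
# hypothetical GPU variant of the distributed training command above
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolov3_log python train.py --config ./configs/yolov3/yolov3.yaml --device_target GPU --is_parallel True
```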

For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindyolo/blob/master/mindyolo/utils/config.py).

6 changes: 3 additions & 3 deletions configs/yolov4/README.md
@@ -70,11 +70,11 @@ python mindyolo/utils/convert_weight_cspdarknet53.py
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config ./configs/yolov4/yolov4-silu.yaml --device_target Ascend --is_parallel True --epochs 320
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolov4_log python train.py --config ./configs/yolov4/yolov4-silu.yaml --device_target Ascend --is_parallel True --epochs 320
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.

Similarly, you can train the model on multiple GPU devices with the above mpirun command.
Similarly, you can train the model on multiple GPU devices with the above msrun command.
**Note:** For more information about msrun configuration, please refer to [here](https://www.mindspore.cn/tutorials/experts/zh-CN/r2.3.1/parallel/msrun_launcher.html)

For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindyolo/blob/master/mindyolo/utils/config.py).

6 changes: 3 additions & 3 deletions configs/yolov5/README.md
@@ -50,11 +50,11 @@ Please refer to the [GETTING_STARTED](https://github.com/mindspore-lab/mindyolo/
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config ./configs/yolov5/yolov5n.yaml --device_target Ascend --is_parallel True
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolov5_log python train.py --config ./configs/yolov5/yolov5n.yaml --device_target Ascend --is_parallel True
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.

Similarly, you can train the model on multiple GPU devices with the above mpirun command.
Similarly, you can train the model on multiple GPU devices with the above msrun command.
**Note:** For more information about msrun configuration, please refer to [here](https://www.mindspore.cn/tutorials/experts/zh-CN/r2.3.1/parallel/msrun_launcher.html)

For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindyolo/blob/master/mindyolo/utils/config.py).

6 changes: 3 additions & 3 deletions configs/yolov7/README.md
@@ -51,11 +51,11 @@ Please refer to the [GETTING_STARTED](https://github.com/mindspore-lab/mindyolo/
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config ./configs/yolov7/yolov7.yaml --device_target Ascend --is_parallel True
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolov7_log python train.py --config ./configs/yolov7/yolov7.yaml --device_target Ascend --is_parallel True
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.

Similarly, you can train the model on multiple GPU devices with the above mpirun command.
Similarly, you can train the model on multiple GPU devices with the above msrun command.
**Note:** For more information about msrun configuration, please refer to [here](https://www.mindspore.cn/tutorials/experts/zh-CN/r2.3.1/parallel/msrun_launcher.html)

For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindyolo/blob/master/mindyolo/utils/config.py).

6 changes: 3 additions & 3 deletions configs/yolov8/README.md
@@ -60,11 +60,11 @@ Please refer to the [GETTING_STARTED](https://github.com/mindspore-lab/mindyolo/
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config ./configs/yolov8/yolov8n.yaml --device_target Ascend --is_parallel True
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolov8_log python train.py --config ./configs/yolov8/yolov8n.yaml --device_target Ascend --is_parallel True
```
> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.

Similarly, you can train the model on multiple GPU devices with the above mpirun command.
Similarly, you can train the model on multiple GPU devices with the above msrun command.
**Note:** For more information about msrun configuration, please refer to [here](https://www.mindspore.cn/tutorials/experts/zh-CN/r2.3.1/parallel/msrun_launcher.html)

For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindyolo/blob/master/mindyolo/utils/config.py).

8 changes: 3 additions & 5 deletions configs/yolox/README.md
@@ -50,13 +50,11 @@ Please refer to the [GETTING_STARTED](https://github.com/mindspore-lab/mindyolo/
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
mpirun -n 8 python train.py --config ./configs/yolox/yolox-s.yaml --device_target Ascend --is_parallel True
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolox_log python train.py --config ./configs/yolox/yolox-s.yaml --device_target Ascend --is_parallel True
```

> If the script is executed by the root user, the `--allow-run-as-root` parameter must be added to `mpirun`.


Similarly, you can train the model on multiple GPU devices with the above mpirun command.
Similarly, you can train the model on multiple GPU devices with the above msrun command.
**Note:** For more information about msrun configuration, please refer to [here](https://www.mindspore.cn/tutorials/experts/zh-CN/r2.3.1/parallel/msrun_launcher.html)

For detailed illustration of all hyper-parameters, please refer to [config.py](https://github.com/mindspore-lab/mindyolo/blob/master/mindyolo/utils/config.py).

2 changes: 1 addition & 1 deletion examples/finetune_SHWD/README.md
@@ -114,7 +114,7 @@ optimizer:
* To run distributed model training on multiple NPUs/GPUs, using 8 devices as an example:

```shell
mpirun --allow-run-as-root -n 8 python train.py --config ./examples/finetune_SHWD/yolov7-tiny_shwd.yaml --is_parallel True
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolov7-tiny_log python train.py --config ./examples/finetune_SHWD/yolov7-tiny_shwd.yaml --is_parallel True
```

* To train a model on a single NPU/GPU/CPU:
2 changes: 1 addition & 1 deletion tutorials/configuration_CN.md
@@ -38,7 +38,7 @@ __BASE__: [
These parameters are usually passed in from the command line, for example:

```shell
mpirun --allow-run-as-root -n 8 python train.py --config ./configs/yolov7/yolov7.yaml --is_parallel True --log_interval 50
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolov7_log python train.py --config ./configs/yolov7/yolov7.yaml --is_parallel True --log_interval 50
```

## Dataset