-
Notifications
You must be signed in to change notification settings - Fork 4
/
hvd_train.txt
80 lines (80 loc) · 8.28 KB
/
hvd_train.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
Data strategy: [{'epoch': 21, 'batch_size': 740, 'image_size': (128, 128), 'data_dir': '160', 'prefix': 'train'}, {'epoch': 11, 'batch_size': 256, 'image_size': (224, 224), 'data_dir': '320', 'prefix': 'train'}, {'epoch': 4, 'batch_size': 196, 'image_size': (224, 224), 'data_dir': '320', 'prefix': 'train'}, {'epoch': 4, 'batch_size': 128, 'image_size': (288, 288), 'data_dir': '320', 'prefix': 'train'}]
Learning rate strategy:[{'epoch': [0, 6], 'lr_method': 'linear', 'lr': [1.0, 2.0], 'steps': [0, 1296]}, {'epoch': [6, 21], 'lr_method': 'linear', 'lr': [2.0, 0.45], 'steps': [1296, 4536]}, {'epoch': [21, 32], 'lr_method': 'exp', 'lr': [0.45, 0.02], 'steps': [4536, 11411]}, {'epoch': [32, 36], 'lr_method': 'exp', 'lr': [0.02, 0.004], 'steps': [11411, 14679]}, {'epoch': [36, 40], 'lr_method': 'exp', 'lr': [0.004, 0.002], 'steps': [14679, 19683]}]
Total Max Training Steps: 20308
Checkpointing every 625 steps
Saving summary every 625 steps
PY3.6.7 (default, Oct 21 2018, 04:56:05)
[GCC 5.4.0 20160609]TF1.11.0
Horovod size: Log: 8
Using preprocessing threads per GPU: Log: 6
Log: Step Epoch Speed Loss FinLoss LR bs imsize
Log @train: steps@0 epoch@0.0 im/s@352.24 loss@6.933 total_loss@9.610 lr@1.00000 bs@740 sz@128 top5@0.0014 tm@16.81
Log @train: steps@1 epoch@0.0 im/s@1503.90 loss@6.907 total_loss@9.581 lr@1.00077 bs@740 sz@128 top5@0.0068 tm@20.74
Log @train: steps@625 epoch@2.9 im/s@14262.11 loss@3.702 total_loss@4.448 lr@1.48225 bs@740 sz@128 top5@0.4878 tm@279.86
Log @train: steps@1250 epoch@5.8 im/s@14229.95 loss@3.089 total_loss@4.131 lr@1.96451 bs@740 sz@128 top5@0.5973 tm@539.98
Log @train: steps@1875 epoch@8.7 im/s@14021.50 loss@3.015 total_loss@4.125 lr@1.72301 bs@740 sz@128 top5@0.6108 tm@803.96
Log @train: steps@2500 epoch@11.6 im/s@14242.36 loss@2.680 total_loss@3.771 lr@1.42401 bs@740 sz@128 top5@0.6797 tm@1063.86
Log @train: steps@3125 epoch@14.4 im/s@14212.74 loss@2.547 total_loss@3.588 lr@1.12502 bs@740 sz@128 top5@0.6986 tm@1324.29
Log @train: steps@3750 epoch@17.3 im/s@14183.16 loss@2.266 total_loss@3.227 lr@0.82602 bs@740 sz@128 top5@0.7351 tm@1585.27
Log @train: steps@4375 epoch@20.2 im/s@14159.80 loss@1.944 total_loss@2.794 lr@0.52702 bs@740 sz@128 top5@0.7865 tm@1846.67
Log @train: steps@5000 epoch@21.7 im/s@4446.64 loss@2.061 total_loss@3.008 lr@0.36471 bs@256 sz@224 top5@0.7773 tm@2134.63
Log @train: steps@5625 epoch@22.7 im/s@4800.80 loss@2.208 total_loss@3.105 lr@0.27481 bs@256 sz@224 top5@0.7773 tm@2401.36
Log @train: steps@6250 epoch@23.7 im/s@4808.55 loss@1.753 total_loss@2.583 lr@0.20706 bs@256 sz@224 top5@0.8008 tm@2667.66
Log @train: steps@6875 epoch@24.7 im/s@4816.36 loss@1.668 total_loss@2.436 lr@0.15602 bs@256 sz@224 top5@0.8359 tm@2933.52
Log @train: steps@7500 epoch@25.7 im/s@4757.16 loss@1.587 total_loss@2.301 lr@0.11756 bs@256 sz@224 top5@0.8438 tm@3202.69
Log @train: steps@8125 epoch@26.7 im/s@4817.18 loss@1.484 total_loss@2.152 lr@0.08858 bs@256 sz@224 top5@0.8516 tm@3468.52
Log @train: steps@8750 epoch@27.7 im/s@4830.77 loss@1.345 total_loss@1.973 lr@0.06674 bs@256 sz@224 top5@0.8711 tm@3733.59
Log @train: steps@9375 epoch@28.7 im/s@4791.56 loss@1.537 total_loss@2.133 lr@0.05029 bs@256 sz@224 top5@0.8711 tm@4000.83
Log @train: steps@10000 epoch@29.7 im/s@4793.97 loss@1.396 total_loss@1.966 lr@0.03789 bs@256 sz@224 top5@0.8672 tm@4267.94
Log @train: steps@10625 epoch@30.7 im/s@4805.89 loss@1.272 total_loss@1.820 lr@0.02855 bs@256 sz@224 top5@0.8984 tm@4534.38
Log @train: steps@11250 epoch@31.7 im/s@4782.20 loss@1.226 total_loss@1.757 lr@0.02151 bs@256 sz@224 top5@0.8750 tm@4802.15
Log @train: steps@11875 epoch@32.5 im/s@3777.96 loss@1.106 total_loss@1.625 lr@0.01591 bs@196 sz@224 top5@0.9082 tm@5061.66
Log @train: steps@12500 epoch@33.3 im/s@4153.06 loss@1.023 total_loss@1.532 lr@0.01170 bs@196 sz@224 top5@0.9235 tm@5297.73
Log @train: steps@13125 epoch@34.0 im/s@4100.30 loss@0.954 total_loss@1.456 lr@0.00860 bs@196 sz@224 top5@0.9184 tm@5536.84
Log @train: steps@13750 epoch@34.8 im/s@4117.88 loss@1.290 total_loss@1.786 lr@0.00632 bs@196 sz@224 top5@0.8571 tm@5774.93
Log @train: steps@14375 epoch@35.6 im/s@4141.67 loss@0.949 total_loss@1.440 lr@0.00465 bs@196 sz@224 top5@0.9286 tm@6011.66
Log @train: steps@15000 epoch@36.2 im/s@2446.78 loss@0.697 total_loss@1.185 lr@0.00383 bs@128 sz@288 top5@0.9844 tm@6273.32
Log @train: steps@15625 epoch@36.7 im/s@2649.55 loss@0.457 total_loss@0.942 lr@0.00351 bs@128 sz@288 top5@0.9844 tm@6514.98
Log @train: steps@16250 epoch@37.2 im/s@2649.96 loss@0.507 total_loss@0.990 lr@0.00322 bs@128 sz@288 top5@0.9844 tm@6756.60
Log @train: steps@16875 epoch@37.7 im/s@2661.96 loss@0.550 total_loss@1.031 lr@0.00295 bs@128 sz@288 top5@0.9766 tm@6997.12
Log @train: steps@17500 epoch@38.2 im/s@2667.36 loss@0.561 total_loss@1.039 lr@0.00271 bs@128 sz@288 top5@0.9609 tm@7237.17
Log @train: steps@18125 epoch@38.7 im/s@2641.22 loss@0.560 total_loss@1.036 lr@0.00248 bs@128 sz@288 top5@0.9922 tm@7479.58
Log @train: steps@18750 epoch@39.2 im/s@2641.28 loss@0.637 total_loss@1.111 lr@0.00228 bs@128 sz@288 top5@0.9609 tm@7721.99
Log @train: steps@19375 epoch@39.7 im/s@2636.88 loss@0.510 total_loss@0.983 lr@0.00209 bs@128 sz@288 top5@0.9766 tm@7964.81
Log: Finished in Log: 8109.004532575607
Log: Evaluating
Log: Validation dataset size: 50000
step top1 top5 loss checkpoint_time(UTC)
Log @eval: count@ 4 step@ 2500 top1@32.522 top5@ 58.76 loss@ 3.51 time@2019-01-07 23:40:33
Log @eval: count@ 7 step@ 4375 top1@50.593 top5@ 76.74 loss@ 2.64 time@2019-01-07 23:53:36
Log @eval: count@ 5 step@ 3125 top1@17.264 top5@ 35.91 loss@ 4.54 time@2019-01-07 23:44:54
Log @eval: count@ 1 step@ 625 top1@15.146 top5@ 33.96 loss@ 4.59 time@2019-01-07 23:27:29
Log @eval: count@ 0 step@ 0 top1@0.074 top5@ 0.49 loss@ 6.91 time@2019-01-07 23:22:43
Log @eval: count@ 13 step@ 8125 top1@67.907 top5@ 88.42 loss@ 1.36 time@2019-01-08 00:20:38
Log @eval: count@ 9 step@ 5625 top1@59.477 top5@ 82.96 loss@ 1.75 time@2019-01-08 00:02:51
Log @eval: count@ 15 step@ 9375 top1@70.597 top5@ 90.31 loss@ 1.21 time@2019-01-08 00:29:30
Log @eval: count@ 12 step@ 7500 top1@65.208 top5@ 86.86 loss@ 1.46 time@2019-01-08 00:16:12
Log @eval: count@ 8 step@ 5000 top1@52.276 top5@ 77.62 loss@ 2.16 time@2019-01-07 23:58:24
Log @eval: count@ 21 step@13125 top1@73.788 top5@ 92.02 loss@ 1.07 time@2019-01-08 00:55:06
Log @eval: count@ 23 step@14375 top1@74.972 top5@ 92.70 loss@ 1.00 time@2019-01-08 01:03:01
Log @eval: count@ 17 step@10625 top1@72.829 top5@ 91.49 loss@ 1.10 time@2019-01-08 00:38:24
Log @eval: count@ 20 step@12500 top1@73.898 top5@ 91.98 loss@ 1.07 time@2019-01-08 00:51:07
Log @eval: count@ 16 step@10000 top1@71.512 top5@ 90.67 loss@ 1.17 time@2019-01-08 00:33:57
Log @eval: count@ 29 step@18125 top1@75.741 top5@ 93.18 loss@ 0.94 time@2019-01-08 01:27:29
Log @eval: count@ 28 step@17500 top1@75.819 top5@ 93.19 loss@ 0.94 time@2019-01-08 01:23:27
Log @eval: count@ 31 step@19375 top1@75.819 top5@ 93.24 loss@ 0.93 time@2019-01-08 01:35:34
Log @eval: count@ 25 step@15625 top1@75.683 top5@ 93.00 loss@ 0.95 time@2019-01-08 01:11:24
Log @eval: count@ 24 step@15000 top1@75.559 top5@ 93.11 loss@ 0.95 time@2019-01-08 01:07:23
Log @eval: count@ 3 step@ 1875 top1@21.645 top5@ 44.70 loss@ 4.13 time@2019-01-07 23:36:13
Log @eval: count@ 2 step@ 1250 top1@26.689 top5@ 51.41 loss@ 3.89 time@2019-01-07 23:31:49
Log @eval: count@ 6 step@ 3750 top1@35.877 top5@ 61.46 loss@ 3.38 time@2019-01-07 23:49:15
Log @eval: count@ 32 step@19681 top1@75.795 top5@ 93.11 loss@ 0.94 time@2019-01-08 01:37:33
Log @eval: count@ 10 step@ 6250 top1@62.766 top5@ 85.59 loss@ 1.57 time@2019-01-08 00:07:17
Log @eval: count@ 11 step@ 6875 top1@63.053 top5@ 85.84 loss@ 1.58 time@2019-01-08 00:11:43
Log @eval: count@ 14 step@ 8750 top1@68.403 top5@ 89.09 loss@ 1.32 time@2019-01-08 00:25:03
Log @eval: count@ 18 step@11250 top1@72.494 top5@ 91.22 loss@ 1.13 time@2019-01-08 00:42:52
Log @eval: count@ 22 step@13750 top1@74.677 top5@ 92.63 loss@ 1.01 time@2019-01-08 00:59:04
Log @eval: count@ 19 step@11875 top1@73.263 top5@ 91.82 loss@ 1.09 time@2019-01-08 00:47:11
Log @eval: count@ 26 step@16250 top1@75.683 top5@ 93.02 loss@ 0.94 time@2019-01-08 01:15:26
Log @eval: count@ 30 step@18750 top1@75.883 top5@ 93.16 loss@ 0.94 time@2019-01-08 01:31:31
Log @eval: count@ 27 step@16875 top1@75.739 top5@ 93.10 loss@ 0.94 time@2019-01-08 01:19:26