-
Notifications
You must be signed in to change notification settings - Fork 1
/
reverse.html
145 lines (123 loc) · 4.15 KB
/
reverse.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
<!doctype html>
<html>
<meta charset='utf-8'>
<title>reverse results</title>
<script src='https://cdn.plot.ly/plotly-latest.min.js'></script>
<script type='text/javascript' src='scripts/bench.js'></script>
</html>
<body>
<h1> reverse </h1>
So far only have an inplace version. (no reverse_copy) <br />
Unrolling by hand does not do anything useful. <br />
<h2> 40 bytes summary</h2>
I am not too sure if the slowest part is active in any of these cases but
<br />
Seems like for 10 integers we can't beat the scalar perfromace. <br />
I have no idea why the scalar is this fast though. <br />
For chars makes some sense - we get a two times speed up <br />
With code alignment - not as cool as other simd algorithms - <br />
there are a few close branches in the beginning. (especially for integers).
<br />
I can transform one of them in cmov but I don't care enough at the moment to
try. <br />
<h3> 40 bytes, data</h3>
<div id='40_bytes_data'></div>
<script>dynamicEntryPoint('40_bytes_data', {
name: 'inplace transform',
size: 40,
algorithm: 'selection',
type: 'x',
time: 'y',
padding: 'min',
group: 'intel_9700K_avx2_bmi',
percentage: 100,
}, ['reverse<256, 1>', 'eve::algo::reverse', 'std::reverse']);
</script>
<h3> 40 bytes, code alignment </h3>
<div id='40_bytes_data_code_alignment'></div>
<script>dynamicEntryPoint('40_bytes_data_code_alignment', {
name: 'inplace transform',
size: 40,
algorithm: 'selection',
type: 'selection',
time: 'y',
padding: 'minmax',
group: 'intel_9700K_avx2_bmi',
percentage: 100,
}, ['reverse<256, 1>', 'eve::algo::reverse', 'std::reverse']);
</script>
<h2> 1000 bytes summary</h2>
Nice results: 17.8 times faster for chars, 9.1 for shorts, 7.6 for ints.
<br />
Code alignment can decrease perf almost 1.75 times for integers - <br />
again blame those two branches. Not sure I care.
<h3> 1000 bytes, data</h3>
<div id='1000_bytes_data'></div>
<script>dynamicEntryPoint('1000_bytes_data', {
name: 'inplace transform',
size: 1000,
algorithm: 'selection',
type: 'x',
time: 'y',
padding: 'min',
group: 'intel_9700K_avx2_bmi',
percentage: 100,
}, ['reverse<256, 1>', 'eve::algo::reverse', 'std::reverse']);
</script>
<h3> 1000 bytes, code alignment </h3>
<div id='1000_bytes_data_code_alignment'></div>
<script>dynamicEntryPoint('1000_bytes_data_code_alignment', {
name: 'inplace transform',
size: 1000,
algorithm: 'selection',
type: 'selection',
time: 'y',
padding: 'minmax',
group: 'intel_9700K_avx2_bmi',
percentage: 100,
}, ['reverse<256, 1>', 'eve::algo::reverse', 'std::reverse']);
</script>
<h2> 10000 bytes summary</h2>
Nice results: 17.8 times faster for chars, 9 times for shorts, 6.5 for ints.
<br />
Code alignment impact is almost negligeable - about 10% for ints and nothing for others. <br />
<h3> 10000 bytes, data</h3>
<div id='10000_bytes_data'></div>
<script>dynamicEntryPoint('10000_bytes_data', {
name: 'inplace transform',
size: 10000,
algorithm: 'selection',
type: 'x',
time: 'y',
padding: 'min',
group: 'intel_9700K_avx2_bmi',
percentage: 100,
}, ['reverse<256, 1>', 'eve::algo::reverse', 'std::reverse']);
</script>
<h3> 10000 bytes, code alignment </h3>
<div id='10000_bytes_data_code_alignment'></div>
<script>dynamicEntryPoint('10000_bytes_data_code_alignment', {
name: 'inplace transform',
size: 10000,
algorithm: 'selection',
type: 'selection',
time: 'y',
padding: 'minmax',
group: 'intel_9700K_avx2_bmi',
percentage: 100,
}, [ 'reverse<256, 1>', 'eve::algo::reverse', 'std::reverse']);
</script>
<h2> Total benchmark </h2>
<div id='total_benchmark'></div>
<script>dynamicEntryPoint('total_benchmark', {
name: 'inplace transform',
size: 'x',
algorithm: 'selection',
type: 'selection',
time: 'y',
padding: 'min',
group: 'intel_9700K_avx2_bmi',
percentage: 100,
}, ['reverse']);
</script>
</body>