setup
=====
single thread
100000 points
10 clusters
15 iterations
average over 100 repetitions
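
For reference, a single benchmark run looks roughly like the following. This is a minimal Python sketch of plain Lloyd's algorithm with the parameters above, purely illustrative and not the repository's actual code; the random points stand in for the benchmark's shared input data, and all names are made up for this sketch.

    import math
    import random
    import time

    def closest(point, centroids):
        # index of the centroid nearest to point (Euclidean distance)
        px, py = point
        best, best_dist = 0, float("inf")
        for i, (cx, cy) in enumerate(centroids):
            d = math.sqrt((px - cx) ** 2 + (py - cy) ** 2)
            if d < best_dist:
                best, best_dist = i, d
        return best

    def kmeans(points, n_clusters, iterations):
        # plain Lloyd's algorithm: group points by nearest centroid,
        # then move each centroid to the mean of its group
        centroids = points[:n_clusters]
        for _ in range(iterations):
            clusters = [[] for _ in centroids]
            for p in points:
                clusters[closest(p, centroids)].append(p)
            centroids = [
                (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
                for c in clusters if c
            ]
        return centroids

    if __name__ == "__main__":
        random.seed(42)
        points = [(random.uniform(0, 3), random.uniform(0, 3)) for _ in range(100000)]
        repetitions = 100  # lower this for a quick try; pure Python is slow here
        start = time.perf_counter()
        for _ in range(repetitions):
            kmeans(points, 10, 15)
        print("average: %.0f ms" % ((time.perf_counter() - start) / repetitions * 1000))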
machine info
============
All tests have been run on my Sony Vaio Z laptop with the following spec:
cpu:
* Processor name : Intel(R) Core(TM) i7-2620M
* Packages(sockets) : 1
* Cores : 2
* Processors(CPUs) : 4
* Cores per package : 2
* Threads per core : 2
memory: 8 GB
OS: Ubuntu 14.10
results
=======
nim (0.13.1) 148 ms
crystal (0.21.1) 167 ms
java (1.7.0_76) 183 ms
kotlin (1.1-M01) 183 ms
c++ (g++ 4.9.2) 187 ms
rust (1.13.0) 196 ms
common lisp (sbcl 1.1.14) 204 ms
c (gcc 4.8.2) 218 ms
pony (0.2.1-531-g2cf6b89) 228 ms
julia (0.4.0-dev+3052) 266 ms
scala (2.11.7) 435 ms
pypy (2.5.0) 563 ms
java8 (1.8.0_31) 565 ms
luajit (2.0.2) 611 ms
ocaml (4.02.1) 796 ms
go (1.6.2) 971 ms
x10-c++ (2.5.2) 1436 ms
haskell (ghc 7.8.4) 1663 ms *
pharo (5) 2402 ms
x10-java (2.5.2) 2720 ms
factor (0.97) 2895 ms
clojure (1.7.0) 3511 ms
node (5.3.0) 3871 ms
elixir (1.0.3) 3949 ms
stanza (0.9.6) 4319 ms
erlang (R17) 4536 ms
scala-js (0.6.13 on Node 5.3.0) 4945 ms
io.js (1.4.3) 5241 ms
d (2.066.1) 5403 ms
lua (5.2.3) 6946 ms
ocaml bytecode (4.02.1) 8021 ms
python (2.7.6) 10632 ms
swift (2.2-dev) 11391 ms
perl (5.20.2) 15680 ms
rubinius (2.5.2) 20878 ms
ruby (1.9.3p484) 24819 ms
* I am not able to run this 100 times without the runtime caching the result. Any help is appreciated.
other implementations
=====================
The following results are not directly comparable, because they avoid constructing a hashmap,
or run on multiple threads, or both. It is expected that they run faster, so they are reported
here for completeness.
cuda (7.5) 4 ms *
opencl (1.2 on CUDA 7.5) 5 ms *
nim optimized (0.10.3) 68 ms **
openmp (4 threads) 84 ms
openmp (2 threads) 88 ms
openmp (1 thread) 151 ms
chapel (1.10) 1564 ms ***
scala-native (0.1-SNAPSHOT) 5016 ms ****
* CUDA: using an Nvidia GeForce GTX TITAN X GPU with 3072 CUDA cores.
** single-threaded; avoids the square root in the distance and accumulates the sums of the points near each centroid, rather than putting them into a data structure (see the sketch after these notes). It is more of a baseline than a fair comparison.
*** Chapel runs by default on two cores; I am not sure how to benchmark a single-thread version.
**** Scala Native does not have hashmaps yet.
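
The ** note only describes the optimized Nim version in general terms. Below is a minimal Python sketch of the same two ideas, purely illustrative and not the actual Nim code: compare squared distances so no square root is needed, and keep running sums and counts per centroid instead of collecting each cluster's points into a map.

    def kmeans_accumulate(points, centroids, iterations):
        # single-threaded variant that never materializes the clusters:
        # per-centroid running sums and counts replace the map of points
        k = len(centroids)
        for _ in range(iterations):
            sums_x, sums_y, counts = [0.0] * k, [0.0] * k, [0] * k
            for px, py in points:
                best, best_dist = 0, float("inf")
                for i, (cx, cy) in enumerate(centroids):
                    d = (px - cx) ** 2 + (py - cy) ** 2  # squared distance, no sqrt
                    if d < best_dist:
                        best, best_dist = i, d
                sums_x[best] += px
                sums_y[best] += py
                counts[best] += 1
            centroids = [
                (sums_x[i] / counts[i], sums_y[i] / counts[i]) if counts[i] else centroids[i]
                for i in range(k)
            ]
        return centroids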
expected result
===============
To check that an implementation is correct, one can print the list of
centroids just before the last iteration. The expected list (checked
across all languages) is:
(1.0084564757347625,2.2868123889219047)
(1.5309869001400929,0.7852174204702566)
(1.6894738051930507,1.7278381134195009)
(2.47790984305693,1.945630722483613)
(2.316742530156974,2.8586899252009443)
(1.4688362327217774,0.2078953628686833)
(2.2019938378105004,1.3767916116287988)
(0.8322035175020596,1.6266582764165047)
(2.035067805355936,0.36068184317747537)
(1.918441639829494,2.2623855839482294)
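
A minimal Python sketch of such a check follows, assuming the kmeans function from the setup sketch and that points holds the benchmark's shared input data (the random points used in that sketch will of course not reproduce these centroids). Comparing the lists order-independently with a small tolerance is an assumption about how the match should be done.

    # EXPECTED is the list above; kmeans is the function from the setup sketch
    EXPECTED = [
        (1.0084564757347625, 2.2868123889219047),
        (1.5309869001400929, 0.7852174204702566),
        (1.6894738051930507, 1.7278381134195009),
        (2.47790984305693, 1.945630722483613),
        (2.316742530156974, 2.8586899252009443),
        (1.4688362327217774, 0.2078953628686833),
        (2.2019938378105004, 1.3767916116287988),
        (0.8322035175020596, 1.6266582764165047),
        (2.035067805355936, 0.36068184317747537),
        (1.918441639829494, 2.2623855839482294),
    ]

    def check_centroids(points):
        # run all but the last of the 15 iterations and print the centroids
        centroids = kmeans(points, 10, 14)
        for c in sorted(centroids):
            print(c)
        # compare order-independently, allowing for tiny floating-point noise
        for got, want in zip(sorted(centroids), sorted(EXPECTED)):
            assert abs(got[0] - want[0]) < 1e-9 and abs(got[1] - want[1]) < 1e-9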