setup
=====
single thread
100000 points
10 clusters
15 iterations
average over 100 repetitions
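
For reference, a single benchmark run looks roughly like the following. This is a minimal Python sketch of plain Lloyd's algorithm with the parameters above, purely illustrative and not the repository's actual code; the random points stand in for the benchmark's shared input data, and all names are made up for this sketch.

    import math
    import random
    import time

    def closest(point, centroids):
        # index of the centroid nearest to point (Euclidean distance)
        px, py = point
        best, best_dist = 0, float("inf")
        for i, (cx, cy) in enumerate(centroids):
            d = math.sqrt((px - cx) ** 2 + (py - cy) ** 2)
            if d < best_dist:
                best, best_dist = i, d
        return best

    def kmeans(points, n_clusters, iterations):
        # plain Lloyd's algorithm: group points by nearest centroid,
        # then move each centroid to the mean of its group
        centroids = points[:n_clusters]
        for _ in range(iterations):
            clusters = [[] for _ in centroids]
            for p in points:
                clusters[closest(p, centroids)].append(p)
            centroids = [
                (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
                for c in clusters if c
            ]
        return centroids

    if __name__ == "__main__":
        random.seed(42)
        points = [(random.uniform(0, 3), random.uniform(0, 3)) for _ in range(100000)]
        repetitions = 100  # lower this for a quick try; pure Python is slow here
        start = time.perf_counter()
        for _ in range(repetitions):
            kmeans(points, 10, 15)
        print("average: %.0f ms" % ((time.perf_counter() - start) / repetitions * 1000))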
machine info
============
All tests have been run on my Sony Vaio Z laptop with the following spec:
cpu:
* Processor name : Intel(R) Core(TM) i7-2620M
* Packages(sockets) : 1
* Cores : 2
* Processors(CPUs) : 4
* Cores per package : 2
* Threads per core : 2
memory: 8 GB
OS: Ubuntu 14.10
results
=======
nim (0.13.1) 148 ms
crystal (0.21.1) 167 ms
java (1.7.0_76) 183 ms
kotlin (1.1-M01) 183 ms
c++ (g++ 4.9.2) 187 ms
rust (1.13.0) 196 ms
common lisp (sbcl 1.1.14) 204 ms
c (gcc 4.8.2) 218 ms
pony (0.2.1-531-g2cf6b89) 228 ms
julia (0.4.0-dev+3052) 266 ms
scala (2.11.7) 435 ms
pypy (2.5.0) 563 ms
java8 (1.8.0_31) 565 ms
luajit (2.0.2) 611 ms
ocaml (4.02.1) 796 ms
go (1.6.2) 971 ms
x10-c++ (2.5.2) 1436 ms
haskell (ghc 7.8.4) 1663 ms *
pharo (5) 2402 ms
x10-java (2.5.2) 2720 ms
factor (0.97) 2895 ms
clojure (1.7.0) 3511 ms
node (5.3.0) 3871 ms
elixir (1.0.3) 3949 ms
stanza (0.9.6) 4319 ms
erlang (R17) 4536 ms
scala-js (0.6.13 on Node 5.3.0) 4945 ms
io.js (1.4.3) 5241 ms
d (2.066.1) 5403 ms
lua (5.2.3) 6946 ms
ocaml bytecode (4.02.1) 8021 ms
python (2.7.6) 10632 ms
swift (2.2-dev) 11391 ms
perl (5.20.2) 15680 ms
rubinius (2.5.2) 20878 ms
ruby (1.9.3p484) 24819 ms
* I am not able to run this 100 times without the runtime caching the result. Any help is appreciated.
other implementations
=====================
The following results are not directly comparable, because they avoid constructing a hashmap,
or run on multiple threads, or both. It is expected that they run faster, so they are reported
here for completeness.
cuda (7.5) 4 ms *
opencl (1.2 on CUDA 7.5) 5 ms *
nim optimized (0.10.3) 68 ms **
openmp (4 threads) 84 ms
openmp (2 threads) 88 ms
openmp (1 thread) 151 ms
chapel (1.10) 1564 ms ***
scala-native (0.1-SNAPSHOT) 5016 ms ****
* CUDA: using an Nvidia GeForce GTX TITAN X GPU with 3072 CUDA cores.
** single-threaded; avoids the square root in the distance and accumulates the sums of the points near each centroid, rather than putting them into a data structure (see the sketch after these notes). It is more of a baseline than a fair comparison.
*** Chapel runs by default on two cores; I am not sure how to benchmark a single-thread version.
**** Scala Native does not have hashmaps yet.
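
The ** note only describes the optimized Nim version in general terms. Below is a minimal Python sketch of the same two ideas, purely illustrative and not the actual Nim code: compare squared distances so no square root is needed, and keep running sums and counts per centroid instead of collecting each cluster's points into a map.

    def kmeans_accumulate(points, centroids, iterations):
        # single-threaded variant that never materializes the clusters:
        # per-centroid running sums and counts replace the map of points
        k = len(centroids)
        for _ in range(iterations):
            sums_x, sums_y, counts = [0.0] * k, [0.0] * k, [0] * k
            for px, py in points:
                best, best_dist = 0, float("inf")
                for i, (cx, cy) in enumerate(centroids):
                    d = (px - cx) ** 2 + (py - cy) ** 2  # squared distance, no sqrt
                    if d < best_dist:
                        best, best_dist = i, d
                sums_x[best] += px
                sums_y[best] += py
                counts[best] += 1
            centroids = [
                (sums_x[i] / counts[i], sums_y[i] / counts[i]) if counts[i] else centroids[i]
                for i in range(k)
            ]
        return centroids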
expected result
===============
To check that an implementation is correct, one can print the list of
centroids just before the last iteration. The expected list (checked
across all languages) is:
(1.0084564757347625,2.2868123889219047)
(1.5309869001400929,0.7852174204702566)
(1.6894738051930507,1.7278381134195009)
(2.47790984305693,1.945630722483613)
(2.316742530156974,2.8586899252009443)
(1.4688362327217774,0.2078953628686833)
(2.2019938378105004,1.3767916116287988)
(0.8322035175020596,1.6266582764165047)
(2.035067805355936,0.36068184317747537)
(1.918441639829494,2.2623855839482294)
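
A minimal Python sketch of such a check follows, assuming the kmeans function from the setup sketch and that points holds the benchmark's shared input data (the random points used in that sketch will of course not reproduce these centroids). Comparing the lists order-independently with a small tolerance is an assumption about how the match should be done.

    # EXPECTED is the list above; kmeans is the function from the setup sketch
    EXPECTED = [
        (1.0084564757347625, 2.2868123889219047),
        (1.5309869001400929, 0.7852174204702566),
        (1.6894738051930507, 1.7278381134195009),
        (2.47790984305693, 1.945630722483613),
        (2.316742530156974, 2.8586899252009443),
        (1.4688362327217774, 0.2078953628686833),
        (2.2019938378105004, 1.3767916116287988),
        (0.8322035175020596, 1.6266582764165047),
        (2.035067805355936, 0.36068184317747537),
        (1.918441639829494, 2.2623855839482294),
    ]

    def check_centroids(points):
        # run all but the last of the 15 iterations and print the centroids
        centroids = kmeans(points, 10, 14)
        for c in sorted(centroids):
            print(c)
        # compare order-independently, allowing for tiny floating-point noise
        for got, want in zip(sorted(centroids), sorted(EXPECTED)):
            assert abs(got[0] - want[0]) < 1e-9 and abs(got[1] - want[1]) < 1e-9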