
Benchmark Daru::DataFrame Initialize method: Number of rows = 10 ** n

Shekhar Prasad Rajak edited this page Jan 31, 2019 · 2 revisions

Using PR https://github.com/SciRuby/daru/pull/484 and commit

Number of rows = 10 ** n (where n = 2, 3, 4, 5); number of columns = 2

$ ruby benchmarks/statistics/benchmark_dataframe_creation.rb

DataFrame of size : (10 ** 2, 2) 
Warming up --------------------------------------
 Using list of lists     3.578k i/100ms
Using list of Vector     1.764k i/100ms
Using list of Hashes     1.656k i/100ms
 Using Hash of lists     3.933k i/100ms
Using Hash of Vector     1.941k i/100ms
Calculating -------------------------------------
 Using list of lists     40.230k (± 2.7%) i/s -    203.946k in   5.073576s
Using list of Vector     17.787k (± 4.0%) i/s -     89.964k in   5.067222s
Using list of Hashes     16.667k (± 2.9%) i/s -     84.456k in   5.072032s
 Using Hash of lists     40.621k (± 2.3%) i/s -    204.516k in   5.037534s
Using Hash of Vector     19.917k (± 1.8%) i/s -    100.932k in   5.069364s

Comparison:
 Using Hash of lists:    40621.4 i/s
 Using list of lists:    40229.8 i/s - same-ish: difference falls within error
Using Hash of Vector:    19917.2 i/s - 2.04x  slower
Using list of Vector:    17786.8 i/s - 2.28x  slower
Using list of Hashes:    16666.9 i/s - 2.44x  slower
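
The "x slower" figures in the Comparison block appear to be the fastest entry's i/s divided by each entry's i/s. The sketch below recomputes them from the numbers printed above for size (10 ** 2, 2). It is only a check on the arithmetic, not the benchmark script itself, and the variable names are illustrative.

```ruby
# Iterations per second (i/s), copied from the Comparison block
# for DataFrame size (10 ** 2, 2).
ips = {
  'Hash of lists'  => 40_621.4,
  'list of lists'  => 40_229.8,
  'Hash of Vector' => 19_917.2,
  'list of Vector' => 17_786.8,
  'list of Hashes' => 16_666.9
}

fastest = ips.values.max

# Slowdown factor = fastest i/s divided by this entry's i/s,
# rounded to two decimals like the report.
slowdown = ips.transform_values { |v| (fastest / v).round(2) }

slowdown.each { |name, factor| puts format('%-15s %.2fx', name, factor) }
```

The same division applies to the larger sizes: substitute the i/s values from the corresponding Comparison block.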


DataFrame of size : (10 ** 3, 2) 
Warming up --------------------------------------
 Using list of lists   842.000  i/100ms
Using list of Vector   333.000  i/100ms
Using list of Hashes   189.000  i/100ms
 Using Hash of lists   849.000  i/100ms
Using Hash of Vector   338.000  i/100ms
Calculating -------------------------------------
 Using list of lists      8.536k (± 1.7%) i/s -     42.942k in   5.032228s
Using list of Vector      3.343k (± 1.4%) i/s -     16.983k in   5.080608s
Using list of Hashes      1.869k (± 1.8%) i/s -      9.450k in   5.057532s
 Using Hash of lists      8.604k (± 2.0%) i/s -     43.299k in   5.034429s
Using Hash of Vector      3.419k (± 1.8%) i/s -     17.238k in   5.043370s

Comparison:
 Using Hash of lists:     8604.3 i/s
 Using list of lists:     8536.0 i/s - same-ish: difference falls within error
Using Hash of Vector:     3419.0 i/s - 2.52x  slower
Using list of Vector:     3343.4 i/s - 2.57x  slower
Using list of Hashes:     1869.1 i/s - 4.60x  slower


DataFrame of size : (10 ** 4, 2) 
Warming up --------------------------------------
 Using list of lists    97.000  i/100ms
Using list of Vector    34.000  i/100ms
Using list of Hashes    19.000  i/100ms
 Using Hash of lists    97.000  i/100ms
Using Hash of Vector    35.000  i/100ms
Calculating -------------------------------------
 Using list of lists    986.927  (± 2.8%) i/s -      4.947k in   5.017015s
Using list of Vector    354.545  (± 2.5%) i/s -      1.802k in   5.086230s
Using list of Hashes    198.206  (± 1.5%) i/s -      1.007k in   5.081611s
 Using Hash of lists    996.709  (± 1.7%) i/s -      5.044k in   5.062152s
Using Hash of Vector    354.834  (± 2.5%) i/s -      1.785k in   5.033881s

Comparison:
 Using Hash of lists:      996.7 i/s
 Using list of lists:      986.9 i/s - same-ish: difference falls within error
Using Hash of Vector:      354.8 i/s - 2.81x  slower
Using list of Vector:      354.5 i/s - 2.81x  slower
Using list of Hashes:      198.2 i/s - 5.03x  slower


DataFrame of size : (10 ** 5, 2) 
Warming up --------------------------------------
 Using list of lists     8.000  i/100ms
Using list of Vector     3.000  i/100ms
Using list of Hashes     1.000  i/100ms
 Using Hash of lists     8.000  i/100ms
Using Hash of Vector     3.000  i/100ms
Calculating -------------------------------------
 Using list of lists     89.012  (± 2.2%) i/s -    448.000  in   5.036742s
Using list of Vector     31.277  (± 6.4%) i/s -    156.000  in   5.005292s
Using list of Hashes     19.453  (± 5.1%) i/s -     98.000  in   5.042912s
 Using Hash of lists     89.202  (± 2.2%) i/s -    448.000  in   5.024835s
Using Hash of Vector     26.694  (±18.7%) i/s -    129.000  in   5.033636s

Comparison:
 Using Hash of lists:       89.2 i/s
 Using list of lists:       89.0 i/s - same-ish: difference falls within error
Using list of Vector:       31.3 i/s - 2.85x  slower
Using Hash of Vector:       26.7 i/s - 3.34x  slower
Using list of Hashes:       19.5 i/s - 4.59x  slower
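
To compare the four sizes directly, the i/s figures can be converted to rows per second by multiplying by the row count. The sketch below does this for the fastest input (Hash of lists) and the slowest (list of Hashes), using the values from the Comparison blocks above. The helper name `rows_per_second` is illustrative and is not part of Daru or the benchmark script.

```ruby
# i/s from the four Comparison blocks, keyed by exponent n (rows = 10 ** n).
hash_of_lists  = { 2 => 40_621.4, 3 => 8_604.3, 4 => 996.7, 5 => 89.2 }
list_of_hashes = { 2 => 16_666.9, 3 => 1_869.1, 4 => 198.2, 5 => 19.5 }

# Rows handled per second = iterations per second * rows per iteration.
rows_per_second = lambda do |ips_by_n|
  ips_by_n.map { |n, ips| [n, (ips * 10**n).round] }.to_h
end

fast = rows_per_second.call(hash_of_lists)
slow = rows_per_second.call(list_of_hashes)

fast.each_key do |n|
  puts format('10 ** %d rows: Hash of lists %9d rows/s, list of Hashes %9d rows/s',
              n, fast[n], slow[n])
end
```

For n = 3, 4, 5 the per-row rate of each input form stays within the same order of magnitude, which suggests that construction cost grows roughly linearly with the number of rows in this range. The lower rate at n = 2 suggests that a fixed per-call overhead matters more for small frames.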