This is the source code for the article "Elixir vs Ruby: File I/O performance", published on the Phoenix on Rails blog. It is a sample text-file processing script, implemented in both Elixir and Ruby, that does the following:

  1. Loads the input CSV, line by line.
  2. Parses the first column, which has the format Some text N.
  3. Keeps only the lines where N is divisible by 2 or 5.
  4. Saves the filtered (but otherwise unchanged) lines into another CSV.
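The per-line filter from steps 2 and 3 can be sketched in Ruby as follows. This is a minimal illustration, not the code from this repository: the helper name `keep_line?` is hypothetical, and it assumes the first column is unquoted, contains no commas, and ends with the integer N.

```ruby
# Hypothetical helper: decides whether a CSV line should be kept.
# Assumes the first column is unquoted, has no commas, and ends with an
# integer, e.g. "Some text 42".
def keep_line?(line)
  # Take everything before the first comma as the first column.
  first_column = line.split(",", 2).first.to_s.strip
  # Extract the trailing number N; lines without one are dropped.
  match = first_column.match(/(\d+)\z/)
  return false unless match

  n = match[1].to_i
  (n % 2).zero? || (n % 5).zero?
end
```

Applied to a list of lines, `lines.select { |l| keep_line?(l) }` keeps "Some text 10,b" but drops "Some text 1,a"; the kept lines are written out unchanged.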

It does this in two ways: as a stream, which is slower but works with files of any size, and as a one-shot read, which is faster but less safe and less universal because the whole file has to fit in memory.
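The two modes can be sketched in Ruby roughly like this. The helpers `process_stream` and `process_read` are hypothetical names, not functions from this repository; both take the keep/drop predicate as a block so the same filter can drive either mode.

```ruby
# Streaming mode: reads and writes one line at a time, so memory use
# does not depend on the size of the input file.
def process_stream(input_path, output_path)
  File.open(output_path, "w") do |out|
    File.foreach(input_path) do |line|
      out.write(line) if yield(line)
    end
  end
end

# One-shot mode: loads the whole file into memory, filters it, and
# writes the result with a single call. Memory use grows with the input.
def process_read(input_path, output_path)
  lines = File.read(input_path).lines
  File.write(output_path, lines.select { |line| yield(line) }.join)
end
```

For example, `process_stream("sample-500k.csv", "out.csv") { |line| ... }` with your filter in the block. Given the same predicate, both modes should produce byte-identical output files, which is a cheap sanity check before comparing timings.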

Disclaimer: I'm not trying to prove that either Elixir or Ruby is "better" at reading files. This is just an exercise to better understand the practical consequences of running a simple command-line script via MRI versus running it in the complex Erlang VM environment.

Generating samples

You can generate a sample CSV file of a given size, compliant with the algorithm, like this:

ruby lib/generate.rb sample-500k.csv 500000

The syntax is: ruby lib/generate.rb <filename> <rows> [<cols>], where <cols> defaults to 3.
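For illustration only, here is a hypothetical Ruby sketch of a generator with the same command-line shape. It is not the contents of lib/generate.rb: it assumes sequential N values starting at 1 and fills the extra columns with made-up placeholder values.

```ruby
# Hypothetical generator mirroring: <filename> <rows> [<cols>], cols = 3.
# Sequential N values and the extra column contents are assumptions.

# Builds one CSV row whose first column has the "Some text N" format.
def sample_row(n, cols)
  extra = (2..cols).map { |c| "value #{n}-#{c}" }
  (["Some text #{n}"] + extra).join(",")
end

# Writes `rows` lines, with N running from 1 to rows.
def generate(filename, rows, cols = 3)
  File.open(filename, "w") do |out|
    1.upto(rows) { |n| out.puts(sample_row(n, cols)) }
  end
end

# Only runs when invoked directly with at least <filename> and <rows>.
if __FILE__ == $PROGRAM_NAME && ARGV.size >= 2
  filename, rows, cols = ARGV
  generate(filename, Integer(rows), cols ? Integer(cols) : 3)
end
```

With sequential N, every second row is even and every fifth is a multiple of 5, so roughly 60% of rows survive the filter; a random N would give a similar ratio but non-deterministic files.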

Running benchmarks

Elixir version:

MIX_ENV=prod mix escript.build
time ./process_csv sample-500k.csv [read | stream]

Ruby version:

time ruby lib/process_csv.rb sample-500k.csv [read | stream]

Improvements

Please see the article for the optimizations I've tried. Open a pull request if you've found a better way.