This is the source code for the article "Elixir vs Ruby: File I/O performance" on the Phoenix on Rails blog. It's a sample text-file-processing script, implemented in both Elixir and Ruby, that does the following:
- Loads the input CSV, line by line.
- Parses the first column, which has the format "Some text N".
- Keeps only the lines where N is divisible by 2 or 5.
- Saves the filtered, otherwise unchanged lines to another CSV.
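The filtering logic above can be sketched in Ruby roughly like this (a minimal illustration of the streaming variant; the function name and the exact parsing of the first column are assumptions, not the repo's actual code):

```ruby
# Streaming variant sketch: read the input line by line, keep lines whose
# first column "Some text N" has N divisible by 2 or 5, write them unchanged.
def process_csv_stream(input_path, output_path)
  File.open(output_path, "w") do |out|
    File.foreach(input_path) do |line|
      # Extract the trailing number from the first column, e.g. "Some text 42" -> 42.
      n = line.split(",").first[/\d+\z/].to_i
      out.write(line) if n % 2 == 0 || n % 5 == 0
    end
  end
end
```

Because `File.foreach` yields one line at a time, memory usage stays flat regardless of input size.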
It does so both in a streaming manner, which is slower but works with files of any size, and as a faster one-shot read, which loads the whole file into memory and so may fail on very large inputs.
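For contrast, the one-shot variant could look roughly like this in Ruby (a hedged sketch under the same assumptions as above; not the repo's actual implementation):

```ruby
# One-shot variant sketch: slurp the entire file into memory at once,
# filter the lines, then write the result in a single call. Faster than
# streaming, but memory usage grows with the input file.
def process_csv_read(input_path, output_path)
  kept = File.read(input_path).each_line.select do |line|
    n = line.split(",").first[/\d+\z/].to_i
    n % 2 == 0 || n % 5 == 0
  end
  File.write(output_path, kept.join)
end
```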
Disclaimer: I'm not out to prove that either Elixir or Ruby is "better" at reading files. This is just an exercise to better understand the practical consequences of running a simple command-line script via MRI vs running it inside the more complex Erlang VM environment.
You can generate a sample CSV file of a given size, compliant with the algorithm above, like this:
ruby lib/generate.rb sample-500k.csv 500000
The syntax is: ruby lib/generate.rb <filename> <rows> [<cols>], where <cols> defaults to 3.
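Based on the description, the generator could be sketched roughly as follows. Note that the contents of the filler columns are an assumption (lib/generate.rb may fill them differently); only the "Some text N" first column is dictated by the algorithm:

```ruby
# Hypothetical sketch of a generator compatible with the described format:
# row i gets a first column of "Some text i", plus (cols - 1) filler columns.
def generate_csv(filename, rows, cols = 3)
  File.open(filename, "w") do |f|
    (1..rows).each do |i|
      cells = ["Some text #{i}"] + Array.new(cols - 1) { "filler" }
      f.puts(cells.join(","))
    end
  end
end
```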
Elixir version:
MIX_ENV=prod mix escript.build
time ./process_csv sample-500k.csv [read | stream]
Ruby version:
time ruby lib/process_csv.rb sample-500k.csv [read | stream]
Please refer to the article to see which optimizations I've tried. Open a pull request if you've found a better way.