joshrotenberg/strsim_ex

Strsim

Strsim is an Elixir wrapper for the Rust strsim crate, built with Rustler.

Summary

Strsim is a NIF-based bridge to the strsim Rust library, which implements the following string similarity algorithms:

  • Levenshtein
  • Damerau-Levenshtein
  • Jaro
  • Jaro-Winkler
  • Hamming
  • Optimal String Alignment
  • Sørensen–Dice

The crate offers functions for both strings and generic sequences. This library exposes all of them except, for now, the generic Damerau-Levenshtein.
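To make the difference between a raw distance and a normalized score concrete, here is a minimal pure-Elixir sketch of Levenshtein over both strings and generic lists. `LevenshteinSketch` is a hypothetical module for illustration only. It is not how the NIF or the crate is implemented, and it assumes strings are compared codepoint by codepoint.

```elixir
defmodule LevenshteinSketch do
  # Hypothetical illustration module; not part of Strsim.

  # Strings are compared as lists of codepoints (an assumption of this sketch).
  def distance(a, b) when is_binary(a) and is_binary(b),
    do: distance(String.codepoints(a), String.codepoints(b))

  # Generic variant: works on any two lists of comparable elements.
  def distance(a, b) when is_list(a) and is_list(b) do
    # Row 0: cost of building each prefix of b from an empty sequence.
    first_row = Enum.to_list(0..length(b))

    a
    |> Enum.reduce(first_row, fn x, [p0 | _] = prev -> next_row(x, b, prev, p0 + 1) end)
    |> List.last()
  end

  # Normalized score in 0.0..1.0, where 1.0 means the strings are identical.
  def normalized(a, b) when is_binary(a) and is_binary(b) do
    longest = max(length(String.codepoints(a)), length(String.codepoints(b)))
    if longest == 0, do: 1.0, else: 1.0 - distance(a, b) / longest
  end

  # Builds one row of the edit-distance table from the previous row.
  defp next_row(x, b, [p0 | prev_rest], first) do
    {row_reversed, _left, _diag} =
      b
      |> Enum.zip(prev_rest)
      |> Enum.reduce({[first], first, p0}, fn {y, up}, {acc, left, diag} ->
        cost = if x == y, do: 0, else: 1
        cell = Enum.min([up + 1, left + 1, diag + cost])
        {[cell | acc], cell, up}
      end)

    Enum.reverse(row_reversed)
  end
end
```

With this sketch, "kitten" and "sitting" are 3 edits apart, and the normalized score is 1 - 3/7, since the longer string has 7 characters.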

Usage

Each exposed crate function has an equivalent Elixir function that returns an {:ok, result} or {:error, reason} tuple:

iex(1)> Strsim.damerau_levenshtein("ab", "bca")
{:ok, 2}

iex(2)> Strsim.generic_hamming([1, 2], [1, 3])
{:ok, 1}

iex(3)> Strsim.generic_jaro([1, 2], [1, 3, 4])
{:ok, 0.611111111111111}

iex(4)> Strsim.generic_jaro_winkler([1, 2], [1, 3, 4])
{:ok, 0.6499999999999999}

iex(5)> Strsim.generic_levenshtein([1, 2, 3], [1, 2, 3, 4, 5, 6])
{:ok, 3}

iex(6)> Strsim.hamming("hamming", "hammers")
{:ok, 3}

iex(7)> Strsim.hamming("hamming", "ham")
{:error, :different_length_args}

iex(8)> Strsim.jaro("Friedrich Nietzsche", "Jean-Paul Sartre")
{:ok, 0.39188596491228067}

iex(9)> Strsim.jaro_winkler("cheeseburger", "cheese fries")
{:ok, 0.9111111111111111}

iex(10)> Strsim.levenshtein("kitten", "sitting")
{:ok, 3}

iex(11)> Strsim.normalized_damerau_levenshtein("levenshtein", "löwenbräu")
{:ok, 0.2727272727272727}

iex(12)> Strsim.normalized_levenshtein("kitten", "sitting")
{:ok, 0.5714285714285714}

iex(13)> Strsim.osa_distance("ab", "bca")
{:ok, 3}

iex(14)> Strsim.sorensen_dice("ferris", "feris")
{:ok, 0.8888888888888888}
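
Because results come back as tagged tuples, callers will usually pattern match on them. The sketch below shows one way to do that. To keep it runnable without the NIF, it uses a hypothetical pure-Elixir stand-in, `HammingSketch.hamming/2`, that mirrors the return shapes shown above. In a real project you would call `Strsim.hamming/2` inside the `case` instead.

```elixir
defmodule HammingSketch do
  # Hypothetical stand-in that mirrors the {:ok, _} / {:error, _} shapes
  # shown in the iex session above. It compares codepoints, which is an
  # assumption of this sketch.
  def hamming(a, b) when is_binary(a) and is_binary(b) do
    as = String.codepoints(a)
    bs = String.codepoints(b)

    if length(as) == length(bs) do
      distance = as |> Enum.zip(bs) |> Enum.count(fn {x, y} -> x != y end)
      {:ok, distance}
    else
      {:error, :different_length_args}
    end
  end

  # Handle every result shape explicitly instead of assuming success.
  def describe(a, b) do
    case hamming(a, b) do
      {:ok, 0} -> "identical"
      {:ok, n} -> "#{n} position(s) differ"
      {:error, :different_length_args} -> "cannot compare: lengths differ"
    end
  end
end

IO.puts(HammingSketch.describe("hamming", "hammers"))
IO.puts(HammingSketch.describe("hamming", "ham"))
```

Matching on the error tuple keeps the length mismatch an ordinary value rather than an exception, which fits naturally into `case` and `with` pipelines.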

Benchmarks

Everybody loves benchmarks. There are results for all implemented strsim functions, as well as for jaro, jaro_winkler, levenshtein, and hamming, which compare the Rust implementation against various Elixir implementations.

To run the benchmarks:

# run Elixir vs Rust Jaro benchmarks
$ MIX_ENV=bench mix bench.jaro 

# run Elixir vs Rust Jaro-Winkler benchmarks
$ MIX_ENV=bench mix bench.jaro_winkler 

# run Elixir vs Rust Levenshtein benchmarks
$ MIX_ENV=bench mix bench.levenshtein

# run Elixir vs Rust hamming benchmarks
$ MIX_ENV=bench mix bench.hamming

# run a benchmark with all of the Rust functions
$ MIX_ENV=bench mix bench.strsim

# run 'em all
$ MIX_ENV=bench mix bench.all
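
For a quick, informal comparison outside the mix tasks, something like the following sketch may be enough. It is not how the repository's benchmarks are implemented. `BenchSketch` is a hypothetical helper built on `:timer.tc/1`, and it times the standard library's `String.jaro_distance/2` as one example of an Elixir implementation.

```elixir
defmodule BenchSketch do
  # Hypothetical helper: average microseconds per call over `iterations` runs.
  def time_us(fun, iterations \\ 10_000) do
    {total_us, _result} =
      :timer.tc(fn -> Enum.each(1..iterations, fn _ -> fun.() end) end)

    total_us / iterations
  end
end

elixir_jaro = fn -> String.jaro_distance("Friedrich Nietzsche", "Jean-Paul Sartre") end
IO.puts("String.jaro_distance/2: #{BenchSketch.time_us(elixir_jaro)} µs/call")

# With the package installed, the NIF could be timed the same way:
# rust_jaro = fn -> Strsim.jaro("Friedrich Nietzsche", "Jean-Paul Sartre") end
# IO.puts("Strsim.jaro/2: #{BenchSketch.time_us(rust_jaro)} µs/call")
```

A loop like this ignores warm-up and variance, so treat the numbers as a rough sanity check only.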

Installation

The package can be installed by adding strsim to your list of dependencies in mix.exs:

def deps do
  [
    {:strsim, "~> 0.1.1"}
  ]
end

Documentation can be generated with ExDoc and published on HexDocs. Once published, the docs can be found at https://hexdocs.pm/strsim.