Skip to content

compute Levenshtein edit distance with Wagner-Fischer algorithm

Notifications You must be signed in to change notification settings

andreiamatuni/levenshtein

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 

Repository files navigation

levenshtein

Compute Levenshtein distances between strings using the Wagner-Fischer algorithm.

There are 4 functions, EditDistance() and BufferedEditDistance (and their compact counterparts). If you're going to be computing a lot of distances in a loop, you should use the buffered version to avoid all the allocation/deallocation costs. Buffered functions do not allocate. Here's the performance difference on my machine. Each run is a 1000 pair comparison:


BenchmarkUnbuffered-4                   	    3000	    552195 ns/op	  628480 B/op	    8120 allocs/op
BenchmarkBuffered-4                     	   10000	    161841 ns/op	       0 B/op	       0 allocs/op
BenchmarkUnbufferedCompact-4            	    5000	    237236 ns/op	  128000 B/op	    2000 allocs/op
BenchmarkBufferedCompact-4              	   10000	    172415 ns/op	       0 B/op	       0 allocs/op
BenchmarkBasePairsUnBuffered-4          	     200	   9482169 ns/op	12736000 B/op	   38000 allocs/op
BenchmarkBasePairsBuffered-4            	     200	   7109534 ns/op	      63 B/op	       0 allocs/op
BenchmarkBasePairsUnbufferedCompact-4   	     200	   6832891 ns/op	  640000 B/op	    2000 allocs/op
BenchmarkBasePairsBufferedCompact-4     	     300	   5796142 ns/op	       0 B/op	       0 allocs/op

The BasePair benchmarks use 36x36 character string comparisons (DNA/RNA base pairs). They highlight the benefit of the compact functions over building the whole distance matrix when inputs are large.

The compact versions just use the current and previous rows, so they run in O(m) memory rather than O(mn)

You set edit weights by passing in a Weights{} struct as a configuration parameter to the functions. Benchmarks were run with all weights set to 1.

About

compute Levenshtein edit distance with Wagner-Fischer algorithm

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages