You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
A banded version of the Levenshtein distance algorithm should be implemented as described in https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.142.1245&rep=rep1&type=pdf.
This would reduce the runtime of string_metric.levenshtein from O(N/64 * M) to O(score_cutoff/64 * M which could significantly improve the performance on long sequences. This could be used the reduce the memory usage of string_metric.levenshtein_editops as well, since it is only require to store the relevant band of the Levenshtein matrix instead of storing the whole matrix. E.g. for the case described in qurator-spk/dinglehopper#62 this could reduce the memory usage from around 110k**2 * 3bit = 4.2GB to 110k * 4k * 3bit = 160MB.
The text was updated successfully, but these errors were encountered:
This improves memory usage by 33% from len(s1) * len(s2) * 3 bit to len(s1) * len(s2) * 2 bit. In addition this makes the algorithm around 10-20% faster.
A banded version of the Levenshtein distance algorithm should be implemented as described in https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.142.1245&rep=rep1&type=pdf.
This would reduce the runtime of
string_metric.levenshtein
fromO(N/64 * M)
toO(score_cutoff/64 * M
which could significantly improve the performance on long sequences. This could be used the reduce the memory usage ofstring_metric.levenshtein_editops
as well, since it is only require to store the relevant band of the Levenshtein matrix instead of storing the whole matrix. E.g. for the case described in qurator-spk/dinglehopper#62 this could reduce the memory usage from around110k**2 * 3bit = 4.2GB
to110k * 4k * 3bit = 160MB
.The text was updated successfully, but these errors were encountered: