Python implementation of the Pass-join algorithm

mapado/passjoin

Passjoin

Python implementation of the Pass-join index.

This index makes it possible to efficiently retrieve the words of a corpus that are similar to a query word, i.e. that lie within a given edit-distance threshold of it.
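Pass-join's filtering rests on a pigeonhole argument: if a word is split into τ + 1 disjoint segments (τ being the edit-distance threshold), then at most τ edits can touch at most τ of those segments, so any word within distance τ of it must contain at least one segment unchanged. The sketch below illustrates that idea in isolation. The helper names partition and may_match are hypothetical and not part of the passjoin API, and the even split (shorter segments first) is an assumption modeled on the partition scheme described for Pass-join.

```python
def partition(word, tau):
    """Split `word` into tau + 1 contiguous segments of near-equal length.

    The first segments get len(word) // (tau + 1) characters; the last
    len(word) % (tau + 1) segments get one extra character.
    """
    parts = tau + 1
    short, n_long = divmod(len(word), parts)
    segments, start = [], 0
    for i in range(parts):
        length = short + 1 if i >= parts - n_long else short
        segments.append(word[start:start + length])
        start += length
    return segments


def may_match(indexed, query, tau):
    """Pigeonhole filter: if edit_distance(indexed, query) <= tau, at least one
    of the tau + 1 segments of `indexed` is untouched by the edits and must
    therefore occur somewhere in `query`."""
    return any(segment in query for segment in partition(indexed, tau))


print(partition('jeanne', 1))  # ['jea', 'nne']
```

A True result from may_match only makes the pair a candidate; the actual edit distance still has to be computed to confirm the match.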

The implementation is based on the original Pass-join paper and on the existing JavaScript implementation in the mnemonist package.
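For illustration, here is a heavily simplified index in the same spirit: each word's τ + 1 segments are stored in a dictionary keyed by word length, a query looks up every substring that could line up with a segment (an unchanged segment can shift by at most τ positions), and candidates are then verified with a plain dynamic-programming edit distance. SimplePassjoin and levenshtein are hypothetical names; the wide ±τ position window is a simplification of the tighter substring selection described in the paper, and this is not the passjoin package's source.

```python
from collections import defaultdict


def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions), row-by-row DP."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,               # delete ca
                               current[j - 1] + 1,            # insert cb
                               previous[j - 1] + (ca != cb)))  # substitute
        previous = current
    return previous[-1]


class SimplePassjoin:
    """Hypothetical, simplified Pass-join-style index (not the passjoin package's code)."""

    def __init__(self, words, tau):
        self.tau = tau
        # (word length, segment number, segment text) -> words having that segment
        self.index = defaultdict(set)
        for word in words:
            for i, (start, size) in enumerate(self._layout(len(word))):
                self.index[(len(word), i, word[start:start + size])].add(word)

    def _layout(self, length):
        """(start, size) of the tau + 1 near-equal segments of a word of this length."""
        parts = self.tau + 1
        short, n_long = divmod(length, parts)
        layout, start = [], 0
        for i in range(parts):
            size = short + 1 if i >= parts - n_long else short  # longer segments last
            layout.append((start, size))
            start += size
        return layout

    def get_word_variations(self, query):
        """Return the indexed words within edit distance tau of `query`."""
        candidates = set()
        # A word within distance tau differs in length by at most tau.
        for length in range(max(0, len(query) - self.tau), len(query) + self.tau + 1):
            for i, (start, size) in enumerate(self._layout(length)):
                # An untouched segment can be shifted by at most tau positions.
                low = max(0, start - self.tau)
                high = min(len(query) - size, start + self.tau)
                for pos in range(low, high + 1):
                    candidates |= self.index.get((length, i, query[pos:pos + size]), set())
        # Filtering only yields candidates; verify each one with the real distance.
        return {word for word in candidates if levenshtein(word, query) <= self.tau}


index = SimplePassjoin(['pierre', 'pierr', 'jean', 'jeanne'], 1)
print(sorted(index.get_word_variations('jeann')))  # ['jean', 'jeanne']
```

On the four-word corpus used in the Usage section, this sketch returns the same sets as the examples there. Its levenshtein function takes two strings and returns an int, so, assuming Passjoin accepts any such callable as its distance argument, it could stand in for the python-Levenshtein C extension.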

Installation

$ pip install passjoin

Usage

Index creation

from passjoin import Passjoin
from Levenshtein import distance  # or any string distance function

max_edit_distance = 1  # maximum edit distance for retrieval
corpus = ['pierre', 'pierr', 'jean', 'jeanne']

passjoin_index = Passjoin(corpus, max_edit_distance, distance)

Index querying

passjoin_index.get_word_variations('pierre')
# -> {'pierre', 'pierr'}

passjoin_index.get_word_variations('jeann')
# -> {'jean', 'jeanne'}

passjoin_index.get_word_variations('jeanine')
# -> {'jeanne'}
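The last example returns 'jeanne' but not 'jean': every insertion or deletion changes a word's length by one and a substitution leaves it unchanged, so 'jean' (4 characters) is at least 3 edits away from 'jeanine' (7 characters). Pass-join-style indexes use this bound to compare a query only with words whose length is within max_edit_distance of it. The sketch below shows that length filter on its own; length_candidates is a hypothetical helper, not part of the passjoin API.

```python
from collections import defaultdict


def length_candidates(corpus, query, max_edit_distance):
    """Return the corpus words whose length is within max_edit_distance of the query.

    |len(a) - len(b)| is a lower bound on the edit distance, so words outside
    this length window can be skipped without computing any distance.
    """
    by_length = defaultdict(list)
    for word in corpus:
        by_length[len(word)].append(word)
    low = max(0, len(query) - max_edit_distance)
    high = len(query) + max_edit_distance
    return [word for length in range(low, high + 1) for word in by_length[length]]


print(length_candidates(['pierre', 'pierr', 'jean', 'jeanne'], 'jeanine', 1))  # ['pierre', 'jeanne']
```

For 'jeanine' with a threshold of 1, only the two 6-letter words survive this filter, and the edit-distance check then discards 'pierre'.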

Contributing

Clone the project.

Install pipenv.

Install the development dependencies with pipenv install --dev.

Run the tests with pipenv run pytest.
