
Python scripts that compute three basic similarity measures suited to ad hoc information retrieval systems: Levenshtein edit distance, Jaccard similarity, and a document-term matrix.


crfmc/similarity-measures

similarity-measures

This repository contains three independent Python scripts:

levenshtein.py

  • Uses the NumPy library to calculate the Levenshtein edit distance between two strings.
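The following is a minimal sketch of the classic dynamic-programming approach to Levenshtein distance using a NumPy table, with unit costs for insertion, deletion, and substitution. It is an illustration under those assumptions, not necessarily how `levenshtein.py` is written; the function name `levenshtein` is hypothetical.

```python
import numpy as np


def levenshtein(a: str, b: str) -> int:
    """Hypothetical helper: unit-cost edit distance between a and b."""
    # dp[i, j] holds the edit distance between the prefixes a[:i] and b[:j]
    dp = np.zeros((len(a) + 1, len(b) + 1), dtype=int)
    dp[:, 0] = np.arange(len(a) + 1)  # turn a[:i] into "" by i deletions
    dp[0, :] = np.arange(len(b) + 1)  # turn "" into b[:j] by j insertions

    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i, j] = min(
                dp[i - 1, j] + 1,         # deletion
                dp[i, j - 1] + 1,         # insertion
                dp[i - 1, j - 1] + cost,  # substitution (or match if cost == 0)
            )
    return int(dp[len(a), len(b)])


print(levenshtein("kitten", "sitting"))  # 3
```

The table has `(len(a) + 1) * (len(b) + 1)` cells, so time and memory grow with the product of the two string lengths.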

document_matrix.py

  • Calculates the document-term matrix from a collection of strings.
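As an illustration, a document-term matrix can be built by giving each distinct term a column and counting its occurrences in each document (one row per document). This sketch assumes lowercase whitespace tokenization and raw counts. Both are simplifying choices that may differ from `document_matrix.py`, and the function name `document_term_matrix` is hypothetical.

```python
import numpy as np


def document_term_matrix(docs):
    """Hypothetical helper: returns (vocabulary, counts) with one row per document."""
    # Assumed tokenization: lowercase, then split on whitespace
    tokenized = [doc.lower().split() for doc in docs]
    # A sorted vocabulary gives each term a stable column index
    vocab = sorted({term for tokens in tokenized for term in tokens})
    index = {term: j for j, term in enumerate(vocab)}

    matrix = np.zeros((len(docs), len(vocab)), dtype=int)
    for i, tokens in enumerate(tokenized):
        for term in tokens:
            matrix[i, index[term]] += 1  # raw term frequency
    return vocab, matrix


vocab, m = document_term_matrix(["the cat sat", "the dog sat on the mat"])
print(vocab)  # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(m)      # [[1 0 0 0 1 1]
              #  [0 1 1 1 1 2]]
```

The term-document orientation (terms as rows) is just the transpose, `m.T`.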

jaccard.py

  • Calculates the Jaccard similarity measure for two lists of strings.
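The Jaccard similarity of two collections is the size of their intersection divided by the size of their union. The sketch below treats each list as a set, so duplicate strings are ignored. It also assumes the convention that two empty lists have similarity 1.0. Neither choice is confirmed for `jaccard.py`, and the function name `jaccard` is hypothetical.

```python
def jaccard(a, b):
    """Hypothetical helper: |A ∩ B| / |A ∪ B| for two lists of strings."""
    set_a, set_b = set(a), set(b)  # duplicates are discarded
    union = set_a | set_b
    if not union:
        return 1.0  # assumed convention for two empty inputs
    return len(set_a & set_b) / len(union)


print(jaccard(["the", "cat", "sat"], ["the", "dog", "sat"]))  # 0.5
```

The shared terms are `the` and `sat` (2), out of 4 distinct terms overall, which gives 0.5. The corresponding Jaccard distance is `1 - jaccard(a, b)`.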
