Skip to content

talhasaruhan/fuzzy-search

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Fuzzy String Search

Implementations of 3 data structures that help to accelerate approximate string matching.

JS Implementations

BK-Tree

  • BKTree(function metric, string[] items) : Creates the tree with given items and metric function (Levenshtein or Hamming)
  • add(string S) : Inserts given string S into the tree
  • delete(string S) : Marks the string S as deleted
  • query(string Q, int T) : Returns strings within T distance of given string Q
  • entries() : Returns entries inside the tree
  • has(string S, int T=0) : Checks for a string in range T of S, equivalent to query(S, tol).length > 0, terminates at first match
  • print() : Prints the tree
  • merge(BKTree T) : Imports entries from another BKTree
  • toJSON() : Converts tree into JSON format
  • parse(JSON J) : Parses given JSON data, overrides current tree

C++ Implementations

BK-Tree

  • bk_tree() : Constructs empty BK-Tree
  • bk_tree(string S) : Constructs the tree with S as root node
  • bk_tree(vector S) : Constructs the tree with given dictionary
  • void insert(string S) : Inserts the given string into the tree
  • void remove(string S) : Marks the string as deleted, and rebuilds the tree after number of deleted items exceeds a point.
  • vector search(string Q, int T) : Searches the tree for strings within T distance from given string Q.
  • has(string S, int T=0) : Checks for a string in range T of S, equivalent to search(S, tol).length > 0, terminates at first match
  • int capacity() : Returns the size of the tree in bytes
  • (static) int levenshtein_distance(string S, string R) : Returns the Levenshtein Distance of two strings
  • void print() : Prints the tree in the following format
hell, 0
	yelp, 2
	D*help, 1
		yell, 2
  • Provides iterator to it's elements(insert order) and allows C++ 11 style range based for loops
bk_tree t(words);
for(string& s : t)
  s;

Trie

  • trie() : Constructs empty Trie.
  • void insert(string S) : Inserts string into Trie
  • void search(string Q, int T) : Searches for string Q within tolerance of T, uses Levenshtein Distance,
  • void remove(string S) : Removes string S from the Trie
  • int capacity() : Returns the size of the Trie in bytes
  • void print() : Prints the Trie in the following format
h
 a
  m*
 e
  l
   l
    o*
   p*
    i
     n
      g*

VP-Tree

  • vp_tree(vector S) : Constructs the tree with given set of strings
  • vector search(string Q, int T) : Searches the tree for strings within T distance from given string Q.
  • (static) int levenshtein_distance(string S, string R) : Returns the Levenshtein Distance of two strings
  • void print() : Prints the tree in the following format
hell
 help
 yell
  yelp
  • Provides iterator to it's elements and allows C++ 11 style range based for loops
vp_tree t(words);
for(string& s : t)
  s;

About

Data structures for fuzzy string search

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published