This repository contains the C++ classes CStringEmbedding and CStringEmbeddings for calculating string similarity using character embeddings.

The CStringEmbedding class creates an embedding for an individual string by counting the occurrences of its characters. The CStringEmbeddings class manages a collection of string embeddings and provides methods for finding similar strings based on their embeddings.
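
To make the idea of a character-count embedding concrete, here is a minimal sketch. It is not the code from StringSimilarity.h: the names CountChars, CountDistance, and RankByDistance are hypothetical, and the use of one counter per byte value and of Euclidean distance are assumptions chosen for illustration. The library's actual representation and metric may differ.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type, not part of StringSimilarity.h:
// one counter per possible byte value.
using CharCounts = std::array<unsigned, 256>;

// Build an embedding by counting how often each byte occurs in the string.
CharCounts CountChars(const std::string& s) {
    CharCounts counts{};  // value-initialized, so every counter starts at 0
    for (unsigned char c : s) {
        ++counts[c];
    }
    return counts;
}

// Euclidean distance between two count vectors (an assumed metric).
double CountDistance(const CharCounts& a, const CharCounts& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double diff = static_cast<double>(a[i]) - static_cast<double>(b[i]);
        sum += diff * diff;
    }
    return std::sqrt(sum);
}

// Pair every candidate with its distance to the query, closest first.
std::vector<std::pair<std::string, double>> RankByDistance(
    const std::string& query, const std::vector<std::string>& candidates) {
    const CharCounts queryCounts = CountChars(query);
    std::vector<std::pair<std::string, double>> ranked;
    ranked.reserve(candidates.size());
    for (const auto& candidate : candidates) {
        ranked.emplace_back(candidate,
                            CountDistance(queryCounts, CountChars(candidate)));
    }
    std::stable_sort(ranked.begin(), ranked.end(),
                     [](const auto& lhs, const auto& rhs) {
                         return lhs.second < rhs.second;
                     });
    return ranked;
}
```

Under these assumptions, a smaller distance means a closer match. Because only character counts are compared, character order is ignored, so two anagrams such as "listen" and "silent" end up at distance 0 in this sketch.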
To use this package, you can clone the repository and compile it using a C++ compiler:

    git clone https://github.com/NIR3X/StringSimilarity.cpp
    cd StringSimilarity.cpp
    make

To use these classes, follow these steps:
- Include the header file StringSimilarity.h in your project.
- Create a vector of strings or provide a list of strings.
- Initialize a CStringEmbeddings object with the vector of strings.
- Use the FindSimilarsFor method to find similar strings for a given input string.
Here's an example of how to use these classes:

    #include "StringSimilarity.h"
    #include <iostream>
    #include <string>
    #include <vector>

    int main() {
        // Candidate strings to search through.
        std::vector<std::string> words = {
            "apple", "banana", "orange", "grape", "pineapple", "strawberry", "watermelon", "kiwi", "pear", "peach",
            "blueberry", "raspberry", "blackberry", "melon", "cherry", "apricot", "mango", "papaya", "plum", "lemon"
        };

        // Build the embeddings for all candidate strings.
        CStringEmbeddings embeddings(words);

        std::string wordToFindSimilarsFor = "apple";

        // Print each match with its distance (structured bindings require C++17).
        for (const auto& [similar, distance] : embeddings.FindSimilarsFor(wordToFindSimilarsFor)) {
            std::cout << "Similar: " << similar << ", Distance: " << distance << std::endl;
        }

        return 0;
    }

This program is Free Software: you can use, study, share, and improve it at will. Specifically, you can redistribute and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.