LSH from zero 🦾 native Map-Reduce in PySpark 🚀
Updated Nov 1, 2022 - Jupyter Notebook
MinHash text analyzer developed for the Algorithmics course.
ETH Zurich Fall 2017
SpellChecker: an application to check for spelling errors.
Homework_4 for Algorithmic Methods for Data Mining (ADM), MSc in Data Science at La Sapienza University of Rome
An implementation of the MinHashing algorithm in C using POSIX threads.
First homework for the Advance Data Mining course
Probability Methods for Informatics Engineering | UA 2018/2019
Text similarity (Jaccard similarity, MinHash, LSH)
Implementing Locality Sensitive Hashing for DNA Sequences.
Deduplication: MinHash with LSH
Homeworks for Advanced Data Mining and Language Technology (DMT) at La Sapienza University of Rome
Textual data manipulation projects with applications of advanced data mining techniques: recommendation systems, information retrieval systems, search engines, latent sentiment analysis, pagerank, PCA.
Documents my master's thesis work on building a continuous, topical web crawler based on Mercator (1999).
Finding Similar Pairs using PySpark
Word/Image/Audio Embedding models, Tokenizer models, Ngram language models, MatrixModels, Corpus building, Vocabulary Building, Language modelling