documents my master's level thesis work on building a continuous, topical web crawler based on Mercator (1999)
Updated Dec 24, 2019 - TeX
LSH from zero 🦾 native Map-Reduce in PySpark 🚀
A set of methods and model evaluation metrics for predicting links in an academic citation network using Apache Spark and Scala
Fast Jaccard similarity search for abstract sets (documents, products, users, etc.) using MinHashing and Locality Sensitive Hashing
Finding Similar Pairs using PySpark
An implementation of the MinHashing algorithm in C using POSIX threads.
Word/Image/Audio Embedding models, Tokenizer models, Ngram language models, MatrixModels, Corpus building, Vocabulary Building, Language modelling
Scalable Data Mining - Assignment submissions
ETH Zurich Fall 2017
First homework for the Advanced Data Mining course
Probability Methods for Informatics Engineering | UA 2018/2019
Text similarity (Jaccard Similarity, MinHash, LSH)
SARS-CoV-2 genome analysis using Big Data algorithms to find clusters of similar mutations, belonging to different clades, that mutate together and generate the corresponding clade.
Homework_4 for Algorithmic Methods for Data Mining (ADM), MSc in Data Science at La Sapienza University of Rome
Implementing Locality Sensitive Hashing for DNA Sequences.
MinHash Example
Implementation of a B+ Tree for range and exact match queries and of the LSH algorithm for finding similar documents as measured by Jaccard Similarity.