This repository contains the code and models used in my Bachelor’s Degree Thesis about large text similarity measures. The similarity measures have been combined with machine-learning-based embeddings. The repository also contains the raw results obtained from the tasks/experiments.

joaquimgomez/BachelorsThesis-TextSimilarityMeasures

Analysis and Comparison of Text Similarity Measures

This is the repository for the Bachelor's Degree Thesis/Project carried out during the 2020/21 Winter Semester by Joaquim Gómez Sanchez.

The thesis/project is available in UPC's repository.

Code and Models

This repository contains all the code developed for the thesis, the raw results obtained, and the models that were trained and used. The code not implemented by the author and the pretrained models used are referenced below.

Code:

  • Code for training GloVe. Obtained from the official repository, maintained by the model's authors.
  • Code for computing the Normalized Relative Compression distance. Provided by the thesis director, who got it from Armando J. Pinho.

Models:

Data

No data is published in this repository, in order to avoid legal problems. The data used for the experiments is listed in the thesis and can be obtained from the repository of the UPC (Universitat Politècnica de Catalunya) or from the repositories of other papers. The data used for training the models was collected entirely from UPC's repository.

If you are interested in the training data, the preprocessed files, or the experiment files I prepared, send me an e-mail.
