tf-idf-algorithm

An implementation of the tf-idf algorithm that measures the similarity between documents using cosine similarity.

Faculty of Computers and Artificial Intelligence, Cairo University (`FCAI-CU`)

Information Retrieval Assignment

The `TfIdf` class contains methods to calculate term frequency (tf) and inverse document frequency (idf). The `tfCalculator` method takes an array of all the words in a document and a term to check, and returns the frequency of that term in the document. The `idfCalculator` method takes a list of arrays, each holding all the words of one document, and a term to check, and returns the inverse document frequency of that term across all the documents.
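The README does not state the exact formulas, so the following is only a minimal sketch of how the two methods might look. It assumes tf is the term count divided by the document length and idf is the natural log of the number of documents divided by the number of documents containing the term. The class name `TfIdfSketch`, the case-insensitive matching, and the zero result for an unseen term are illustrative assumptions, not the repository's code.

```java
import java.util.List;

class TfIdfSketch {
    // Term frequency: occurrences of the term divided by the document's word count.
    // Case-insensitive matching is an assumption of this sketch.
    static double tfCalculator(String[] docTerms, String termToCheck) {
        if (docTerms.length == 0) {
            return 0.0;
        }
        int count = 0;
        for (String t : docTerms) {
            if (t.equalsIgnoreCase(termToCheck)) {
                count++;
            }
        }
        return (double) count / docTerms.length;
    }

    // Inverse document frequency: ln(N / df), where N is the number of documents
    // and df is the number of documents that contain the term.
    static double idfCalculator(List<String[]> allDocs, String termToCheck) {
        int docsWithTerm = 0;
        for (String[] doc : allDocs) {
            for (String t : doc) {
                if (t.equalsIgnoreCase(termToCheck)) {
                    docsWithTerm++;
                    break; // count each document at most once
                }
            }
        }
        if (docsWithTerm == 0) {
            return 0.0; // assumed convention: a term found nowhere gets idf 0
        }
        return Math.log((double) allDocs.size() / docsWithTerm);
    }
}
```

With this definition a term that appears in every document gets an idf of 0, so it contributes nothing to the document vectors.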

The `CosineSimilarity` class has a `cosineSimilarity` method that calculates the cosine similarity between two document vectors. It takes the two vectors as input, computes their dot product and the magnitude of each vector, and returns the dot product divided by the product of the magnitudes.
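The calculation described above can be sketched as follows. The class name `CosineSketch`, the length check, and the choice to return 0 for a zero-magnitude vector are assumptions of this example rather than confirmed details of the repository.

```java
class CosineSketch {
    // Cosine similarity: dot(a, b) / (|a| * |b|).
    static double cosineSimilarity(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0;
        double sumSqA = 0.0;
        double sumSqB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            sumSqA += a[i] * a[i];
            sumSqB += b[i] * b[i];
        }
        if (sumSqA == 0.0 || sumSqB == 0.0) {
            return 0.0; // assumed convention: avoid dividing by zero for empty vectors
        }
        return dot / (Math.sqrt(sumSqA) * Math.sqrt(sumSqB));
    }
}
```

Because tf-idf weights are never negative under the usual definitions, the score for two document vectors falls between 0 (no shared weighted terms) and 1 (same direction).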

The `DocumentParser` class has methods to read files, tokenize the documents into terms, and build a term frequency-inverse document frequency (tf-idf) vector for each document. The `parseFiles` method reads all the files in a given folder and stores the terms of each document in an array. The `tfIdfCalculator` method uses the `TfIdf` class to calculate the tf-idf score of each term in each document and stores the resulting document vectors in a list. The `getCosineSimilarity` method calculates the cosine similarity between every pair of documents and prints the results.
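To illustrate the overall flow, here is a hedged, self-contained sketch that works on in-memory strings instead of reading a folder. The class `PipelineSketch`, the `tokenize` helper, its lowercase alphanumeric tokenization rule, and the ln(N / df) idf formula are all assumptions made for the example; the repository's `DocumentParser` may differ on each point.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PipelineSketch {
    // Assumed tokenization: lowercase, then split on runs of non-alphanumeric characters.
    static String[] tokenize(String text) {
        String cleaned = text.toLowerCase().replaceAll("[^a-z0-9]+", " ").trim();
        return cleaned.isEmpty() ? new String[0] : cleaned.split(" ");
    }

    // Builds one tf-idf vector per document over the shared vocabulary.
    static List<double[]> tfIdfVectors(List<String[]> docs) {
        Map<String, Integer> index = new LinkedHashMap<>();
        for (String[] doc : docs) {
            for (String t : doc) {
                index.putIfAbsent(t, index.size());
            }
        }
        int[][] counts = new int[docs.size()][index.size()];
        int[] df = new int[index.size()];
        for (int d = 0; d < docs.size(); d++) {
            for (String t : docs.get(d)) {
                int i = index.get(t);
                if (counts[d][i]++ == 0) {
                    df[i]++; // first occurrence of this term in document d
                }
            }
        }
        List<double[]> vectors = new ArrayList<>();
        for (int d = 0; d < docs.size(); d++) {
            double[] v = new double[index.size()];
            int len = docs.get(d).length;
            for (int i = 0; i < v.length && len > 0; i++) {
                double tf = (double) counts[d][i] / len;
                double idf = Math.log((double) docs.size() / df[i]);
                v[i] = tf * idf;
            }
            vectors.add(v);
        }
        return vectors;
    }

    static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0.0, sumSqA = 0.0, sumSqB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            sumSqA += a[i] * a[i];
            sumSqB += b[i] * b[i];
        }
        if (sumSqA == 0.0 || sumSqB == 0.0) {
            return 0.0; // assumed convention for all-zero vectors
        }
        return dot / (Math.sqrt(sumSqA) * Math.sqrt(sumSqB));
    }

    public static void main(String[] args) {
        String[] texts = {"The cat sat.", "The cat sat!", "Dogs bark loudly"};
        List<String[]> docs = new ArrayList<>();
        for (String text : texts) {
            docs.add(tokenize(text));
        }
        List<double[]> vectors = tfIdfVectors(docs);
        // Print the similarity of every unordered pair of documents.
        for (int i = 0; i < vectors.size(); i++) {
            for (int j = i + 1; j < vectors.size(); j++) {
                System.out.println("doc " + i + " vs doc " + j + ": "
                        + cosineSimilarity(vectors.get(i), vectors.get(j)));
            }
        }
    }
}
```

One consequence of the plain ln(N / df) weighting used here: terms that occur in every document get weight 0, so a collection of documents that all share exactly the same vocabulary would produce all-zero vectors. Smoothed idf variants avoid this, but whether the repository uses one is not stated.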

Overall, this code can be used to calculate the similarity between the documents in a set based on the words they contain. Combining tf-idf weighting with cosine similarity is a common approach in information retrieval and text mining tasks.

The repository contains the `src` and `out/production/assignment4` folders, the text files `100.txt` to `109.txt`, `300.txt`, `302.txt`, and `500.txt` to `527.txt`, and `README.md`.

abdo-essam/tf-idf-algorithm
