TF-iDF

Tf-idf stands for term frequency-inverse document frequency. The tf-idf weight is a statistical measure, widely used in information retrieval and text mining, of how important a word is to a document in a collection or corpus. The importance increases proportionally to the number of times the word appears in the document but is offset by how frequently the word occurs across the corpus. Variations of the tf-idf weighting scheme are often used by search engines as a central tool in scoring and ranking a document's relevance to a user query.
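As a rough illustration of the weighting described above, the sketch below computes tf as the term count divided by the document length and idf as the natural log of the number of documents divided by the number of documents containing the term. This is one common variant, not necessarily the one this project implements; the class name `TfIdfSketch` and its methods are hypothetical and do not come from the repository.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: names are hypothetical, not taken from this repository.
final class TfIdfSketch {

    // Term frequency: occurrences of the term divided by the document length.
    static double tf(String term, List<String> doc) {
        if (doc.isEmpty()) {
            return 0.0;
        }
        long count = doc.stream().filter(term::equals).count();
        return (double) count / doc.size();
    }

    // Inverse document frequency: ln(N / number of documents containing the term).
    static double idf(String term, List<List<String>> corpus) {
        long containing = corpus.stream().filter(d -> d.contains(term)).count();
        if (containing == 0) {
            return 0.0; // assumption: a term absent from the corpus gets weight 0
        }
        return Math.log((double) corpus.size() / containing);
    }

    // The tf-idf weight is the product of the two factors.
    static double tfIdf(String term, List<String> doc, List<List<String>> corpus) {
        return tf(term, doc) * idf(term, corpus);
    }

    public static void main(String[] args) {
        List<List<String>> corpus = Arrays.asList(
                Arrays.asList("the", "cat", "sat"),
                Arrays.asList("the", "dog", "barked"),
                Arrays.asList("the", "cat", "purred"));
        // "the" appears in every document, so its idf (and tf-idf) is 0.
        System.out.println(tfIdf("the", corpus.get(0), corpus));
        // "dog" appears in a single document, so it gets a positive weight.
        System.out.println(tfIdf("dog", corpus.get(1), corpus));
    }
}
```

A word that occurs in every document ends up with weight 0 under this variant, which is the "offset by the frequency of the word in the corpus" effect.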

How to run

  • Running this software requires a database of files (use the archives folder for this).
  • Add to the file "forRead.txt" the links to all the files that you want to read. To do this, run the script "read.py".
  • Modify the parameters so that the links are generated correctly.
  • Open the code in a Java IDE as a Maven project.
  • Run the file App.java in the path src/main/java/bigdata/TFidF as a Java Application.

Concurrent Techniques

Mutex
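The README names this technique without further detail. As a hedged sketch of how a mutex could protect shared state in a tf-idf computation, the example below guards a term-count map with a `ReentrantLock` so that several threads can update it safely. The class `MutexSketch` and its methods are hypothetical and are not the repository's code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch only: names are hypothetical, not taken from this repository.
final class MutexSketch {
    private final Map<String, Integer> counts = new HashMap<>();
    private final ReentrantLock lock = new ReentrantLock();

    // Only one thread at a time may modify the shared map.
    void recordTerm(String term) {
        lock.lock();
        try {
            counts.merge(term, 1, Integer::sum);
        } finally {
            lock.unlock(); // always release, even if the update throws
        }
    }

    int countOf(String term) {
        lock.lock();
        try {
            return counts.getOrDefault(term, 0);
        } finally {
            lock.unlock();
        }
    }

    // Starts several threads that all record the same term, then returns the total.
    static int runDemo(int threads, int incrementsPerThread) {
        MutexSketch shared = new MutexSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    shared.recordTerm("cat");
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return shared.countOf("cat");
    }

    public static void main(String[] args) {
        System.out.println(runDemo(4, 1000)); // prints 4000
    }
}
```

Without the lock, concurrent `merge` calls on a plain `HashMap` could lose updates or corrupt the map.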

Semaphore
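The README names this technique without further detail. One plausible use in this setting is capping how many files are read at once. The sketch below uses `java.util.concurrent.Semaphore` for that and reports the highest number of tasks that were ever inside the guarded section at the same time. The class `SemaphoreSketch`, its method, and the `Thread.sleep` stand-in for file reading are assumptions for illustration, not the repository's code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only: names are hypothetical, not taken from this repository.
final class SemaphoreSketch {

    // Runs `tasks` simulated readers, allowing at most `permits` at a time,
    // and returns the peak number of readers that were active together.
    static int maxConcurrentReaders(int permits, int tasks) {
        Semaphore slots = new Semaphore(permits);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(tasks);

        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    slots.acquire(); // blocks while all permits are taken
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                try {
                    int now = active.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    Thread.sleep(5); // stand-in for reading one file
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    active.decrementAndGet();
                    slots.release();
                }
            });
        }

        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // The reported peak never exceeds the number of permits.
        System.out.println(maxConcurrentReaders(2, 8));
    }
}
```

Unlike a mutex, a semaphore with more than one permit lets several threads proceed together, which suits limiting access to a pooled resource rather than protecting a single variable.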

Fork Join
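The README names this technique without further detail. As a hedged sketch of how fork/join could apply here, the example below splits a corpus in half recursively and counts, in parallel, how many documents contain a term (the document frequency used by idf). The class `ForkJoinSketch`, the threshold value, and the method names are hypothetical and are not the repository's code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Illustrative sketch only: names are hypothetical, not taken from this repository.
final class ForkJoinSketch extends RecursiveTask<Integer> {
    private static final int THRESHOLD = 2; // assumption: ranges this small are counted directly

    private final String term;
    private final List<List<String>> docs;
    private final int from;
    private final int to;

    ForkJoinSketch(String term, List<List<String>> docs, int from, int to) {
        this.term = term;
        this.docs = docs;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Integer compute() {
        if (to - from <= THRESHOLD) {
            // Small range: count sequentially.
            int count = 0;
            for (int i = from; i < to; i++) {
                if (docs.get(i).contains(term)) {
                    count++;
                }
            }
            return count;
        }
        // Large range: split in two, run the left half asynchronously,
        // compute the right half in this thread, then combine.
        int mid = (from + to) >>> 1;
        ForkJoinSketch left = new ForkJoinSketch(term, docs, from, mid);
        ForkJoinSketch right = new ForkJoinSketch(term, docs, mid, to);
        left.fork();
        return right.compute() + left.join();
    }

    // Number of documents in the corpus that contain the term.
    static int documentFrequency(String term, List<List<String>> docs) {
        return ForkJoinPool.commonPool().invoke(new ForkJoinSketch(term, docs, 0, docs.size()));
    }

    public static void main(String[] args) {
        List<List<String>> corpus = Arrays.asList(
                Arrays.asList("the", "cat", "sat"),
                Arrays.asList("the", "dog", "barked"),
                Arrays.asList("the", "cat", "purred"),
                Arrays.asList("a", "bird", "sang"),
                Arrays.asList("one", "cat", "slept"));
        System.out.println(documentFrequency("cat", corpus)); // prints 3
    }
}
```

Because each subtask works on its own index range and only returns a number, no lock is needed; results are combined as the tasks join.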