Plagiarism Detection on Electronic Text Based Assignments Using Vector Space Model (iciafs14)

Global NIPS Paper Implementation Challenge

This repository implements the paper linked below, following the research methodology it describes.

Original Paper

https://arxiv.org/pdf/1412.7782.pdf

Main Goal

Develop an effective plagiarism detection tool for text-based assignments by comparing unigram, bigram, and trigram vector space models under the cosine and Jaccard similarity measures.
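For reference, the two similarity measures in their standard textbook form are given below. Here a and b are the weight vectors of two assignments indexed by term t, and A and B are the sets of terms occurring in each assignment. This notation is an assumption made for illustration; the paper and main.py may use a weighted or otherwise different variant.

```latex
\[
\cos(\mathbf{a}, \mathbf{b})
  = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
  = \frac{\sum_{t} a_t b_t}{\sqrt{\sum_{t} a_t^{2}} \, \sqrt{\sum_{t} b_t^{2}}},
\qquad
J(A, B) = \frac{\lvert A \cap B \rvert}{\lvert A \cup B \rvert}
\]
```

Both measures range from 0 (no shared terms) to 1 (identical term profiles) when all weights are non-negative.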

Programming Tools

Python 2.7
scikit-learn
NLTK

Files

Several important files and directories:

main.py
The main file, containing the entire source code.
docs
A directory containing the students' answers. Each answer is stored in its own document, named following the pattern assignment_index: the word "assignment" is fixed, and "index" is an integer that is incremented each time a new student is added.
combined_docs
All student answers are combined into one document, called the MASTER DOCUMENT. The detection process runs on this combined document.
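A minimal sketch of how the answers in docs could be merged into a master document follows. The function name build_master_document, the one-answer-per-line layout, and the choice to read every file in the directory are assumptions made for illustration; main.py may combine the files differently.

```python
import os


def build_master_document(docs_dir, master_path):
    """Concatenate every file in docs_dir into master_path, one answer per line.

    Hypothetical helper: the layout of the real combined document is not
    specified here, so one line per student answer is an assumption.
    """
    # Plain lexicographic sort, so "assignment_10" sorts before "assignment_2".
    names = sorted(
        name for name in os.listdir(docs_dir)
        if os.path.isfile(os.path.join(docs_dir, name))
    )
    with open(master_path, "w") as master:
        for name in names:
            with open(os.path.join(docs_dir, name)) as answer:
                # Collapse all whitespace so each answer occupies a single line.
                master.write(" ".join(answer.read().split()) + "\n")
    return names
```

The function returns the file names in the order they were written, so line i of the master document can be traced back to its source file.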

To Run

To run the program, execute the following command:

python main.py

Methodology

Combine the students' answers into a single answer file (the MASTER DOCUMENT)
Extract the unique terms (unigrams, bigrams, trigrams) from the MASTER DOCUMENT
Eliminate stopwords
Compute the Document Frequency (DF) and Inverse Document Frequency (IDF) of each term
Compute the TF-IDF weight vector of each document
Compare each pair of assignments using Cosine Similarity
Compare each pair of assignments using Jaccard Similarity
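
The steps above can be sketched in plain Python as follows. This is an illustrative sketch, not the code in main.py: the function names, the tiny stopword list, the tokenizer, and the smoothed IDF formula log(N / df) + 1 are all assumptions. In particular, stopwords are filtered at the token level before n-grams are formed, which may differ from the order used in the repository.

```python
from __future__ import division

import math
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stopword list (assumption); a real run would use a full list.
STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}


def ngrams(text, n):
    """Return the n-grams (as tuples) of the lowercased, non-stopword tokens."""
    tokens = [t for t in re.findall(r"[a-z0-9]+", text.lower())
              if t not in STOPWORDS]
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_vectors(docs, n):
    """Build one sparse TF-IDF vector (dict: term -> weight) per document."""
    counts = [Counter(ngrams(doc, n)) for doc in docs]
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for c in counts for term in c)
    total = len(docs)
    # Smoothed IDF (assumption) so terms shared by all documents keep weight.
    idf = {term: math.log(total / freq) + 1.0 for term, freq in df.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def jaccard(u, v):
    """Jaccard similarity of the term sets of two sparse vectors."""
    a, b = set(u), set(v)
    return len(a & b) / len(a | b) if (a | b) else 0.0


def compare_all(docs, n):
    """Return {(i, j): (cosine, jaccard)} for every pair of documents."""
    vecs = tfidf_vectors(docs, n)
    return {(i, j): (cosine(vecs[i], vecs[j]), jaccard(vecs[i], vecs[j]))
            for i, j in combinations(range(len(vecs)), 2)}
```

Calling compare_all(answers, n) with n set to 1, 2, and 3 gives the unigram, bigram, and trigram comparisons, where answers is a list holding one string per student.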

Albertus Kelvin
Bandung Institute of Technology

Code was developed on January 20th, 2018
Code was made publicly available on January 31st, 2018

albertusk95/nips-challenge-plagiarism-detection-vsm

Folders and files

Latest commit

History

Repository files navigation

Plagiarism Detection on Electronic Text Based Assignments Using Vector Space Model (iciafs14)

Global NIPS Paper Implementation Challenge

Original Paper

Main Goal

Programming Tools

Files

To Run

Methodology

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages