GitHub - faizann24/Authorship-Attribution: Authorship Attribution with Machine Learning

Authorship Attribution with Machine Learning

Authorship Attribution with Random Forests and TFIDF Scores

This repository contains the code for the blog post Large Scale Authorship Attribution with Machine Learning. It uses a Random Forest model with TFIDF scores as features to classify articles among n authors.
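As a rough illustration of the approach described above, the following sketch pairs scikit-learn's TfidfVectorizer with a RandomForestClassifier in a single pipeline. The toy texts, the author labels, and the hyperparameters (n_estimators, random_state) are assumptions made for this example; they are not taken from attribution_model.py, which may build its features and model differently.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus: two authors with distinct vocabularies.
train_texts = [
    "the stock market rallied as investors bought shares",
    "bond yields fell while the market priced in rate cuts",
    "investors sold shares after the earnings report",
    "the team scored twice in the final minutes of the match",
    "the coach praised the striker after the match",
    "fans cheered as the team lifted the trophy",
]
train_authors = [
    "author_a", "author_a", "author_a",
    "author_b", "author_b", "author_b",
]

# TFIDF scores become the feature vectors; the Random Forest
# predicts an author label from them. Settings are illustrative.
model = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(train_texts, train_authors)

unseen_texts = [
    "shares climbed as the market opened",
    "the striker scored in the match",
]
predictions = model.predict(unseen_texts)
print(list(zip(unseen_texts, predictions)))
```

Wrapping both steps in one pipeline keeps the vocabulary learned at training time tied to the classifier, so unseen articles are vectorized the same way as the training articles.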

Files Description

Path	Description
Authorship-Attribution	Main folder.
├ sample_data	Folder containing data for authors.
│ └ authors_folders	One folder for each author.
│   ├ authors_article_0.txt	First article of the author.
│   ├ authors_article_1.txt	Second article.
│   └ ... authors_article_n.txt	... Last article.
└ attribution_model.py	Authorship attribution model.
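The layout above (one folder per author, one text file per article) can be read into parallel lists of texts and author labels. The sketch below is a minimal example of that idea using only the standard library. The function name load_articles, the "*.txt" filter, and the UTF-8 decoding are assumptions for illustration; the actual script may load the data differently.

```python
from pathlib import Path


def load_articles(data_folder):
    """Return (texts, authors) read from a folder with one subfolder per author.

    Hypothetical helper: the folder name is used as the author label.
    """
    texts, authors = [], []
    for author_dir in sorted(Path(data_folder).iterdir()):
        if not author_dir.is_dir():
            continue  # skip stray files next to the author folders
        for article in sorted(author_dir.glob("*.txt")):
            # Assumed encoding; undecodable bytes are replaced, not fatal.
            texts.append(article.read_text(encoding="utf-8", errors="replace"))
            authors.append(author_dir.name)
    return texts, authors
```

Calling load_articles("sample_data") would then yield one text and one label per article, ready to be passed to a vectorizer and a classifier.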

Usage

Packages

You will need to install the following package to run the authorship attribution model.

Scikit-learn

How to run

In order to run the model, please use the following command:

python3 attribution_model.py --articles_per_author 250 --authors_to_keep 5 --data_folder sample_data

The script takes three parameters as inputs:

articles_per_author: Number of articles to use per author. The value can range from 10 to the maximum number of articles available for any author.
authors_to_keep: Number of authors to include in the attribution classifier. The value can range from 2 to the total number of authors.
data_folder: Path to the data folder, which contains one directory per author.
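To make the three parameters concrete, here is a small sketch of how they could be parsed and applied. The flag names come from the command shown earlier; everything else is an assumption for illustration, including the helper names build_parser and subsample, the choice to make all flags required, and the rule of keeping the most prolific authors first. The real attribution_model.py may select authors and articles differently.

```python
import argparse
from collections import defaultdict


def build_parser():
    """Parser for the three documented flags (defaults are not assumed)."""
    parser = argparse.ArgumentParser(description="Authorship attribution (sketch)")
    parser.add_argument("--articles_per_author", type=int, required=True)
    parser.add_argument("--authors_to_keep", type=int, required=True)
    parser.add_argument("--data_folder", type=str, required=True)
    return parser


def subsample(texts, authors, articles_per_author, authors_to_keep):
    """Keep up to authors_to_keep authors, each with articles_per_author articles.

    Assumed policy: authors with too few articles are dropped, and the
    remaining authors are ranked by article count (ties broken by name).
    """
    by_author = defaultdict(list)
    for text, author in zip(texts, authors):
        by_author[author].append(text)

    eligible = sorted(
        (a for a, docs in by_author.items() if len(docs) >= articles_per_author),
        key=lambda a: (-len(by_author[a]), a),
    )[:authors_to_keep]

    kept_texts, kept_authors = [], []
    for author in eligible:
        for doc in by_author[author][:articles_per_author]:
            kept_texts.append(doc)
            kept_authors.append(author)
    return kept_texts, kept_authors
```

In a script, build_parser().parse_args() would read the flags from the command line, and the resulting values would be passed to subsample before training.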
