For this assignment I wrote the Python package LanguageModel. Code documentation and explanations are included as docstrings inside the code. My coding and design choices are described in a Markdown cell under the heading Coding Decisions. I am using the Maltese [1] corpus dataset for this assignment, with Python version 3.7.
I have also included an HTML file generated from the Jupyter notebook, and I recommend viewing that instead of running a Jupyter server. Alternatively, the JetBrains PyCharm IDE, which I used, also renders the Markdown cells neatly.
The included requirements.txt lists the external libraries used in this assignment. To install them with pip, use this command:
sudo pip install -r requirements.txt
Omit sudo if you are using Windows.
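The same installation can also be driven from Python, which avoids sudo entirely and ensures pip runs against the interpreter you are actually using. The following is a minimal sketch, not part of the LanguageModel package: the helper names `pip_install_command` and `install_requirements` are hypothetical, and it assumes the script is run from the project root where requirements.txt lives.

```python
import subprocess
import sys
from pathlib import Path


def pip_install_command(requirements="requirements.txt"):
    """Build a pip command bound to the interpreter running this script.

    Hypothetical helper for illustration; not part of the project.
    """
    return [sys.executable, "-m", "pip", "install", "-r", str(requirements)]


def install_requirements(requirements="requirements.txt"):
    """Install the listed libraries; raises if the file is missing or pip fails."""
    path = Path(requirements)
    if not path.is_file():
        # Assumption: requirements.txt sits in the current working directory.
        raise FileNotFoundError(f"{path} not found; run this from the project root")
    subprocess.check_call(pip_install_command(path))
```

Calling `install_requirements()` from the project root should behave like the pip command above, but installs into whichever environment the current interpreter belongs to, such as an active virtual environment.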
The file structure is as follows:
Building a Language Model
|
+--Language Model
| |
| +-- __init__.py
| +-- Corpus.py
| +-- NGramCounts.py
| +-- NGRamModel.py
+--Maltese
| |
| +-- various txt files (Not included in git/submission)
+--Religion
| |
| +-- two txt files (Not included in git/submission)
+--Sports
| |
| +-- two txt files (Not included in git/submission)
+--Test Corpus
| |
| +-- Test.txt
+--.gitignore
+--README.md
+--Building a Language Model.ipynb
+--Building a Language Model.html
+--Building a Language Model.pdf
+--Plagiarism form.pdf
+--requirements.txt
This project has also been uploaded to GitHub: https://github.com/AidenWilliams/Building-a-Language-Model