We create bigram, trigram and linear interpolation language models which are used for language generation and spell correction.
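As a rough illustration of the linear interpolation idea, the sketch below mixes a trigram estimate with a bigram estimate using a single weight. The function names, the add-alpha smoothing, the padding tokens and the value of `lam` are assumptions made for this example, not necessarily what the project's code does.

```python
from collections import Counter


def train_ngram_counts(sentences):
    """Count unigrams, bigrams and trigrams over tokenized sentences."""
    uni, bi, tri = Counter(), Counter(), Counter()
    for tokens in sentences:
        # Two start symbols so the first real word has a full trigram context.
        padded = ["<s>", "<s>"] + list(tokens) + ["</s>"]
        uni.update(padded)
        bi.update(zip(padded, padded[1:]))
        tri.update(zip(padded, padded[1:], padded[2:]))
    return uni, bi, tri


def interpolated_prob(w1, w2, w3, counts, vocab_size, lam=0.6, alpha=1.0):
    """P(w3 | w1, w2) = lam * P_trigram + (1 - lam) * P_bigram.

    Both component estimates use add-alpha smoothing (an assumption here).
    """
    uni, bi, tri = counts
    p_bi = (bi[(w2, w3)] + alpha) / (uni[w2] + alpha * vocab_size)
    p_tri = (tri[(w1, w2, w3)] + alpha) / (bi[(w1, w2)] + alpha * vocab_size)
    return lam * p_tri + (1 - lam) * p_bi


# Toy corpus, purely illustrative.
sentences = [["the", "cat", "sat"], ["the", "dog", "sat"]]
counts = train_ngram_counts(sentences)
vocab = sorted(counts[0])

p_cat = interpolated_prob("<s>", "the", "cat", counts, len(vocab))
p_sat = interpolated_prob("<s>", "the", "sat", counts, len(vocab))
```

The same scoring function can serve both uses mentioned above: generation samples the next word in proportion to these probabilities, while spell correction can rank candidate replacements for a word by how probable each is in its context.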
We create deep learning models using the Transformers/Datasets, PyTorch and TensorFlow libraries.
We also use the keras_tuner / transformers_trainer frameworks to optimize hyperparameters and model architecture.
We briefly mention additional tasks carried out:
- Sentiment Analysis: Dataset selection, exploratory analysis, custom stopwords, data augmentation.
- POS Tagging: Dataset selection, exploratory analysis, custom parsing, custom baseline ("smart dummy") model, local caching of heavy computations, automated results generation (Python -> LaTeX).
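
A common choice for a simple POS tagging baseline is to assign each word the tag it received most often in training and to fall back to the globally most frequent tag for unseen words. The sketch below shows that idea. The class name, the lowercasing step and the fallback rule are assumptions for illustration; the project's "smart dummy" model may work differently.

```python
from collections import Counter, defaultdict


class MostFrequentTagBaseline:
    """Hypothetical baseline: most frequent tag per word, global fallback."""

    def fit(self, tagged_sentences):
        per_word = defaultdict(Counter)
        all_tags = Counter()
        for sentence in tagged_sentences:
            for word, tag in sentence:
                per_word[word.lower()][tag] += 1
                all_tags[tag] += 1
        # Most frequent tag seen for each known word.
        self.word_to_tag = {
            word: tags.most_common(1)[0][0] for word, tags in per_word.items()
        }
        # Fallback for words never seen during training.
        self.default_tag = all_tags.most_common(1)[0][0]
        return self

    def predict(self, words):
        return [self.word_to_tag.get(w.lower(), self.default_tag) for w in words]


# Toy training data, purely illustrative.
train = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sees", "VERB"), ("the", "DET"), ("dog", "NOUN")],
    [("dogs", "NOUN"), ("chase", "VERB"), ("cats", "NOUN")],
]
baseline = MostFrequentTagBaseline().fit(train)
predicted = baseline.predict(["The", "dog", "sleeps"])
```

A baseline of this kind gives the trained classifiers a non-trivial score to beat, which makes the comparisons in the reports easier to interpret.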
Each task features two IPython notebooks containing the executed code, Python source files for repeated custom tasks, and a unified report.
The reports discuss in detail the design decisions for each classifier and include graphs and aggregated results comparing the current model to the previous models.
Sentiment classification POS Tagging Report