NLP and Unsupervised learning using Champions League Final 2018 Tweets

This repository is my final project submission for the General Assembly Data Science Essentials course. I used NLP and KMeans clustering to identify topics within a Twitter dataset.

The repository comprises two Jupyter notebooks and two output files:

eda-rea-v-liv-2018.ipynb - This contains the project brief (an outline of the project and its aspirations) along with some exploratory data analysis that provides insights into the dataset I selected.
nlp-rea-v-liv-2018.ipynb - This contains the final project report, which covers the NLP and KMeans work. Two .txt files are outputs of code in this notebook; they were used to identify the parameters that resulted in strong, clear clustering. That combination of parameters was then explored further and visualised in the Jupyter notebook.
1. 2020-09-03 1913-TF-IDF.txt - Contains the results of clustering data with TF-IDF features using KMeans and various parameters.
2. 2020-09-04 0651-COUNT-VEC.txt - Contains the results of clustering data with count vectorized features using KMeans and various parameters.
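
The comparison recorded in the two output files can be illustrated with a small sketch. This is not the code from the notebooks: it is a dependency-free approximation, written under the assumption that the workflow featurises each tweet (raw counts or TF-IDF), runs KMeans for several parameter values, and records a score per run. The tweets, function names, smoothed IDF formula and the choice of inertia as the score are all hypothetical stand-ins.

```python
import math
import random
from collections import Counter

# Made-up toy tweets, for illustration only (not taken from the dataset).
toy_tweets = [
    "what a goal from the striker",
    "unbelievable goal great finish",
    "the keeper made a terrible error",
    "keeper error costs the final",
    "great atmosphere in the stadium",
    "stadium atmosphere is electric tonight",
]

def tokenize(text):
    # Lowercase, split on whitespace, keep purely alphabetic tokens.
    return [w for w in text.lower().split() if w.isalpha()]

def count_vectors(docs):
    # Bag-of-words counts: one row per document, one column per vocabulary word.
    tokens = [tokenize(d) for d in docs]
    vocab = sorted({w for t in tokens for w in t})
    index = {w: i for i, w in enumerate(vocab)}
    rows = []
    for t in tokens:
        row = [0.0] * len(vocab)
        for word, n in Counter(t).items():
            row[index[word]] = float(n)
        rows.append(row)
    return vocab, rows

def tfidf_vectors(docs):
    # Counts reweighted by a smoothed inverse document frequency (an assumed
    # variant), then scaled so each row has unit Euclidean length.
    vocab, counts = count_vectors(docs)
    n_docs = len(docs)
    doc_freq = [sum(1 for r in counts if r[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n_docs) / (1 + df)) + 1 for df in doc_freq]
    rows = []
    for r in counts:
        weighted = [c * w for c, w in zip(r, idf)]
        norm = math.sqrt(sum(x * x for x in weighted)) or 1.0
        rows.append([x / norm for x in weighted])
    return vocab, rows

def sq_dist(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, n_iter=20, seed=0):
    # Plain Lloyd's algorithm with k randomly chosen documents as starting centroids.
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it was
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(sq_dist(v, centroids[lab]) for v, lab in zip(vectors, labels))
    return labels, inertia

# Sweep both feature types over a few cluster counts and record the scores.
results = []
for name, featurize in [("COUNT-VEC", count_vectors), ("TF-IDF", tfidf_vectors)]:
    _, features = featurize(toy_tweets)
    for k in (2, 3):
        labels, inertia = kmeans(features, k)
        results.append((name, k, inertia))
        print(f"{name} k={k} inertia={inertia:.3f} labels={labels}")
```

Inertia values are not comparable across the two feature types, because the TF-IDF rows are length-normalised and the count rows are not. A sweep like this is most useful for comparing parameter values within one feature type.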


jb-0/twitter-nlp
