Title: Identifying Discrepancies within Current Transport Initiatives: An Urban Informatics Approach
This repository contains the scripts I used for my methodology.
The scripts should be executed in the following order:
- reddit.py and twitter.py
- topic_modelling.py
- Visualisation_Stationarity.py
The following diagram shows the methodology flow (scripting only), as well as the outputs of each run and their file names.
Disclaimer 1: Raw data is not included in this repository, to avoid breaching privacy. Refer to Appendix C for the data schema of the raw and processed data.
Disclaimer 2: reddit.py and twitter.py contain environment variables that users need to change to their own values.
Disclaimer 3: I have also attached the script (reddit_locations.py) for identifying locations in Reddit comments through spaCy Named Entity Recognition (NER). However, these results are not used in the subsequent analysis, as explained in section 3.7. Moreover, the trained NER model (spacy_sg) may not be the best identifier of Singapore's locations, for a myriad of reasons that are outside the scope of this thesis.
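Regarding Disclaimer 2, one possible way to keep personal credentials out of the scripts is to read them from the shell environment. The sketch below is only an illustration under assumptions: the variable names in REQUIRED_VARS and the helper load_credentials are hypothetical and are not taken from reddit.py or twitter.py, so check those scripts for the names they actually use.

```python
import os

# Hypothetical variable names for illustration only; reddit.py and
# twitter.py may use different names or a different mechanism.
REQUIRED_VARS = [
    "REDDIT_CLIENT_ID",
    "REDDIT_CLIENT_SECRET",
    "TWITTER_BEARER_TOKEN",
]


def load_credentials(names=REQUIRED_VARS, environ=os.environ):
    """Return a dict of credentials, failing early if any value is unset or empty."""
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
        )
    return {name: environ[name] for name in names}
```

With this approach, each value would be exported in the shell before running the scripts, and a missing value is reported before any API call is attempted.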
- Clone the entire repository with git, using whichever CLI you are comfortable with
- Install the dependencies: pip install -r requirements.txt
- Run the scripts sequentially
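As a convenience, the sequential run can be sketched as a small driver script. This is a minimal sketch and not part of the repository: the helper run_pipeline and its parameters are hypothetical, and it assumes that each script runs without command-line arguments and that reddit.py can run before twitter.py (the order between those two is not specified above).

```python
import subprocess
import sys
from pathlib import Path

# Execution order as listed in this README. Running reddit.py before
# twitter.py is an assumption; the README lists them as a single step.
PIPELINE = [
    "reddit.py",
    "twitter.py",
    "topic_modelling.py",
    "Visualisation_Stationarity.py",
]


def run_pipeline(scripts=PIPELINE, repo_dir=".", runner=subprocess.run):
    """Run each script in order and stop at the first failure."""
    repo = Path(repo_dir).resolve()
    completed = []
    for name in scripts:
        script = repo / name
        if not script.is_file():
            raise FileNotFoundError(f"Missing script: {script}")
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later scripts never run on incomplete outputs.
        runner([sys.executable, str(script)], check=True, cwd=repo)
        completed.append(name)
    return completed
```

Calling run_pipeline() from the repository root would run the four scripts with the current Python interpreter. Stopping at the first failure matters here because each script consumes the output files of the previous one.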