I downloaded the abstracts from the Scopus database, focusing on the journal "Transport Policy." The dataset covers the abstracts of all papers published in Transport Policy from 2017 to 2022. Of these, I kept only the papers related to transportation planning.
- Open the terminal (I used Conda).
- Type the following command: python -m spacy train config.cfg --output ./output --paths.train ./causal_training_data.spacy --paths.dev ./causal_dev_data.spacy
- Make sure the paths are correct. If the command fails, try giving the full paths to causal_training_data.spacy and causal_dev_data.spacy.
- The command writes the trained model to the directory given by --output (spaCy saves it there in the model-best and model-last subfolders).
- Load that model in CausalPhrasesExtraction.py.
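If relative paths cause errors, one option is to build the training command with absolute paths before running it. The following is a minimal sketch using only the standard library; the helper names `build_train_command` and `missing_inputs` are hypothetical and not part of this repository, and the file names are the ones from the steps above.

```python
import sys
from pathlib import Path


def build_train_command(train_file, dev_file, config="config.cfg", output_dir="./output"):
    """Return the spaCy training command as an argument list with absolute paths.

    Hypothetical helper: it only assembles the command shown in the steps above.
    """
    return [
        sys.executable, "-m", "spacy", "train", str(Path(config).resolve()),
        "--output", str(Path(output_dir).resolve()),
        "--paths.train", str(Path(train_file).resolve()),
        "--paths.dev", str(Path(dev_file).resolve()),
    ]


def missing_inputs(*paths):
    """Return the input files that do not exist, so path problems show up early."""
    return [str(p) for p in paths if not Path(p).is_file()]
```

To run the training, check that `missing_inputs("config.cfg", "causal_training_data.spacy", "causal_dev_data.spacy")` returns an empty list, then pass the result of `build_train_command("causal_training_data.spacy", "causal_dev_data.spacy")` to `subprocess.run(..., check=True)`.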
You will notice that I used both the trained model and spaCy's default model. This is because the trained model did not include a 'sentencizer' component to split the abstracts into sentences, so I used spaCy's default model 'en_core_web_lg' for that step.
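The two-model setup can be sketched roughly as follows. This is an illustration under assumptions, not the actual code in CausalPhrasesExtraction.py: the function names are hypothetical, and the default model path assumes the training output was written to ./output, where spaCy stores the best checkpoint under model-best.

```python
from pathlib import Path


def load_pipelines(trained_model_dir="./output/model-best"):
    """Load the default spaCy model (for sentence splitting) and the trained model.

    The directory is an assumption based on the --output path used for training.
    """
    import spacy  # imported here so process_abstract stays usable without spaCy

    sentence_nlp = spacy.load("en_core_web_lg")
    causal_nlp = spacy.load(Path(trained_model_dir))
    return sentence_nlp, causal_nlp


def process_abstract(abstract, sentence_nlp, causal_nlp):
    """Split an abstract into sentences, then run the trained model on each one.

    Returns a list of (sentence_text, trained_model_output) pairs.
    """
    results = []
    for sent in sentence_nlp(abstract).sents:
        text = sent.text.strip()
        if text:  # skip empty fragments
            results.append((text, causal_nlp(text)))
    return results
```

A typical call would be `sentence_nlp, causal_nlp = load_pipelines()` followed by `process_abstract(abstract_text, sentence_nlp, causal_nlp)`. How the causal phrases are read from each returned document depends on which component was trained, so that part is left out here.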
If you have any questions or suggestions, please contact me. You can get my details here.