This is the code for the EMNLP-IJCNLP 2019 AnnoNLP Workshop paper: "Computer Assisted Annotation of Tension Development in TED Talks through Crowdsourcing"
It contains the annotation tool used in the paper to annotate tension development.
- Install and run MongoDB.
- To connect to MongoDB, create your own config.py:
cp config.sample.py config.py
- If you have not changed MongoDB's default settings, you don't need to modify config.py.
- Install the Python requirements:
pip install -r requirements.txt
- Download the TED talk videos listed in data/video_list.csv:
python downloader.py
- Run the web-based annotation tool:
export PYTHONPATH=.; python annotation/app.py
- Annotate! 😵
- Click one of the given options for each video clip. The selected value is saved to the database automatically, together with the sentence information.
- We provided the annotators with this guideline document when using Amazon Mechanical Turk.
- Export the annotation data:
export PYTHONPATH=.; python annotation/dbscript.py --run=export_data
- The data is exported to data/output/tension.json
- Example:
{ "doc_id": "5db7ac1c88e6da63a07a9c2e", "doc_title": "The power of vulnerability", "source": "https://www.youtube.com/watch?v=iCvmsMzlF7o", "sent_id": "5db7ac3388e6da63a07a9c6a", "sent_index": 60, "text": "And it turned out to be shame.", "labels": [1, 1, 1], "start_ts": 279364, "end_ts": 281543 } ...
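As a rough illustration of how an exported record might be consumed, the sketch below parses the example record shown above and reduces it to one label and a duration. It makes several assumptions that are not confirmed by this repository: that each entry in labels is one annotator's choice, that a majority vote is a reasonable way to combine them, and that start_ts and end_ts are in milliseconds. The helper names majority_label and summarize are hypothetical and are not part of the tool.

```python
import json
from collections import Counter

# The example record from the export step, pasted as a JSON string.
record = json.loads("""
{ "doc_id": "5db7ac1c88e6da63a07a9c2e",
  "doc_title": "The power of vulnerability",
  "source": "https://www.youtube.com/watch?v=iCvmsMzlF7o",
  "sent_id": "5db7ac3388e6da63a07a9c6a",
  "sent_index": 60,
  "text": "And it turned out to be shame.",
  "labels": [1, 1, 1],
  "start_ts": 279364,
  "end_ts": 281543 }
""")


def majority_label(labels):
    """Return the most common label (hypothetical helper).

    Ties resolve to the label that appears first in the list.
    """
    return Counter(labels).most_common(1)[0][0]


def summarize(rec):
    """Reduce one exported record to a label and a duration (hypothetical helper)."""
    # Assumption: timestamps are in milliseconds.
    duration_s = (rec["end_ts"] - rec["start_ts"]) / 1000
    return {
        "sent_index": rec["sent_index"],
        "text": rec["text"],
        "label": majority_label(rec["labels"]),
        "duration_s": duration_s,
    }


summary = summarize(record)
print(summary)
```

To process the full export, the same summarize function could be applied to each record read from data/output/tension.json. How the file separates records is not shown here, so the loading step is left out.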