To explore and evaluate the application of crowdsourcing in general, and AMT in particular, for developing digital public health surveillance systems, we collected 296,166 crowd-generated labels for 98,722 tweets, labelled by 610 AMT workers, to develop machine learning (ML) models for detecting behaviours related to physical activity, sedentary…

data-intelligence-for-health-lab/CrowdSourcing-for-Digital-Public-Health-Surveillance

CrowdSourcing-for-Digital-Public-Health-Surveillance

The main objective of this study is to explore and evaluate the application of crowdsourcing in general, and Amazon Mechanical Turk (AMT) in particular, for developing digital public health surveillance systems.

Dataset and Labels

A full version of the dataset used for this study is available at the dataset's GitHub repository. A sample dataset for replicating the results of this study is provided in this repository, under the 'sample dataset' folder. The following figure presents the labels defined for each of the binary and multi-class classification tasks.

Tasks and Labelling process

The following figure presents a sample labelling task (i.e., HIT) for the sedentary behaviour category. Each HIT contains four questions (section 1), and each asks whether the presented tweet is a self-reported PASS-related behaviour (section 2). The fourth question is a pre-defined qualification question, designed in addition to the qualification requirements defined by AMT (section 3). The answer to this question was always choice #1, and the question was easy enough to expose spammers or irresponsible workers. Each HIT also contains an illustrative example that explains each choice of the questions. Workers were asked to select exactly one choice per question, and HITs with zero or more than one label were rejected during the approval process.
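The two approval rules described above (exactly one choice per question, and choice #1 on the qualification question) can be sketched as a small check. This is only an illustration, not the code used in the study: the function name `approve_hit`, the list-of-lists input format, and the assumption that the qualification question is stored last are all hypothetical.

```python
from typing import Sequence

# Assumptions for illustration only: answers are stored in question order,
# so the fourth (qualification) question sits at index 3.
NUM_QUESTIONS = 4
QUALIFICATION_INDEX = 3
QUALIFICATION_ANSWER = 1  # "choice #1", as stated in the description above


def approve_hit(selections: Sequence[Sequence[int]]) -> bool:
    """Return True if a submitted HIT passes both approval rules.

    `selections[i]` holds the choice numbers a worker ticked for question i.
    """
    if len(selections) != NUM_QUESTIONS:
        return False
    # Rule 1: every question must carry exactly one selected choice.
    if any(len(choices) != 1 for choices in selections):
        return False
    # Rule 2: the qualification question must be answered with choice #1.
    return selections[QUALIFICATION_INDEX][0] == QUALIFICATION_ANSWER
```

For example, `approve_hit([[2], [1], [3], [1]])` passes, while a submission with an empty answer, two ticked choices on one question, or any other choice on the qualification question is rejected.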

Active Learning and SHAP Analysis

Please check the TranditionalModels notebook for more details about the implementation of these techniques.

Citation

The manuscript that presents this study has been accepted for publication in the Journal of Medical Internet Research (JMIR). You can cite our paper as follows:

@article{abad2022crowdsourcing,
  title={Crowdsourcing for machine learning in public health surveillance: lessons learned from Amazon Mechanical Turk},
  author={Abad, Zahra Shakeri Hossein and Butler, Gregory P and Thompson, Wendy and Lee, Joon and others},
  journal={Journal of medical Internet research},
  volume={24},
  number={1},
  pages={e28749},
  year={2022},
  publisher={JMIR Publications Inc., Toronto, Canada}
}

More Questions

Please use the issues on this GitHub repository for any questions or feedback. You can also contact us at dih[at]ucalgary.ca or joonwu.lee[at]ucalgary.ca for specific inquiries.
