Qiaolin Chen, Ph.D. Aug 2017
(from https://www.kaggle.com/c/predict-west-nile-virus) West Nile virus (WNV) is most commonly spread to humans through infected mosquitoes. Around 20% of people who become infected with the virus develop symptoms ranging from a persistent fever to serious neurological illnesses that can result in death.
In 2002, the first human cases of WNV were reported in Chicago. By 2004 the City of Chicago and the Chicago Department of Public Health (CDPH) had established a comprehensive surveillance and control program that is still in effect today.
Every week from late spring through the fall, mosquitoes in traps across the city are tested for the virus. The results of these tests influence when and where the city will spray airborne pesticides to control adult mosquito populations.
Given weather, location, testing, and spraying data, this competition asks you to predict when and where different species of mosquitoes will test positive for WNV. A more accurate method of predicting outbreaks of WNV in mosquitoes will help the City of Chicago and CDPH more efficiently and effectively allocate resources towards preventing transmission of this potentially deadly virus.
The relevant data for this challenge can be downloaded here: https://www.kaggle.com/c/predict-west-nile-virus/data
This exercise requires you to clean the dataset, engineer features, develop one or more predictive models, and present your results. We would like you to orient your analysis around one or more of the following:
- Be prepared to discuss how predictive modeling could be used to inform actions that the City of Chicago might take. Does your model suggest any preventative actions?
- If we are interested in understanding the factors that are related to / predictive of the spread of WNV, what modeling choices / evaluation metrics would be most appropriate?
- What additional data might be relevant for understanding this problem?
- Are there any lessons for future data collection efforts?
- What weather conditions would we expect to impact mosquito populations? How would you investigate this using this data?
Here I explore the data, construct and select features, and use those features in one or more predictive models. I follow an iterative approach: I first establish a baseline of performance, then build on it in subsequent iterations.
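
As a rough illustration of what an initial baseline could look like, the sketch below scores each trap observation with the historical WNV-positive rate of its calendar month and evaluates the scores with the area under the ROC curve. It is a minimal sketch under stated assumptions, not the model used later: `WnvPresent` is assumed to be the 0/1 target column, `Month` is a hypothetical feature derived from the observation date, the helper names (`roc_auc`, `fit_group_rate`, `predict_group_rate`) are my own, and the rows are made-up toy values rather than real competition data.

```python
from collections import defaultdict


def roc_auc(labels, scores):
    """ROC AUC as P(score_pos > score_neg) + 0.5 * P(tie), by pairwise comparison."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def fit_group_rate(rows, key, target="WnvPresent"):
    """Learn the positive rate per group, plus the overall rate as a fallback."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for row in rows:
        counts[row[key]][0] += row[target]
        counts[row[key]][1] += 1
    total_pos = sum(c[0] for c in counts.values())
    total_n = sum(c[1] for c in counts.values())
    rates = {group: p / n for group, (p, n) in counts.items()}
    return rates, total_pos / total_n


def predict_group_rate(rows, key, rates, overall):
    """Score each row with its group's rate; unseen groups get the overall rate."""
    return [rates.get(row[key], overall) for row in rows]


# Toy rows for illustration only (NOT real competition data).
train = [
    {"Month": 6, "WnvPresent": 0},
    {"Month": 6, "WnvPresent": 0},
    {"Month": 7, "WnvPresent": 0},
    {"Month": 7, "WnvPresent": 1},
    {"Month": 8, "WnvPresent": 1},
    {"Month": 8, "WnvPresent": 1},
    {"Month": 8, "WnvPresent": 0},
]
valid = [
    {"Month": 6, "WnvPresent": 0},
    {"Month": 7, "WnvPresent": 0},
    {"Month": 7, "WnvPresent": 1},
    {"Month": 8, "WnvPresent": 1},
    {"Month": 9, "WnvPresent": 0},  # month unseen in training
]

rates, overall = fit_group_rate(train, key="Month")
scores = predict_group_rate(valid, "Month", rates, overall)
auc = roc_auc([row["WnvPresent"] for row in valid], scores)
print(f"baseline validation AUC: {auc:.3f}")
```

A group-rate baseline like this is deliberately crude. Its purpose is to give a reference score that any later model with weather, location, or spraying features should beat. The pairwise AUC is quadratic in the number of rows, so on the full dataset a rank-based or library implementation would be preferable.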