The Food Standards Agency (FSA) is a non-ministerial government body that is responsible for food safety and hygiene in England, Wales and Northern Ireland. The FSA gathers information on public food practices by conducting surveys. The Food and You Survey (F&Y) is a flagship biennial study that explores the public's attitudes, knowledge and behaviour relating to food safety and production. Data is analysed and used to compile publicly available reports.
Wales is demographically distinct from the rest of the UK: it is less populous and more deprived, and has a larger rural population. For this reason, the FSA was particularly concerned that its F&Y engagement activities might not reach some of the demographic groups in Wales, who may have associated food risks. The following specific questions were posed:
- Does F&Y survey sampling reflect the true demographic profile of Wales?
- What food risks are associated with undersampled groups, and what is their understanding of food labelling?
- Are there any relationships between behaviours related to food safety, and can we predict food risk for specific groups or individuals?
James Doherty - Personal GitHub
Lorena Garcia-Perez - Personal GitHub
Charlie Jeynes - Personal GitHub
Mishka Nemes - Personal GitHub
Science to Data Science Virtual - 23rd of March to 24th of April 2020
Hosted and organised by Pivigo
Food and You (F&Y) survey 2010-2018: Food and You Waves 1-5 Data, csv file, dated on 09 September 2019
Food and You (F&Y) survey guide: Select Wave 1 to 5 - Data user guide
Census 2011 - microdata with individual entries: For data download (isg_regionv2.csv), an account needs to be created here
- includes everything required to run the dashboard (the `.ipynb` file, together with a requirements.txt and the F&Y survey `.csv` data files). Instructions about which settings to use to run the app from the https://mybinder.org/ website are provided in a `.png` file. For this app to be available to anyone online, the relevant files must be located in a public repository.
- `microdata_census2011_Wales_prepared.csv` has all the relevant demographic data from the 2011 Census. This includes only the Wales entries for 7 out of 120 original demographics.
- `survey.csv` includes F&Y survey data from waves 1-5, for Wales, England and Northern Ireland
- `survey_guide_values.csv` includes data to be parsed in the dictionary that translates answer names
- `survey_guide_variables.csv` includes data to be parsed in the dictionary that translates question names
- `S2DS-2020_FSA_Wales_presentation.pdf` - project presentation for the S2DS programme, presented on 23rd of April 2020
- `S2DS-2020_FSA_Wales_case_study.pdf` - executive summary of the challenge, approach, findings, impact and recommendations
- Graphs within the `.pdf` files starting with `foodBehaviour_` can also be visualized in the dashboard provided here. A greater variety of graphs related to food behaviours by demographics can be visualized on demand by using the dashboard.
- All other `.pdf` files contain graphs that can be plotted only by running the notebook. These are exhaustive in light of the current data.
`masterscript_with_markdown.ipynb` includes all the code developed for the project. For details, see below.
Data loading, data wrangling and data analysis are carried out in the notebooks/masterscript_with_markdown.ipynb notebook.
F&Y survey - values encoded as 'Not applicable' or 'Not known' were re-encoded as NaN, except for the principal component analysis, where the data was kept in its original state.
Census - given the higher granularity of the data, data was aggregated to reflect the answer labels in F&Y in order to allow direct comparison. There were no missing values as all demographics were provided for each respondent.
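The two preparation steps above can be sketched as follows. This is a minimal illustration rather than the notebook's actual code: the column names, answer labels and the census-to-survey mapping are all hypothetical, and pandas is assumed to be the library in use.

```python
import numpy as np
import pandas as pd

# Hypothetical survey extract; column names and labels are illustrative only.
survey = pd.DataFrame({
    "age_group": ["16-24", "Not known", "35-44", "65+"],
    "work_status": ["Working", "Retired", "Not applicable", "Retired"],
})

# Treat the two non-informative labels as missing values.
missing_labels = ["Not applicable", "Not known"]
survey_clean = survey.replace(missing_labels, np.nan)

# Hypothetical census extract with finer-grained age bands than the survey.
census = pd.DataFrame(
    {"age_band": ["16-17", "18-24", "35-39", "40-44", "65-74", "75+"]}
)

# Map each fine census band onto the coarser survey label so both can be compared.
census_to_survey = {
    "16-17": "16-24", "18-24": "16-24",
    "35-39": "35-44", "40-44": "35-44",
    "65-74": "65+", "75+": "65+",
}
census["age_group"] = census["age_band"].map(census_to_survey)

print(survey_clean.isna().sum().to_dict())
print(census["age_group"].value_counts().to_dict())
```

Keeping the original `survey` frame untouched and working on `survey_clean` makes it straightforward to reuse the unmodified data for the principal component analysis.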
Two dictionaries were built. The first one translates question names from their short version to their longer, comprehensible, version. The second is a nested dictionary that translates individual answers to each question from their numeric code to a meaningful answer. Their input data is provided in this repository.
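As a rough sketch of how two such dictionaries might be built, the example below reads two small in-memory stand-ins for `survey_guide_variables.csv` and `survey_guide_values.csv`. The column names (`variable`, `code`, `label`) and the question names are assumptions made for illustration; the real files may be laid out differently.

```python
import csv
import io

# Hypothetical stand-ins for survey_guide_variables.csv and survey_guide_values.csv;
# the real files may use different column names and codes.
variables_csv = io.StringIO(
    "variable,label\n"
    "q_example_1,Example question about hand washing\n"
    "q_example_2,Example question about eating out\n"
)
values_csv = io.StringIO(
    "variable,code,label\n"
    "q_example_1,1,Always\n"
    "q_example_1,2,Sometimes\n"
    "q_example_2,1,Yes\n"
    "q_example_2,2,No\n"
)

# First dictionary: short question name -> longer, readable question text.
questions = {row["variable"]: row["label"] for row in csv.DictReader(variables_csv)}

# Second (nested) dictionary: question name -> {numeric code -> answer text}.
answers = {}
for row in csv.DictReader(values_csv):
    answers.setdefault(row["variable"], {})[int(row["code"])] = row["label"]

print(questions["q_example_1"])
print(answers["q_example_1"][2])
```

With this layout, a plot title can be looked up as `questions[name]` and a legend entry as `answers[name][code]`.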
`custom_barplots` is a custom plotting function that outputs horizontal barplots with the percentage of people giving a certain answer, and 95% confidence interval error bars. The names of the relevant questions and answers are displayed automatically in each plot title, axis label and legend, using the two dictionaries described above.
`custom_lineplots` is a custom plotting function that outputs lineplots showing the temporal evolution of the F&Y survey demographics for Wales, England and Northern Ireland. The results for the three countries appear side by side, for ease of comparison between their trends. 95% confidence interval error bars are also displayed, together with the number of respondents (n) and the specific percentage represented by each category. The names of the relevant questions and answers are displayed automatically in each plot title, axis label and legend, using the two dictionaries described above.
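The quantity both plotting functions display, a percentage with a 95% confidence interval, could be computed along the following lines. This is not the project's `custom_barplots` or `custom_lineplots` code: the helper name `percentages_with_ci` is hypothetical, and the normal-approximation interval is an assumption, since the README does not state which interval method was used.

```python
import math
from collections import Counter


def percentages_with_ci(responses, z=1.96):
    """Return {answer: (percent, lower, upper)} using a normal-approximation 95% CI."""
    valid = [r for r in responses if r is not None]  # ignore missing answers
    n = len(valid)
    result = {}
    for answer, count in Counter(valid).items():
        p = count / n
        half_width = z * math.sqrt(p * (1 - p) / n)
        lower = max(0.0, p - half_width)  # clip to the valid proportion range
        upper = min(1.0, p + half_width)
        result[answer] = (100 * p, 100 * lower, 100 * upper)
    return result


# Hypothetical answers to a single question, including some missing values.
responses = ["Yes"] * 60 + ["No"] * 40 + [None] * 5
summary = percentages_with_ci(responses)
for answer, (pct, lo, hi) in summary.items():
    print(f"{answer}: {pct:.1f}% (95% CI {lo:.1f} to {hi:.1f})")
```

The resulting percentages and interval half-widths are the kind of values that can be passed to a horizontal bar chart as bar lengths and error bars.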
Principal Component Analysis (PCA) is used to explore the raw data in order to understand global patterns present within the whole F&Y dataset for Wales.
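For readers unfamiliar with the technique, a minimal PCA can be written with NumPy's singular value decomposition, as sketched below. The data is randomly generated and purely illustrative, and the notebook may well rely on a dedicated library instead; nothing here reproduces the project's results.

```python
import numpy as np

# Hypothetical numerically coded answers: rows are respondents, columns are questions.
rng = np.random.default_rng(0)
X = rng.integers(1, 6, size=(50, 4)).astype(float)

# Centre each column, then obtain the principal axes from the SVD.
X_centred = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centred, full_matrices=False)

# Share of the total variance explained by each principal component.
explained_variance_ratio = S**2 / np.sum(S**2)

# Respondent coordinates on the first two components, e.g. for a scatter plot.
scores = X_centred @ Vt[:2].T

print(explained_variance_ratio.round(3))
```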
A timeline of the evolution of the F&Y survey demographics is plotted using `custom_lineplots`.
Demographic variables (age, gender, marital status, religion, health status, work status, deprivation) are compared between the F&Y survey and the census using `custom_barplots`.
Demographic variables are also taken into consideration for the analysis of questions of interest related to food safety, using the F&Y survey data and `custom_barplots`.
To evaluate the significance of the differences under study, chi-square statistical testing is carried out (a non-parametric test, as the survey and census data are categorical).
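One way to frame such a test is as a goodness-of-fit comparison between the survey counts and the counts expected under the census profile. The README does not say whether the notebook uses this form or a contingency-table test, so the sketch below is an assumption, and all counts and shares are invented for illustration.

```python
# Hypothetical counts of survey respondents per age group (not real F&Y data).
survey_counts = {"16-34": 50, "35-64": 120, "65+": 130}

# Hypothetical census shares for the same groups; they must sum to 1.
census_shares = {"16-34": 0.30, "35-64": 0.45, "65+": 0.25}

n = sum(survey_counts.values())

# Expected counts if the survey matched the census profile exactly.
expected = {group: share * n for group, share in census_shares.items()}

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi2 = sum(
    (survey_counts[g] - expected[g]) ** 2 / expected[g] for g in survey_counts
)

degrees_of_freedom = len(survey_counts) - 1
critical_value_5pct = 5.991  # chi-square critical value for 2 degrees of freedom

print(f"chi2 = {chi2:.2f}, df = {degrees_of_freedom}")
print("differs from census at the 5% level:", chi2 > critical_value_5pct)
```

The statistic is computed by hand here to keep the example dependency-free; `scipy.stats.chisquare(f_obs, f_exp)` returns the same statistic together with a p-value.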
Correlation analysis is performed on the F&Y survey data to identify which features (questions and their respective answers) correlate the most, positively or negatively, with the risk of suffering food poisoning. A preliminary predictive model is also developed, which again indicates the set of features most relevant for determining food poisoning risk. Further work is required to fully assess the predictive capability of this model.
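The correlation step could look roughly like the sketch below, which ranks binary-coded features by the absolute value of their Pearson correlation with a food poisoning indicator. The feature names and values are invented, and the choice of Pearson correlation is an assumption; the predictive model is not covered here.

```python
import numpy as np

# Hypothetical coded answers (one entry per respondent); names and values are illustrative.
features = {
    "washes_hands": np.array([1, 1, 0, 1, 0, 1, 1, 0]),
    "checks_use_by": np.array([1, 0, 0, 1, 0, 1, 1, 0]),
    "eats_out_weekly": np.array([0, 0, 1, 0, 1, 1, 0, 1]),
}
had_food_poisoning = np.array([0, 0, 1, 0, 1, 1, 0, 1])

# Pearson correlation of each feature with the outcome.
correlations = {
    name: float(np.corrcoef(values, had_food_poisoning)[0, 1])
    for name, values in features.items()
}

# Rank features by the strength of the association, whatever its sign.
ranked = sorted(correlations.items(), key=lambda item: abs(item[1]), reverse=True)
for name, r in ranked:
    print(f"{name}: {r:+.2f}")
```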
The dashboard components can be executed within the notebook, where further instructions are included.
In order to access the dashboard online:
- go to Binder
- select GitHub under GitHub repository name or URL and insert the appropriate path name (for example, `my-repository/app`) for the remote open repository where the dashboard code is hosted (such as the `app` folder in here).
- select URL under Path to a notebook file (optional) and insert `voila/render/Food-and-You-survey_risks.ipynb`
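The same settings can be expressed as a direct link, which may be handy for sharing the dashboard. The sketch below assembles such a link in the usual mybinder.org form; the helper name `binder_voila_url` and the `example-user` and `example-repo` values are hypothetical placeholders, and the notebook is assumed to sit at the root of the repository (a folder prefix would be needed otherwise).

```python
from urllib.parse import quote


def binder_voila_url(user, repo, notebook_path, ref="HEAD"):
    """Build a mybinder.org link that opens a notebook through Voila."""
    # The urlpath value is percent-encoded so that its slashes survive as one parameter.
    urlpath = quote(f"voila/render/{notebook_path}", safe="")
    return f"https://mybinder.org/v2/gh/{user}/{repo}/{ref}?urlpath={urlpath}"


# "example-user" and "example-repo" are placeholders, not the project's repository.
url = binder_voila_url("example-user", "example-repo", "Food-and-You-survey_risks.ipynb")
print(url)
```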