The aim of this project is to show that rising atmospheric CO2 concentration, driven by human activity, is causing the average global temperature to rise. Using freely available data, we find support for this claim by showing that a correlation exists between the temperature anomaly and CO2 concentration, and that no such correlation exists with other climate-related measurements.
Once this evidence is established, our focus turns to a predictive model relating the temperature anomaly to the CO2 concentration. We develop a simple model based on differential equations and estimate the best-fitting parameters with a least-squares optimization algorithm that minimizes the sum of squared residuals between the model and the data. Finally, a Bayesian statistical approach is used to fit the model to the data and obtain 95% credible intervals.
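As an illustration of the least-squares step, the sketch below fits a deliberately simplified model (a linear response to log CO2, not the project's actual differential-equation model) to synthetic data with `scipy.optimize.least_squares`; all numbers here are invented for demonstration:

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical toy model: temperature anomaly responds linearly to
# log CO2 relative to a pre-industrial baseline of 280 ppm.
def model(params, co2):
    a, b = params
    return a * np.log(co2 / 280.0) + b

# Synthetic stand-in data for illustration only.
co2 = np.linspace(315.0, 420.0, 60)  # ppm, roughly the Mauna Loa range
rng = np.random.default_rng(0)
anomaly = model([2.0, 0.1], co2) + rng.normal(0.0, 0.05, co2.size)

# Residuals between model and data; least_squares minimizes their
# sum of squares over the parameters.
def residuals(params):
    return model(params, co2) - anomaly

fit = least_squares(residuals, x0=[1.0, 0.0])
print(fit.x)  # best-fitting (a, b), close to the true (2.0, 0.1)
```

The same residual-based setup carries over to the Bayesian fit, where the residuals instead enter a likelihood and the parameters get posterior credible intervals rather than point estimates.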
- `data`: contains all data used throughout the project.
  - `downloads`: the original data before processing.
  - `processed`: the processed data.
- `images`: contains any generated images.
- `scripts`: contains all code written throughout the project.
  - `preprocessing.ipynb`: pre-processes the downloaded data.
  - `cross_correlations.R`: performs the cross-correlation analysis.
  - `temperature_map.R`: creates various temperature maps.
  - `RStanODEModel.R` and `ode.stan`: perform the Bayesian temperature modelling.
  - `main.ipynb`: main Python notebook collating the analyses.
All of the data used is freely available.
- Temperature data was accessed from the NASA GISTEMP v4 dataset, which consists of monthly anomaly estimates on a 2°×2° grid from 1880 to present.
- CO2 data was accessed from the NOAA GML dataset, which consists of monthly atmospheric CO2 concentrations at the Mauna Loa Observatory in Hawaii from 1958 to present.
- Volcanic activity data was accessed from the Global Volcanism Program dataset, which details all recorded eruptions in recent history.
- Solar irradiance data was accessed from the NOAA CDR dataset, which contains yearly averaged solar irradiance values from 1880 to present.
This data is preprocessed before analysis.
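As a rough illustration of the kind of preprocessing involved, the sketch below builds a synthetic monthly CO2 series and reduces it to annual means so it can be aligned with the yearly solar-irradiance data. The real pipeline lives in `preprocessing.ipynb` and reads the downloaded files (the gridded NASA data via `xarray`/`netCDF4`); the values here are invented:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the monthly Mauna Loa CO2 series.
time = pd.date_range("1958-03-01", periods=24, freq="MS")
rng = np.random.default_rng(0)
co2 = pd.Series(315.0 + 0.1 * np.arange(24) + rng.normal(0.0, 0.2, 24),
                index=time, name="co2_ppm")

# Typical steps: fill occasional gaps by interpolation, then reduce
# the monthly values to annual means.
co2_clean = co2.interpolate()
annual = co2_clean.groupby(co2_clean.index.year).mean()
print(annual)  # one mean CO2 value per calendar year
```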
The Python programming language (version 3.11) is used for most of the analysis in this project, with the R programming language (version 4.3) and Stan probabilistic programming language (version 2.26.1) being used for certain tasks.
- Jupyter notebooks are used for all Python code. The easiest way to install Jupyter is through the Anaconda platform.
- The standard Python tools for statistical data analysis are used, including the `pandas`, `numpy` and `scipy` packages. Preprocessing the data additionally requires the `xarray` and `netCDF4` packages. Python packages can be installed with `conda install package_name` if using Python through the Anaconda platform, or with `pip install package_name` otherwise.
- The `ggplot2`, `ggmap`, `testcorr`, `rstan` and `HDInterval` packages are required for running the R and Stan scripts. R packages can be installed with `install.packages("package_name")`.
The recommended usage is as follows:
- Download the contents of `data` and `scripts`.
- Install all necessary software and packages.
- Run `preprocessing.ipynb` to prepare the downloaded data for analysis.
- Run `cross_correlations.R` to perform the correlation analysis.
- Run `RStanODEModel.R` to perform the Bayesian temperature modelling.
- Run `temperature_map.R` to generate useful temperature plots.
- Run `main.ipynb` to view the analyses and relevant plots together.
Adam Watt
Seán O'Neill