diff --git a/chapters/homework/homework1-solution.ipynb b/chapters/homework/homework1-solution.ipynb deleted file mode 100644 index 3ee8d87..0000000 --- a/chapters/homework/homework1-solution.ipynb +++ /dev/null @@ -1,793 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "ea863e05", - "metadata": {}, - "source": [ - "# HW#1: Extreme Rainfall Deficit in Singapore \n", - "\n", - "```{admonition} Objectives\n", - ":class: tip\n", - "\n", - "This homework will help you gain a better understanding in terms of the ways how to:\n", - "* Fit Generalized Extreme Value (GEV) distribution \n", - "* Estimate the return level of extreme rainfall deficit\n", - "\n", - "Happy coding!\n", - "```" - ] - }, - { - "cell_type": "markdown", - "id": "af48d568", - "metadata": {}, - "source": [ - "```{admonition} Submission Guide\n", - "\n", - "Deadline: **Sunday 11:59 pm, 5th November 2023** \n", - "(Note: Late submissions will not be accepted). \n", - "\n", - "Please upload your solutions to [Canvas](https://canvas.nus.edu.sg/courses/47849/assignments) in a Jupyter Notebook format with the name \"Homework1_StudentID.ipynb\". Make sure to write down your student ID and full name in the cell below. \n", - "\n", - "For any questions, feel free to contact Prof. Xiaogang HE ([hexg@nus.edu.sg](mailto:hexg@nus.edu.sg)), or Zhixiao NIU ([niu.zhixiao@u.nus.edu](mailto:niu.zhixiao@u.nus.edu)).\n", - "\n", - "```" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "id": "c8ae81f2", - "metadata": {}, - "outputs": [], - "source": [ - "## Fill your student ID and full name below.\n", - "\n", - "# Student ID:\n", - "# Full name:" - ] - }, - { - "cell_type": "markdown", - "id": "5abc96cb", - "metadata": {}, - "source": [ - "**Data**:\n", - "You will need to use the historical (1981-2020) daily total rainfall at Singapore's Changi station for this homework. \n", - "\n", - "You can create a DataFrame using Pandas by reading file \"../../assets/data/Changi_daily_rainfall.csv\"." - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "id": "bb2bdac2", - "metadata": {}, - "outputs": [], - "source": [ - "import math\n", - "import numpy as np\n", - "import pandas as pd\n", - "import matplotlib.pyplot as plt\n", - "\n", - "from scipy import stats\n", - "from scipy.optimize import fsolve\n", - "from scipy.stats import genextreme as gev" - ] - }, - { - "cell_type": "markdown", - "id": "89af3af8", - "metadata": {}, - "source": [ - "## Q1: Calculate extreme rainfall statisitics\n", - "\n", - "First, divide the data into the following seasons: DJF (Dec-Jan-Feb), MAM (Mar-Apr-May), JJA (Jun-Jul-Aug), and SON (Sep-Oct-Nov) using Pandas' `filtering` methods. Calculate the following statistics of daily rainfall for each season: (i) mean; (ii) variance; (iii) skewness; and (iv) kurtosis. Based on the results, identify the driest season. (Details on the filtering method can be found in the [Pandas tutorial](https://xiaoganghe.github.io/python-climate-visuals/chapters/data-analytics/pandas-basic.html)). (Hint: The driest season can refer to the season with the lowest mean of daily rainfall) (10 marks)" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "id": "a6aa6f17", - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
Daily Rainfall Total (mm)
Date
1981-01-010.0
1981-01-020.0
1981-01-030.0
1981-01-040.0
1981-01-050.0
1981-01-060.3
1981-01-0751.4
1981-01-080.0
1981-01-094.4
1981-01-100.0
\n", - "
" - ], - "text/plain": [ - " Daily Rainfall Total (mm)\n", - "Date \n", - "1981-01-01 0.0\n", - "1981-01-02 0.0\n", - "1981-01-03 0.0\n", - "1981-01-04 0.0\n", - "1981-01-05 0.0\n", - "1981-01-06 0.3\n", - "1981-01-07 51.4\n", - "1981-01-08 0.0\n", - "1981-01-09 4.4\n", - "1981-01-10 0.0" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "rain = pd.read_csv('../../assets/data/Changi_daily_rainfall.csv', index_col = 0, header = 0, parse_dates = True)\n", - "display(rain.head(10))" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "id": "70678310", - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
MeanVarianceSkewnessKurtosis
DJF7.159723336.3312294.90759533.528282
MAM5.117065141.7395783.98041022.942881
JJA4.575788129.3921055.05230741.294758
SON6.065027197.3422204.93009138.885941
\n", - "
" - ], - "text/plain": [ - " Mean Variance Skewness Kurtosis\n", - "DJF 7.159723 336.331229 4.907595 33.528282\n", - "MAM 5.117065 141.739578 3.980410 22.942881\n", - "JJA 4.575788 129.392105 5.052307 41.294758\n", - "SON 6.065027 197.342220 4.930091 38.885941" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "The driest season is JJA (Jun-Jul-Aug).\n" - ] - } - ], - "source": [ - "# divide the data into 4 seasons\n", - "MAM = rain.loc[(rain.index.month > 2) & (rain.index.month < 6)]\n", - "JJA = rain.loc[(rain.index.month > 5) & (rain.index.month < 9)]\n", - "SON = rain.loc[(rain.index.month > 8) & (rain.index.month < 12)]\n", - "DJF = rain.loc[(rain.index.month < 3) | (rain.index.month > 11)]\n", - "\n", - "\n", - "stat = pd.DataFrame([[DJF.mean().values[0],DJF.std().values[0]**2,DJF.skew().values[0],DJF.kurtosis().values[0]],\n", - " [MAM.mean().values[0],MAM.std().values[0]**2,MAM.skew().values[0],MAM.kurtosis().values[0]],\n", - " [JJA.mean().values[0],JJA.std().values[0]**2,JJA.skew().values[0],JJA.kurtosis().values[0]],\n", - " [SON.mean().values[0],SON.std().values[0]**2,SON.skew().values[0],SON.kurtosis().values[0]]\n", - " ], \n", - " index = ['DJF', 'MAM', 'JJA', 'SON'], \n", - " columns = ['Mean', 'Variance', 'Skewness', 'Kurtosis']\n", - " )\n", - "\n", - "display(stat)\n", - "print('The driest season is JJA (Jun-Jul-Aug).')" - ] - }, - { - "cell_type": "markdown", - "id": "c03f010a", - "metadata": {}, - "source": [ - "## Q2: Fit the GEV distribution \n", - "\n", - "For the driest season you identified in Q1, find the seasonal maximum rainfall deficit based on the 30-day moving average rainfall deficit (please use centered moving average). This will result in a data set of 40 values, one value for each year. Fit two GEV distributions of seasonal maximum rainfall deficit using data from the first 20 years (1981-2000) and the last 20 years (2001-2020) separately. To do this, estimate the GEV parameters using (i) Maximum Likelihood, and (ii) L-Moments, respectively. (Details on fitting a GEV distribution can be found in the [Scipy tutorial](https://xiaoganghe.github.io/python-climate-visuals/chapters/data-analytics/scipy-basic.html)). (Hint: The rainfall deficit can be obtained by subtracting the 30-day moving average rainfall from the mean rainfall calculated in Q1) (40 marks)" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "id": "a782552a", - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
30-day Moving Average Rainfall Deficit (mm)
YearDate
19811981-06-164.222455
1981-06-174.222455
1981-06-184.222455
1981-06-194.222455
1981-06-204.119121
.........
20202020-08-130.709121
2020-08-141.269121
2020-08-151.069121
2020-08-161.129121
2020-08-171.129121
\n", - "

2520 rows × 1 columns

\n", - "
" - ], - "text/plain": [ - " 30-day Moving Average Rainfall Deficit (mm)\n", - "Year Date \n", - "1981 1981-06-16 4.222455\n", - " 1981-06-17 4.222455\n", - " 1981-06-18 4.222455\n", - " 1981-06-19 4.222455\n", - " 1981-06-20 4.119121\n", - "... ...\n", - "2020 2020-08-13 0.709121\n", - " 2020-08-14 1.269121\n", - " 2020-08-15 1.069121\n", - " 2020-08-16 1.129121\n", - " 2020-08-17 1.129121\n", - "\n", - "[2520 rows x 1 columns]" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "# add 'Year' column to denote the year of the days\n", - "JJA.insert(0, 'Year', JJA.index.year, allow_duplicates = False)\n", - "\n", - "# calculate the 30-day moving average for each year\n", - "JJA_ma = JJA.groupby(['Year']).rolling(30, center=True).mean()\n", - "\n", - "# calculate the 30-day moving average rainfall\n", - "JJA_ma[JJA_ma['Daily Rainfall Total (mm)'].notnull()]\n", - "JJA_ma.columns = ['30-day Moving Average Rainfall (mm)']\n", - "\n", - "# calculate the 30-day moving average rainfall deficit\n", - "JJA_deficit = JJA['Daily Rainfall Total (mm)'].mean() - JJA_ma[JJA_ma['30-day Moving Average Rainfall (mm)'].notnull()]\n", - "JJA_deficit.columns = ['30-day Moving Average Rainfall Deficit (mm)']\n", - "display(JJA_deficit)" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "id": "eea31fe1", - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
Annual Maximum Rainfall Deficit (mm)
Year
19814.222455
19822.772455
19831.909121
19843.179121
19853.549121
\n", - "
" - ], - "text/plain": [ - " Annual Maximum Rainfall Deficit (mm)\n", - "Year \n", - "1981 4.222455\n", - "1982 2.772455\n", - "1983 1.909121\n", - "1984 3.179121\n", - "1985 3.549121" - ] - }, - "execution_count": 6, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# calculate the maximum deficit for each year\n", - "JJA_annual_max = pd.DataFrame(JJA_deficit['30-day Moving Average Rainfall Deficit (mm)'].groupby('Year').max())\n", - "JJA_annual_max.columns = ['Annual Maximum Rainfall Deficit (mm)']\n", - "JJA_annual_max.head()" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "id": "30f8d70b", - "metadata": {}, - "outputs": [], - "source": [ - "# fit the distribution using MLE\n", - "cMLE_1, locMLE_1, scaleMLE_1 = gev.fit(JJA_annual_max.head(20), method = \"MLE\")\n", - "cMLE_2, locMLE_2, scaleMLE_2 = gev.fit(JJA_annual_max.tail(20), method = \"MLE\")\n", - "MLEGEV_1 = gev(cMLE_1, loc = locMLE_1, scale = scaleMLE_1) \n", - "MLEGEV_2 = gev(cMLE_2, loc = locMLE_2, scale = scaleMLE_2) " - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "id": "fe736f10", - "metadata": {}, - "outputs": [], - "source": [ - "# calculate L-moments based on samples\n", - "def samlmom3(sample):\n", - " \"\"\"\n", - " samlmom3 returns the first three L-moments of samples\n", - " sample is the 1-d array\n", - " n is the total number of the samples, j is the j_th sample\n", - " \"\"\"\n", - " n = len(sample)\n", - " sample = np.sort(sample.reshape(n))[::-1]\n", - " b0 = np.mean(sample)\n", - " b1 = np.array([(n - j - 1) * sample[j] / n / (n - 1)\n", - " for j in range(n)]).sum()\n", - " b2 = np.array([(n - j - 1) * (n - j - 2) * sample[j] / n / (n - 1) / (n - 2)\n", - " for j in range(n - 1)]).sum()\n", - " lmom1 = b0\n", - " lmom2 = 2 * b1 - b0\n", - " lmom3 = 6 * (b2 - b1) + b0\n", - "\n", - " return lmom1, lmom2, lmom3\n", - "\n", - "def pargev_fsolve(lmom):\n", - " \"\"\"\n", - " pargev_fsolve estimates the parameters of the Generalized Extreme Value \n", - " distribution given the L-moments of samples\n", - " \"\"\"\n", - " lmom_ratios = [lmom[0], lmom[1], lmom[2] / lmom[1]]\n", - " f = lambda x, t: 2 * (1 - 3**(-x)) / (1 - 2**(-x)) - 3 - t\n", - " G = fsolve(f, 0.01, lmom_ratios[2])[0]\n", - " para3 = G\n", - " GAM = math.gamma(1 + G)\n", - " para2 = lmom_ratios[1] * G / (GAM * (1 - 2**-G))\n", - " para1 = lmom_ratios[0] - para2 * (1 - GAM) / G\n", - " return para1, para2, para3" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "id": "861a229a", - "metadata": {}, - "outputs": [], - "source": [ - "# fit the distribution using LMM\n", - "LMM_1 = samlmom3(JJA_annual_max.head(20).values)\n", - "LMM_2 = samlmom3(JJA_annual_max.tail(20).values)\n", - "\n", - "locLMM_1, scaleLMM_1, cLMM_1 = pargev_fsolve(LMM_1)\n", - "locLMM_2, scaleLMM_2, cLMM_2 = pargev_fsolve(LMM_2)\n", - "\n", - "LMMGEV_1 = gev(cLMM_1, loc=locLMM_1, scale=scaleLMM_1) \n", - "LMMGEV_2 = gev(cLMM_2, loc=locLMM_2, scale=scaleLMM_2) " - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "id": "1a883569", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Factors fitted by Maximum Likelihood (1981-2000): shape=0.740, loc=2.428, scale=1.435\n", - "\n", - "Factors fitted by Maximum Likelihood (2001-2020): shape=0.380, loc=2.349, scale=0.954\n", - "\n", - "Factors fitted by L-Moments (1981-2000): shape=0.611, loc=2.338, scale=1.382\n", - "\n", - "Factors fitted by L-Moments (2001-2020): shape=0.442, loc=2.373, scale=0.973\n", - "\n" - ] - } - ], - "source": [ - "print('Factors fitted by Maximum Likelihood (1981-2000): shape={:.3f}, loc={:.3f}, scale={:.3f}\\n'\n", - " .format(cMLE_1, locMLE_1, scaleMLE_1,))\n", - "print('Factors fitted by Maximum Likelihood (2001-2020): shape={:.3f}, loc={:.3f}, scale={:.3f}\\n'\n", - " .format(cMLE_2, locMLE_2, scaleMLE_2))\n", - "print('Factors fitted by L-Moments (1981-2000): shape={:.3f}, loc={:.3f}, scale={:.3f}\\n'\n", - " .format(cLMM_1, locLMM_1, scaleLMM_1))\n", - "print('Factors fitted by L-Moments (2001-2020): shape={:.3f}, loc={:.3f}, scale={:.3f}\\n'\n", - " .format(cLMM_2, locLMM_2, scaleLMM_2))" - ] - }, - { - "cell_type": "markdown", - "id": "7062d933", - "metadata": {}, - "source": [ - "## Q3: Compare differences in GEV distributions\n", - "\n", - "Based on the estimated GEV parameters using L-Moments in Q2, discuss, for the driest season, whether there are statistical differences between the two distributions estimated from the first and last 20 years. (Hint: You can use the KS test to compare the underlying distributions of the two samples) (30 marks)" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "id": "e811c398", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "From the 1st KS test, statisic is 0.134, pvalue is 0.817\n", - "From the 2nd KS test, statisic is 0.220, pvalue is 0.250\n" - ] - } - ], - "source": [ - "KS_1 = stats.kstest(JJA_annual_max['Annual Maximum Rainfall Deficit (mm)'].head(20), LMMGEV_2.cdf)\n", - "KS_2 = stats.kstest(JJA_annual_max['Annual Maximum Rainfall Deficit (mm)'].tail(20), LMMGEV_1.cdf)\n", - "print('From the 1st KS test, statisic is {:.3f}, pvalue is {:.3f}'.format(KS_1.statistic, KS_1.pvalue))\n", - "print('From the 2nd KS test, statisic is {:.3f}, pvalue is {:.3f}'.format(KS_2.statistic, KS_2.pvalue))" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "id": "22bfc24a", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "The pvalues are greater than 0.05. These two samples are not statistically significantly different from each other. Hence there is no significant change of the maximum rainfall deficit distribution with time shift.\n" - ] - } - ], - "source": [ - "print('The pvalues are greater than 0.05. \\\n", - " These two samples are not statistically significantly different from each other. \\\n", - " Hence there is no significant change of the maximum rainfall deficit distribution with time shift.')" - ] - }, - { - "cell_type": "markdown", - "id": "3ddc466e", - "metadata": {}, - "source": [ - "## Q4: Estimate the return level of the extreme events\n", - "\n", - "Based on the estimated GEV parameters using L-Moments in Q2, estimate the rainfall deficit for events with return periods of 50 years, 100 years, and 200 years. (Hint: You can use the GEV distribution fitted from the data of the last 20 years.) (20 marks)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "id": "e5362d61", - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
Annual Maximum 30-day Moving Average Rainfall Deficit in JJA (mm)
Return period
504.182137
1004.286263
2004.362494
\n", - "
" - ], - "text/plain": [ - " Annual Maximum 30-day Moving Average Rainfall Deficit in JJA (mm)\n", - "Return period \n", - "50 4.182137 \n", - "100 4.286263 \n", - "200 4.362494 " - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "T = np.arange(2, 201)\n", - "LFTlmm = LMMGEV_2.ppf(1 - 1.0 / T)\n", - "LFTlmm = pd.DataFrame(LFTlmm, index = T, \n", - " columns = [\"Annual Maximum 30-day Moving Average Rainfall Deficit in JJA (mm)\"])\n", - "LFTlmm.index.name = \"Return period\"\n", - "\n", - "display(LFTlmm.loc[[50, 100, 200]])" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.10" - }, - "toc": { - "base_numbering": 1, - "nav_menu": {}, - "number_sections": true, - "sideBar": true, - "skip_h1_title": false, - "title_cell": "Table of Contents", - "title_sidebar": "Contents", - "toc_cell": false, - "toc_position": {}, - "toc_section_display": true, - "toc_window_display": false - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/chapters/homework/homework1.ipynb b/chapters/homework/homework1.ipynb index da4ca18..1d8e60e 100644 --- a/chapters/homework/homework1.ipynb +++ b/chapters/homework/homework1.ipynb @@ -25,12 +25,12 @@ "source": [ "```{admonition} Submission Guide\n", "\n", - "Deadline: **Sunday 11:59 pm, 5th November 2023** \n", + "Deadline: **Sunday 11:59 pm, 3th November 2024** \n", "(Note: Late submissions will not be accepted). \n", "\n", - "Please upload your solutions to [Canvas](https://canvas.nus.edu.sg/courses/47849/assignments) in a Jupyter Notebook format with the name \"Homework1_StudentID.ipynb\". Make sure to write down your student ID and full name in the cell below. \n", + "Please upload your solutions to [Canvas](https://canvas.nus.edu.sg/courses/61921/assignments) in a Jupyter Notebook format with the name \"Homework1_StudentID.ipynb\". Make sure to write down your student ID and full name in the cell below. \n", "\n", - "For any questions, feel free to contact Prof. Xiaogang HE ([hexg@nus.edu.sg](mailto:hexg@nus.edu.sg)), or Zhixiao NIU ([niu.zhixiao@u.nus.edu](mailto:niu.zhixiao@u.nus.edu)).\n", + "For any questions, feel free to contact Prof. Xiaogang HE ([hexg@nus.edu.sg](mailto:hexg@nus.edu.sg)), Haoling CHEN ([h.chen@u.nus.edu](mailto:h.chen@u.nus.edu)) or Meilian LI ([limeilian@u.nus.edu](mailto:limeilian@u.nus.edu)).\n", "\n", "```" ] @@ -63,15 +63,19 @@ "id": "89af3af8", "metadata": {}, "source": [ - "## Q1: Calculate extreme rainfall statisitics\n", + "## Q1: Calculate daily rainfall statistics (10 marks)\n", "\n", - "First, divide the data into the following seasons: DJF (Dec-Jan-Feb), MAM (Mar-Apr-May), JJA (Jun-Jul-Aug), and SON (Sep-Oct-Nov) using Pandas' `filtering` methods. Calculate the following statistics of daily rainfall for each season: (i) mean; (ii) variance; (iii) skewness; and (iv) kurtosis. Based on the results, identify the driest season. (Details on the filtering method can be found in the [Pandas tutorial](https://xiaoganghe.github.io/python-climate-visuals/chapters/data-analytics/pandas-basic.html)). (Hint: The driest season can refer to the season with the lowest mean of daily rainfall) (10 marks)" + "Calculate the following statistics for daily rainfall during DJF (**D**ecember-**J**anuary-**F**ebruary): (i) mean, (ii) variance, (iii) skewness, and (iv) kurtosis.\n", + "\n", + "Hint: \n", + "- You can filter the daily rainfall time series for DJF using Pandas' boolean filtering method. Details on filtering values can be found in the [Pandas tutorial](https://xiaoganghe.github.io/python-climate-visuals/chapters/data-analytics/pandas-basic.html).\n", + "- DJF spans across two calendar years. Make sure you only include complete DJF seasons. For the period 1891 to 2020, this results in 39 complete DJF seasons, from DJF 1981-1982 to DJF 2019-2020." ] }, { "cell_type": "code", "execution_count": 2, - "id": "a6aa6f17", + "id": "92d51020", "metadata": {}, "outputs": [], "source": [ @@ -84,15 +88,17 @@ "id": "c03f010a", "metadata": {}, "source": [ - "## Q2: Fit the GEV distribution \n", + "## Q2: Fit the GEV distribution (40 marks)\n", + "\n", + "Find the seasonal maximum rainfall deficit for DJF, based on the 30-day centered moving average rainfall deficit. This will result in a data set of 39 values, one value for each year. Fit a GEV distribution to the time series of seasonal maximum rainfall deficits. To do this, estimate the GEV parameters using (i) Maximum Likelihood and (ii) L-Moments, respectively. (Details on fitting a GEV distribution can be found in the [Scipy tutorial](https://xiaoganghe.github.io/python-climate-visuals/chapters/data-analytics/scipy-basic.html))\n", "\n", - "For the driest season you identified in Q1, find the seasonal maximum rainfall deficit based on the 30-day moving average rainfall deficit (please use centered moving average). This will result in a data set of 40 values, one value for each year. Fit two GEV distributions of seasonal maximum rainfall deficit using data from the first 20 years (1981-2000) and the last 20 years (2001-2020) separately. To do this, estimate the GEV parameters using (i) Maximum Likelihood, and (ii) L-Moments, respectively. (Details on fitting a GEV distribution can be found in the [Scipy tutorial](https://xiaoganghe.github.io/python-climate-visuals/chapters/data-analytics/scipy-basic.html)). (Hint: The rainfall deficit can be obtained by subtracting the 30-day moving average rainfall from the mean rainfall calculated in Q1) (40 marks)" + "Hint: The rainfall deficit is calculated by subtracting the 30-day moving average rainfall from the mean rainfall calculated in Q1.\n" ] }, { "cell_type": "code", - "execution_count": 3, - "id": "a782552a", + "execution_count": 6, + "id": "1fc8bc93", "metadata": {}, "outputs": [], "source": [ @@ -102,18 +108,18 @@ }, { "cell_type": "markdown", - "id": "7062d933", + "id": "3ddc466e", "metadata": {}, "source": [ - "## Q3: Compare differences in GEV distributions\n", + "## Q3: Estimate the return level of the extreme events (20 marks)\n", "\n", - "Based on the estimated GEV parameters using L-Moments in Q2, discuss, for the driest season, whether there are statistical differences between the two distributions estimated from the first and last 20 years. (Hint: You can use the KS test to compare the underlying distributions of the two samples) (30 marks)" + "Using the GEV parameters estimated with L-Moments in Q2, estimate the rainfall deficit for events with return periods of 50 years, 100 years, and 1000 years." ] }, { "cell_type": "code", - "execution_count": 4, - "id": "e811c398", + "execution_count": 14, + "id": "43c2f39c", "metadata": {}, "outputs": [], "source": [ @@ -123,18 +129,27 @@ }, { "cell_type": "markdown", - "id": "3ddc466e", + "id": "0006273e", "metadata": {}, "source": [ - "## Q4: Estimate the return level of the extreme events\n", + "## Q4: Test the goodness-of-fit (30 marks)\n", + "\n", + "In this task, you will compare how different distributions fit the same dataset and interpret the results using statistical analyses. \n", + "- Repeat the distribution fitting as in Q2, but this time using a [normal distribution](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html) and the Maximum Likelihood method. (5 marks)\n", + "- Use the Kolmogorov-Smirnov (KS) test to evaluate the goodness-of-fit for both the normal distribution and the GEV distribution you obtained in Q2. (Details on the KS test can be found in the [Scipy tutorial](https://xiaoganghe.github.io/python-climate-visuals/chapters/data-analytics/scipy-basic.html)) (10 marks)\n", + "- Based on the KS test results, discuss how well each distribution (Normal and GEV) fits the data. (15 marks)\n", + "\n", + "**Bonus (10 marks):**\n", + "- Plot the CDF (Cumulative Distribution Function) to visually compare the fitted normal distribution, the GEV distribution from Q2, and the empirical distribution derived from the data. Compare the behavior of the two distributions at different return periods. Are the KS statistic results consistent with your observations from the CDF plot?\n", + "\n", "\n", - "Based on the estimated GEV parameters using L-Moments in Q2, estimate the rainfall deficit for events with return periods of 50 years, 100 years, and 200 years. (Hint: You can use the GEV distribution fitted from the data of the last 20 years.) (20 marks)" + "Hint: You can recycle the empirical distribution estimation and CDF plotting code from the [Scipy tutorial](https://xiaoganghe.github.io/python-climate-visuals/chapters/data-analytics/scipy-basic.html)." ] }, { "cell_type": "code", - "execution_count": 5, - "id": "e5362d61", + "execution_count": 16, + "id": "80ee34d7", "metadata": {}, "outputs": [], "source": [ @@ -145,7 +160,7 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3 (ipykernel)", + "display_name": "base", "language": "python", "name": "python3" }, @@ -159,7 +174,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.6" + "version": "3.11.0" }, "toc": { "base_numbering": 1,