Author: Justin Ferguson
Analysing data on past weather events is vital for understanding, predicting, and planning for future events. Of particular importance is understanding the impact that storm events may have on property, crops, and human life. One phenomenon unique to storm events is the tornado. Tornadoes are known for their destructive power and come in many forms.
This report analyses and quantifies the impact of storm events in the USA in 2019, focusing mainly on tornadoes.
The analysis performed in this report was completed in SAS with data sets made available by the National Centers for Environmental Information. Six data sets were extracted and contained the following variables and observations:
Descriptions – contains a narrative about weather event episodes: 2 variables, 10575 observations.
Detail – contains information such as weather event type, state, time, injuries, etc.: 19 variables, 67539 observations.
Damage – contains information about damage caused to property and crop by weather events: 4 variables, 67539 observations.
Fatalities – contains information about fatality type, date, sex, associated weather event, etc.: 7 variables, 577 observations.
Tornadoes – contains information about tornado type, dimensions, associated weather event, etc.: 9 variables, 1728 observations.
Locations – contains information about weather event locations, coordinates, range, etc.: 10 variables, 53215 observations.
The data sets were read into SAS using a DATA step or a library reference. Where required, selected variables were then typed as datetime, character, or numeric. Any necessary value recoding or data tidying was carried out with conditional statements and PROC TRANSPOSE. Tables were joined with PROC SQL statements to enable analysis of the variables of interest. Several outputs and plots were produced to support the analysis throughout this report using either PROC FREQ or PROC SGPLOT. The analysis also includes a t-test at the 99% confidence level, carried out with PROC TTEST.
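The analysis itself was done in SAS. As a language-neutral illustration of the join step, the following Python sketch uses the standard-library sqlite3 module to join a detail table to a damage table on a shared event identifier, much as a PROC SQL join would. The table names, the column names (event_id, event_type, state, damage_property), and the rows are hypothetical and are not the actual variable names in the source data sets.

```python
import sqlite3

def join_detail_damage(detail_rows, damage_rows):
    """Inner-join detail and damage records on event_id (hypothetical schema)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE detail (event_id INTEGER PRIMARY KEY, event_type TEXT, state TEXT)"
    )
    con.execute("CREATE TABLE damage (event_id INTEGER, damage_property REAL)")
    con.executemany("INSERT INTO detail VALUES (?, ?, ?)", detail_rows)
    con.executemany("INSERT INTO damage VALUES (?, ?)", damage_rows)
    rows = con.execute(
        """
        SELECT d.event_id, d.event_type, d.state, g.damage_property
        FROM detail AS d
        INNER JOIN damage AS g ON d.event_id = g.event_id
        ORDER BY d.event_id
        """
    ).fetchall()
    con.close()
    return rows

# Made-up rows for illustration only.
detail = [(1, "Tornado", "ALABAMA"), (2, "Hail", "TEXAS"), (3, "Tornado", "OHIO")]
damage = [(1, 250000.0), (3, 12000.0)]
joined = join_detail_damage(detail, damage)
print(joined)
```

An inner join keeps only events present in both tables; a left join would be the choice if events without a damage record should be retained.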
Figure 1 below summarises the types of fatality and locations of death in the USA in 2019 as a contingency table. There are two fatality types. The first is associated with direct exposure to the forces of the weather event, the so-called 'direct fatalities'. The second arises indirectly from the presence of the weather event. For example, an elevated stress level caused by a weather event may lead to a heart attack: the heart attack is the direct cause of death and the weather event is the indirect cause. This type of death is classified as an 'indirect fatality'.
Figure 1: A contingency table showing fatality type by fatality location in 2019.
With this explanation aside, first consider the row totals, which relate to the type of fatality. They show that 415 direct and 162 indirect fatalities were attributed to weather events, or 72% and 28%, respectively. Most notably, deaths due to the direct forces of nature were considerably more numerous than indirect deaths. The overall number of fatalities was 577.
Now consider the column totals in figure 1, which relate to the location of death. As one would expect, being inside accounted for the fewest fatalities, only 8%, while being outside accounted for considerably more, at 29%. Being near bodies of water appeared to be nearly as dangerous, at 22%. However, vehicle- or equipment-related deaths exceeded all others, at 35%.
Lastly, consider the individual cells of figure 1, which combine the type of fatality with the location of death. Being outside during a weather event accounted for the most direct fatalities, 23% of the total, making it the most hazardous combination of all. However, direct fatalities related to vehicles/equipment and to water followed closely behind, with 22% and 19%, respectively. Indirect fatalities were most common when linked with vehicles/equipment, accounting for 14% of the total and dwarfing the indirect fatalities at all other locations.
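The row, column, and cell percentages discussed above are all counts divided by the grand total. A minimal Python sketch of that calculation follows, assuming each fatality is recorded as a (type, location) pair. The ten toy records are invented for illustration; only the row totals of 415 and 162 are taken from the report.

```python
from collections import Counter

def cell_percent(records):
    """Percent of all records falling in each (fatality_type, location) cell."""
    counts = Counter(records)
    total = sum(counts.values())
    return {cell: 100.0 * n / total for cell, n in counts.items()}

def margin_percent(records, axis):
    """Row (axis=0) or column (axis=1) totals as percentages of all records."""
    counts = Counter(rec[axis] for rec in records)
    total = sum(counts.values())
    return {key: 100.0 * n / total for key, n in counts.items()}

# Invented records, not the 2019 data.
toy = (
    [("Direct", "Outside")] * 4
    + [("Direct", "Vehicle")] * 3
    + [("Indirect", "Vehicle")] * 2
    + [("Indirect", "Inside")] * 1
)
print(cell_percent(toy))
print(margin_percent(toy, axis=0))
print(margin_percent(toy, axis=1))

# Row totals reported in the text: 415 direct and 162 indirect fatalities.
reported = {"Direct": 415, "Indirect": 162}
grand_total = sum(reported.values())
reported_pct = {k: round(100 * v / grand_total) for k, v in reported.items()}
print(grand_total, reported_pct)
```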
Figure 2 shows the total direct and indirect injuries per month for weather events in 2019. Visual analysis suggests that weather events posed a greater risk of injury from January to July, with the risk dropping considerably between August and December. The greatest risk of direct injury occurred in May, and of indirect injury in February. The safest months were January and October.
Figure 2: A clustered bar chart showing the total direct and indirect injuries per month for weather events in 2019.
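The monthly totals behind a chart of this kind are a grouped sum. The sketch below shows one way to compute them in Python, assuming each event carries a date and separate direct and indirect injury counts. The three events are invented and do not come from the 2019 data.

```python
from collections import defaultdict
from datetime import date

def monthly_injury_totals(events):
    """Sum direct and indirect injuries per calendar month.

    `events` is an iterable of (date, direct_injuries, indirect_injuries).
    Returns {month_number: (direct_total, indirect_total)} in month order.
    """
    totals = defaultdict(lambda: [0, 0])
    for when, direct, indirect in events:
        totals[when.month][0] += direct
        totals[when.month][1] += indirect
    return {month: tuple(pair) for month, pair in sorted(totals.items())}

# Invented events for illustration only.
events = [
    (date(2019, 5, 27), 12, 0),
    (date(2019, 2, 3), 1, 4),
    (date(2019, 2, 20), 0, 2),
]
monthly = monthly_injury_totals(events)
print(monthly)
```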
To better understand the impact of storm events in 2019, it is appropriate to consider the effect that extreme phenomena may have had on property damage and human life. In this context, an independent two-sample t-test was carried out to test for a statistically significant difference in the mean cost of property damage between tornado severity types EF0 and EF1, given:
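Null hypothesis: $H_0: \mu_{\mathrm{EF0}} - \mu_{\mathrm{EF1}} = 0$,
Alternative hypothesis: $H_A: \mu_{\mathrm{EF0}} - \mu_{\mathrm{EF1}} \neq 0$,
Sample size EF0: $N_{\mathrm{EF0}} = 644 > 30$,
Sample size EF1: $N_{\mathrm{EF1}} = 531 > 30$,
Significance level: $\alpha = 0.01$.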
Consider figures 3 and 4 below, which check the normality assumption by showing kernel density estimates of each sample against a theoretical normal distribution, together with Q-Q plots.
Figure 3: Distribution curves of the EF0 and EF1 kernels against the normal distribution with actual data points for both types below.
Figure 4: Q-Q plots show the distributions of tornado types plotted against theoretical normal quantiles.
Both plots show that the normality assumption is not satisfied for either sample. Note also that the variances of the two samples differ considerably. Despite the departure from normality, the t-test can proceed: both sample sizes are well above 30, so by the Central Limit Theorem the sampling distributions of the means are approximately normal.
As shown in figure 5 below, the p-value for the test of homogeneity of variance is p < .0001, so the null hypothesis of equal variances was rejected at the 1% significance level and equal variances were not assumed. The Satterthwaite (Welch) method for a two-sample t-test found a statistically significant mean difference in property damage between tornado severities EF0 and EF1 of -$216,579, t(536.81) = -4.46, p < .0001, 99% CI [-$342,058, -$91,101].
In simpler terms, the average cost of property damage during an EF1 tornado event is significantly greater than during an EF0 tornado event. The average difference in the cost of property damage between these two types is $216,579. This difference quantifies the gap in destructive power.
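For readers unfamiliar with the unequal-variance test, the sketch below computes Welch's t statistic and the Satterthwaite degrees of freedom in plain Python. It is an illustration of the method, not a reproduction of the SAS output: the two short samples are invented, and the p-value is omitted because the standard library has no t-distribution function.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean, using sample (n - 1) variances.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df

# Invented samples for illustration only.
t_stat, dof = welch_t([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0])
print(round(t_stat, 3), round(dof, 3))
```

When one group's variance is much larger than the other's, the degrees of freedom fall towards that group's sample size minus one, which is why the result is generally not a whole number.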
Figure 5: A summary of the two independent samples t-test of property damage given tornado type. Also shown is the test for equality of variances.
Figure 6 below shows the frequencies and percentages of different tornado types in 2019. There were 1728 tornado occurrences in total, 78% of which were EF0 or EF1. The counts of these two types differ by only 117.
Figure 6: A frequency distribution table with percentage of different tornado types in 2019.
Now observe figure 7 below. The total damage caused by all tornadoes in 2019, at $3B, far exceeds that of any other weather event classification. A crude way to gauge the contribution of each type towards this total is to multiply the average cost of a single occurrence (taken from figure 5) by the total number of occurrences.
Mean EF0 × N EF0: $25,391 × 730 = $18.54M
Mean EF1 × N EF1: $241,970 × 613 = $148.33M
The accumulated damage for EF1 is eight times that for EF0. However, when compared with the total damage cost shown in figure 7, the combined damage due to EF0 and EF1 tornadoes, about $167M, accounts for only around 6% of the $3B total. So, one can infer that the cost of damage must rise very steeply, plausibly exponentially, with tornado severity, and that tornadoes of type EF2 and greater must account for the great majority of damages. One can therefore assume that a tornado season's overall severity is predominantly determined by the number of severe tornadoes (type EF2 and greater).
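The crude estimate above is simple enough to check directly. The Python sketch below recomputes each product from the means and counts quoted in the text, then derives the EF1-to-EF0 ratio and the combined share of total tornado damage. The $3B total is an approximate value read from figure 7, so the share is indicative only.

```python
# Figures quoted in the report.
MEAN_EF0, N_EF0 = 25_391, 730
MEAN_EF1, N_EF1 = 241_970, 613
TOTAL_TORNADO_DAMAGE = 3_000_000_000  # approximate "$3B" total from figure 7

def crude_totals():
    """Mean cost times count for EF0 and EF1, plus derived comparisons."""
    ef0 = MEAN_EF0 * N_EF0
    ef1 = MEAN_EF1 * N_EF1
    return {
        "ef0_total": ef0,
        "ef1_total": ef1,
        "ef1_to_ef0_ratio": ef1 / ef0,
        "combined_share": (ef0 + ef1) / TOTAL_TORNADO_DAMAGE,
    }

totals = crude_totals()
for name, value in totals.items():
    print(f"{name}: {value:,.4f}")
```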
Figure 7: A bar chart showing damage cost to property and crops caused by weather events in 2019.
Another way to ascertain the effect that storm events had in 2019 is to measure the overall impact on human life. For this purpose an impact coefficient has been created, shown in figure 8 below and defined as follows:
$$I_\alpha = \frac{f_\alpha + t_\alpha}{\sum_i \left( f_i + t_i \right)}$$
where $I_\alpha$ is the impact coefficient, $f_\alpha$ is the total number of fatalities (direct and indirect), and $t_\alpha$ is the total number of injuries (direct and indirect) for a particular weather event type $\alpha$. In the denominator, these variables are summed over all weather event types $i$. Figure 8 reveals that the impact coefficient for tornadoes in 2019 was 0.6. This overwhelmingly surpasses all other weather events, each of which falls short of 0.1. Figure 9, also below, reveals an impact coefficient of approximately 0.5 in Alabama, which exceeds that of all other states. Therefore, one can infer that tornadoes had the highest impact on human life in the state of Alabama.
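As a rough illustration of the impact coefficient, the Python sketch below assumes the coefficient is each event type's combined fatalities and injuries divided by the same quantity summed over all event types. This is the reading suggested by the description of the numerator and denominator above, and it is an assumption rather than a confirmed formula. The counts are invented and are not the 2019 figures.

```python
def impact_coefficients(casualties):
    """Map each event type to (fatalities + injuries) / grand total.

    `casualties` maps event type -> (total_fatalities, total_injuries),
    where each total includes both direct and indirect cases.
    """
    grand_total = sum(f + t for f, t in casualties.values())
    if grand_total == 0:
        raise ValueError("no fatalities or injuries recorded")
    return {kind: (f + t) / grand_total for kind, (f, t) in casualties.items()}

# Invented counts, chosen only to exercise the function.
example = {"Tornado": (10, 50), "Flood": (20, 10), "Hail": (0, 10)}
coeffs = impact_coefficients(example)
print(coeffs)
```

Under this definition the coefficients sum to 1 across event types, so a value of 0.6 means that one type accounts for 60% of all recorded fatalities and injuries.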
Figure 8: A bar chart showing impact coefficient given weather event type.
Figure 9: A bar chart showing impact coefficient given state.
Figure 10 below shows fatalities based upon the weather event classification. Rip currents produced the most fatalities and wildfires the least, approximately 60 and 5, respectively. Despite having the highest impact coefficient, tornadoes were not the deadliest.
Figure 10: A bar chart showing fatalities given weather event type.
A similar situation appears in figure 11: Alabama did not have the largest number of fatalities, despite yielding the highest impact coefficient. Instead, the highest concentration of fatalities was in California, with around 60.
Figure 11: A bar chart showing fatalities given state.
Weather events caused 577 fatalities in the USA in 2019, with the majority occurring somewhere other than indoors and resulting from direct exposure. A visual analysis suggested that the risk of injury was greatest from January to July. Of particular interest is the effect of storm events, as these include some of the most extreme phenomena, such as tornadoes.
Overall, there were a high number of tornadoes in 2019, with 78% being either EF0 or EF1. The difference in average damage cost per occurrence between these two tornado types, $216,579, was statistically significant at the 99% confidence level. This quantifies the difference in destructive power. Nevertheless, the combined damage due to EF0 and EF1 tornadoes accounts for only around 6% of the overall $3B worth of tornado damage in 2019. This implies that damage rises very steeply, plausibly exponentially, from one tornado category to the next, as the overwhelming majority of damage was done by types EF2 and greater. Moreover, one can infer that the severity of a tornado season is predominantly determined by the number of severe tornadoes (type EF2 and greater).
Despite not having the most fatalities with respect to other weather event types, tornadoes clearly had the most significant effect on human life overall. This is prominently shown in terms of impact coefficients and damages caused. Based upon the analysis, Alabama appears to have been the most severely affected.