%!TEX root=index.tex
\newacronym{merra}{MERRA}{Modern-Era Retrospective Analysis for Research and Applications}
\newacronym{html}{HTML}{HyperText Markup Language}
\newacronym{sme}{SME}{Subject Matter Expert}
\newacronym{pdf}{PDF}{Portable Document Format}
\newacronym{noaa}{NOAA}{National Oceanic and Atmospheric Administration}
\newacronym{nasa}{NASA}{National Aeronautics and Space Administration}
\section{Introduction}
Across the globe, the big data movement is changing how meteorologists store and analyze data in response to global events. For example, the Korea Meteorological Administration is working to improve its ability to predict weather patterns and the severity of weather events across South Korea. IBM is engaged in similar work in Rio de Janeiro in preparation for the 2016 Summer Olympics, with the goal of accurately predicting short-term weather \cite{rwe}. While the focus on near-term weather forecasting is expected to remain strong, there has been growing interest in how climate change affects future weather events. What if the vast repositories of weather and climate data could be collected, stored, and analyzed to produce probabilities of catastrophic events months, not weeks, in advance?\\

With the encouragement of the \gls{nasa}, \textsc{CSC's} ClimatEdge\texttrademark{}\index{ClimatEdge} service was developed by \gls{nps} in 2012 to explore the commercial potential of publicly held climate and weather data. The initial offering focused on forward-looking climate reports for commodities markets. The reports were based on a cursory qualitative analysis of the NASA \gls{merra} data, with commentary by subject matter experts in climate sciences. The monthly reports included Global Agriculture, Global Energy, Sugar and Soft Commodities, Grain and Oilseeds, and Energy/Natural Gas \cite{climatedgeurl}. Interviews with individuals associated with the original ClimatEdge\index{ClimatEdge} offering revealed a number of lessons learned and decisions that led to a strategy shift toward a quantitative product for the next version [personal communication, 2013]. Potential customers were less interested in commentary on qualitative analysis than originally thought. As part of its move to the Big Data and Analytics incubator, the ClimatEdge\index{ClimatEdge} team decided to focus on quantitative analysis and to use existing \textsc{CSC} commercial sales channels.\\

During the 2012 calendar year, the United States had eleven separate weather events in which losses exceeded one billion dollars each, making it the third-highest loss year due to natural catastrophes since 1980 [\textsc{CSC} communication from the \gls{noaa}, 2013]. With premiums increasingly unable to cover losses incurred from extreme weather events, the overall profitability of the insurance industry is at risk. With products such as POINT IN and Exceed, \textsc{CSC} has significant sales inroads into the general insurance industry \cite{point_in,exceed}. Given these established channels, the general insurance industry became a prime target for the second version of ClimatEdge\index{ClimatEdge}.\\

To understand offerings for general insurance, the underlying business model must first be explored. Risk in the general insurance industry can be expressed as:
\begin{equation*}
\texttt{Risk} = \texttt{Impact} \cdot \texttt{Probability}
\end{equation*}
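As a purely illustrative sketch of this relation, assume a hypothetical event with an insured impact of \$1 billion and an assumed annual probability of occurrence of 0.02. Neither figure comes from \textsc{CSC} or the cited sources; they are chosen only to show the arithmetic:
\begin{equation*}
\texttt{Risk} = \$1{,}000{,}000{,}000 \cdot 0.02 = \$20{,}000{,}000
\end{equation*}
Under these assumptions, the expected annual loss is \$20 million. If the probability term is unknown, the product cannot be evaluated, however well the impact is understood.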
\textsc{CSC} recognized that without a substantial quantitative update, ClimatEdge\index{ClimatEdge} could not address the probability of events occurring, and thus the overall risk could not be established. Given the interest of federal agencies that hold a wealth of publicly available climate data, such as \gls{noaa} and \gls{nasa}, \textsc{CSC} realized that a business opportunity existed. A ClimatEdge\index{ClimatEdge} offering based on quantitative analysis of publicly available data and targeted at minimizing risk for the general insurance industry would be a natural fit, as it would serve the interests of federal agencies while creating commercial business opportunities for \textsc{CSC}. The next step was to develop a technical approach to retrieving, storing, and analyzing the wealth of available data.
\subsection{Hypothesis}
Without investing in and developing a scalable solution for big data\index{big data} storage and analytics, ClimatEdge\index{ClimatEdge} offerings will be limited to markets served by either qualitative analysis or quantitative analysis against small, well-structured data sets. To analyze the larger and more structurally complex climate data sets, a framework of scalable technologies will need to be developed that can address both simple and complex offerings of any size. With the proper framework in place, \textsc{CSC} can offer solutions to an ever-increasing number of industries, derived from analytics on data having one or more of the following big data characteristics: velocity, volume, or variety.\\
 
Two contrasting ClimatEdge\index{ClimatEdge} offerings will be explored to illustrate the different requirements for storing and analyzing the requisite climate data. The first offering is shown not to be representative of a big data problem and is solvable with simple technology. The second offering has all the characteristics of a problem in need of a big data solution. A flexible framework is then proposed that illustrates how to store and process larger and more complex data sets, such as those seen in the second offering.
\subsection{Offerings}
The large number of severe weather events in 2012, such as tornadoes, floods, and other natural catastrophes, caused \$160 billion worth of damage in the United States \cite{stalder}. This is close to the combined total for the previous ten years. Throughout that same ten-year period, insured losses totaled over \$65 billion. Tornadoes, while a worldwide phenomenon, occur with greater frequency in the United States due to the meeting of cold, dry Canadian air with warm, moist air from the Gulf of Mexico. Losses from these severe storms alone have accounted for more than half of all insured catastrophe losses since 1990 \cite{lloyds}. Several presentations at the Extreme Weather Congress in January 2013 focused on flooding as a prime source of damage. In the United States, floods are the most common natural disaster and cause, on average, about six billion dollars in damages each year. While the National Flood Insurance Program provides limited coverage to homeowners and businesses that qualify, a sizable secondary insurance market exists \cite{hope}. Globally, floods rank behind only earthquakes as the world's costliest natural disasters \cite{li}. The Data Services group, an organization within the Big Data and Analytics incubator, in conjunction with industry analysts and other internal groups, has identified several opportunities in which an updated ClimatEdge\index{ClimatEdge} offering can benefit the general insurance industry [personal communication, 2013]:
\begin{itemize}
    \item probability of future tornado occurrence
    \item probability of future hail occurrence \& recent forensics
    \item probability of future global and domestic flood occurrence
\end{itemize}
This paper examines the tornado and flood probability analytics because their requirements contrast, falling at opposite ends of the attribute scale associated with big data. The analytics necessary for producing a hail offering are an offshoot of the tornado algorithm, and the required data sets are similar in complexity to the tornado sets. A detailed examination of the hail offering would therefore not yield significant additional insight beyond that gained from the tornado and flood offerings.