%!TEX root=index.tex
\section{Appendix}
\newfontfamily{\anonymous}[Scale=MatchLowercase]{Anonymous Pro}
\lstset{ %
language=C, % Code language
basicstyle=\anonymous,
tabsize=1,
breaklines=true,
breakatwhitespace=false,
showstringspaces=false,
showspaces=false,
showtabs=false
}
Listed below is C code that processes a \gls{merra} data file and stores its contents in a simple SQLite database \cite{hdf}. Every data type would have a similar parser, typically written in Python for ease of development. However, given the complex structure of the \gls{merra} format and the size of the data set, it is reasonable to spend the extra development time on a C implementation to gain execution speed. Ingesting the entire thirty-three-year \gls{merra} archive, with the work running in parallel, would take approximately one week as a one-time cost. \\
After the \gls{merra} parser, a simple Python implementation of MapReduce is provided that demonstrates how a histogram of word occurrences in a text file would be calculated. This is the canonical first MapReduce program that any developer might produce. It divides the task into Mapper and Reducer jobs and is designed to run on a single core. More advanced implementations would run over multiple cores on a distributed cluster.
\lstinputlisting{/Users/christopher/Documents/work/Climatalytics/prototype/source/merra/c/geo_point.h}
\hrule
\lstinputlisting{/Users/christopher/Documents/work/Climatalytics/prototype/source/merra/c/merra_regex.h}
\hrule
\lstinputlisting{/Users/christopher/Documents/work/Climatalytics/prototype/source/merra/c/merra_regex.c}
\hrule
\lstinputlisting{/Users/christopher/Documents/work/Climatalytics/prototype/source/merra/c/sql.h}
\hrule
\lstinputlisting{/Users/christopher/Documents/work/Climatalytics/prototype/source/merra/c/sql.c}
\hrule
\lstinputlisting{/Users/christopher/Documents/work/Climatalytics/prototype/source/merra/c/latlon.c}
\hrule
\lstinputlisting{/Users/christopher/Documents/work/Climatalytics/prototype/source/merra/c/main.c}
\lstset{ %
language=Python, % Code language
basicstyle=\anonymous,
tabsize=1,
breaklines=true,
breakatwhitespace=false,
showstringspaces=false,
showspaces=false,
showtabs=false
}
A simple MapReduce implementation for word counting in Python is presented below \cite{keller2}.
\lstinputlisting{pr.py}
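To illustrate how the Mapper and Reducer split described above lets the map step move from one core to several, the following is a minimal sketch. It is not taken from the listing above: the names \texttt{mapper}, \texttt{reducer}, \texttt{wordcount}, and the \texttt{map\_fn} parameter are hypothetical and chosen only for illustration, and the sketch assumes that words are separated by whitespace and compared case-insensitively.

```python
from collections import defaultdict
from itertools import chain


def mapper(line):
    """Map step: emit a (word, 1) pair for every word in one line of text."""
    # Assumption: words are whitespace-separated and case is ignored.
    return [(word.lower(), 1) for word in line.split()]


def reducer(pairs):
    """Reduce step: sum the counts emitted for each distinct word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


def wordcount(lines, map_fn=map):
    """Run the map step through map_fn, then reduce the combined output.

    map_fn (a hypothetical parameter) defaults to the built-in,
    single-core map. Any callable that takes (function, iterable)
    in the same way can be substituted.
    """
    mapped = map_fn(mapper, lines)
    return reducer(chain.from_iterable(mapped))


if __name__ == "__main__":
    text = ["the quick brown fox", "the lazy dog", "The end"]
    print(wordcount(text))
```

Because each line is mapped independently of the others, only the map step needs to change in order to use more than one core. For example, passing the \texttt{map} method of a \texttt{multiprocessing.Pool} as \texttt{map\_fn} should spread the map step over the cores of a single machine, while distributing the work over a cluster would require a dedicated framework.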