revise
frenchy64 committed Aug 23, 2024
1 parent 1c85308 commit f91239c
Showing 2 changed files with 24 additions and 26 deletions.
Binary file modified paper/paper.pdf
Binary file not shown.
50 changes: 24 additions & 26 deletions paper/paper.tex
@@ -81,9 +81,10 @@
\begin{abstract}
Clojure provides a suite of persistent data structures
implemented by Hickey based on previous work by Bagwell.
-In this tutorial, we use a ported implementation of
-Hickey's Java implementation to Clojure to learn
-how Hash Array Mapped Tries work.
+This tutorial teaches
+how Hash Array Mapped Tries work
+using a Clojure port of
+Hickey's Java implementation.
\end{abstract}

% Proposal
@@ -98,9 +99,8 @@

\section{Introduction}

-Hash Array Mapped Tries (HAMT) have rocked the functional programming
-world with a fast, immutable and persistent alternative
-to a hash map.
+Hash Array Mapped Tries (HAMT) provide a key tool for functional programmers
+as fast, immutable, and persistent alternatives to hash maps.
First described by Bagwell~\cite{bagwell2001ideal},
they are featured in mainstream functional programming
languages like Clojure and Scala, and have been ported
@@ -127,15 +127,15 @@ \section{Introduction}

To understand what a hash array mapped trie is,
we first give some definitions.
-A \textit{trie} is a way of formatting key/value pairs
+A trie is a way of formatting key/value pairs
in a tree, where values are leaves and keys are spread
across the paths to those nodes.
Key prefixes occur on the shallow levels of the tree,
and suffixes occur closer to the leaves.
-A \textit{bit trie} assumes the mapping keys are strings
+A bit trie assumes the mapping keys are strings
of bits. Each level consumes one or more bits to index
its elements.
-An \textit{Array Mapped Trie}
+An Array Mapped Trie
maps the bits of array indices as a bit trie.

In this paper, we explore Clojure's persistent hash
@@ -144,9 +144,8 @@ \section{Introduction}
It was implemented by Hickey~\cite{hickey2008clojure}, extending
Bagwell's original formulation~\cite{bagwell2001ideal}
to be persistent.
-Persistent data structures use \textit{structural sharing}
-when extending themselves, so Clojure necessarily enforces
-hash maps to be \textit{immutable}.
+Persistent data structures are extended using structural sharing,
+so Clojure necessarily enforces hash maps to be immutable.

%\begin{verbatim}
%- introduce what clojure is
@@ -168,7 +167,7 @@ \section{Introduction}
\paragraph{Contributions}

\begin{enumerate}
-\item We walkthrough the mechanics behind HAMTs.
+\item We walk through the mechanics behind HAMTs.
\item We describe the internals of Clojure's persistent HAMT implementation.
\item We present a port of Clojure's HAMT from Java to Clojure
for pedagogical purposes,
@@ -232,7 +231,7 @@ \section{Walkthrough}
work under different operations.

Firstly, a HAMT represents a search tree
-based on the \textit{hash} of its keys.
+based on the hash of its keys.
Each key is associated with a value.
Figure \ref{hashes} gives sample 32-bit hashes for six keys,
which we will use only in this section.
@@ -245,7 +244,7 @@ \section{Walkthrough}
first (root) level, level 0. This corresponds
to the first 5 bits of the hash. The maximum
branching factor is $2^5=32$, but since we
-only need one entry, we create a \textit{resizable}
+only need one entry, we create a resizable
root node.

A resizable node of current capacity $n$ entries,
@@ -496,15 +495,15 @@ \section{Walkthrough}
would require copying arrays over length 32, we could
instead once-and-for-all allocate a length 32
array where each member is a subtrie
-(without a $\times$ flag)---we call this a \textit{full}
+(without a $\times$ flag)---we call this a full
node.

This removes the need to bitmap bits---the
32 bitmap bits now map one-to-one to the subtries.

\paragraph{Hash collision nodes}
If two different keys hash to the same value,
-we use a \textit{hash collision node}
+we use a hash collision node
to differentiate them. One approach
is to default to a linear search---with
the assumption that the hash function
@@ -617,9 +616,8 @@ \subsection{Understanding the bit operations}
%TODO example

The return value can then
-be used bit \textit{and}ed
-with the bitmap to return the value of the
-desired bit in the bitmap.
+be used, combined with the bitmap using bitwise AND,
+to return the value of the desired bit in the bitmap.
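
The bit test described above can be sketched in Java as follows. This is a minimal sketch, not the port's code: it assumes the return value is a one-hot 32-bit integer, that each level consumes 5 bits starting from the low-order end of the hash, and the class and method names (\texttt{BitTest}, \texttt{mask}, \texttt{bitpos}, \texttt{isSet}) are hypothetical.

```java
final class BitTest {
    // Hypothetical helper: the 5-bit chunk of `hash` consumed at this level.
    // Assumes low-order bits are consumed first (shift = 0 at the root).
    static int mask(int hash, int shift) {
        return (hash >>> shift) & 0x1f;
    }

    // Hypothetical helper: a one-hot integer with only the bit for that chunk set.
    static int bitpos(int hash, int shift) {
        return 1 << mask(hash, shift);
    }

    // Bitwise AND with the bitmap is nonzero exactly when the bit is set.
    static boolean isSet(int bitmap, int bit) {
        return (bitmap & bit) != 0;
    }

    public static void main(String[] args) {
        int bitmap = 0b1010;                              // entries at positions 1 and 3
        System.out.println(isSet(bitmap, bitpos(3, 0)));  // prints true
        System.out.println(isSet(bitmap, bitpos(2, 0)));  // prints false
    }
}
```

A nonzero result means the node already holds an entry for this chunk of the hash; zero means the slot is empty.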

\paragraph{Array indexing}

@@ -633,7 +631,7 @@ \subsection{Understanding the bit operations}
To retrieve the next array index, we count the number of 1's
below the given bit in the bitmap (assuming the given
bit is set to 1).
-This number $i$ is the number of nodes \textit{before}
+This number $i$ is the number of nodes before
the node of interest---thus indexes $2i$ and $2i+1$
contain the key and value of interest.
To demonstrate this, say we have a bitmap
@@ -733,8 +731,8 @@ \subsection{Understanding the bit operations}
For example, if \texttt{bit} was \texttt{1000}---that
is, isolating the 4th bit---decrementing it results
in \texttt{0111}.
-Bit \textit{and}ing \texttt{0111}
-with bitmap then isolates the 1st-3rd bits, which
+Combining \texttt{0111}
+and the bitmap with bitwise AND then isolates the 1st-3rd bits, which
we can then use to count the number of 1's
below \texttt{bit} in \texttt{bitmap}.
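
The decrement-and-count step can be sketched in Java as follows. This is an illustrative sketch under assumptions: the class name \texttt{ArrayIndex} and the two-argument \texttt{index} method are hypothetical, and \texttt{bit} is assumed to be one-hot. \texttt{Integer.bitCount} is the standard library's population count.

```java
final class ArrayIndex {
    // Number of 1's in `bitmap` strictly below the one-hot `bit`.
    // (bit - 1) sets every position below `bit`; AND keeps only those
    // positions that are also set in the bitmap.
    static int index(int bitmap, int bit) {
        return Integer.bitCount(bitmap & (bit - 1));
    }

    public static void main(String[] args) {
        int bitmap = 0b1011;
        int bit = 0b1000;            // isolates the 4th bit
        int i = index(bitmap, bit);  // 0b0111 & 0b1011 = 0b0011, two 1's
        System.out.println(i);       // prints 2
        // With keys and values interleaved, the entry sits at 2i and 2i+1.
        System.out.println("key at " + (2 * i) + ", value at " + (2 * i + 1));
    }
}
```

The same expression also works for the highest bit: decrementing \texttt{1 << 31} wraps around to the 31 low bits set, which is exactly the mask needed.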

@@ -1054,7 +1052,7 @@ \section{Remark on unsigned bit arithmetic on the JVM}
\label{jvm-bit-remark}

Clojure's HAMT is implemented on the JVM,
-which only has signed 32-bit integers.
+where 32-bit integers are signed.
The HAMT implementation, however, treats hashes as
arbitrary strings of 32 bits, so we need to emulate
unsigned arithmetic operations.
@@ -1092,8 +1090,8 @@ \section{Remark on unsigned bit arithmetic on the JVM}
1000 1101 >>> 1 = 0100 0110 //unsigned
\end{verbatim}
%
-We always want \textit{unsigned} bit operations, because no bits
-are special in a hash, or in a bitmap.
+Unsigned bit operations are necessary because no bits
+are special in hashes and bitmaps.
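
On 32-bit Java integers the difference looks as follows. This is a small sketch with hypothetical names (\texttt{ShiftDemo}, \texttt{shiftSigned}, \texttt{shiftUnsigned}); only the \texttt{>>} and \texttt{>>>} operators themselves are part of Java.

```java
final class ShiftDemo {
    // Arithmetic shift: copies the sign bit into the vacated high positions.
    static int shiftSigned(int x, int n) {
        return x >> n;
    }

    // Logical shift: fills the vacated high positions with zeros.
    static int shiftUnsigned(int x, int n) {
        return x >>> n;
    }

    public static void main(String[] args) {
        int hash = 0x80000000;  // only the top bit is set
        System.out.println(Integer.toHexString(shiftSigned(hash, 28)));    // prints fffffff8
        System.out.println(Integer.toHexString(shiftUnsigned(hash, 28)));  // prints 8
    }
}
```

With the signed shift, the sign bit leaks into positions that a later mask or population count would read as real hash bits, which is why the unsigned form is the one to use here.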

\section{Hashes for examples}
\label{hash-examples}