The goal of C4 is to perform fast, iterative, contiguity-preserving optimization of many of the compactness objective functions found in the gerrymandering literature.
The implemented algorithms include:
- Isoperimeter Quotient: e.g., Polsby & Popper
- Moments of Inertia: Weaver & Hess
- Largest-Inscribed Circle: Ehrenburg
- Smallest Circumscribing Circle: Reock
- Convex Hull Area or Population: e.g., Hofeller & Grofman
- Various Radii: see, e.g., Frolov (1975) for a good historical review
- Exchange: Angel et al.
- Power Diagrams: e.g., Fryer & Holden
- Path Fraction ("Bizarreness"): Chambers & Miller
- Distance assignment: Chen & Rodden
- Split-line: Forrest
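To make one of these objectives concrete, here is a minimal Python sketch of the isoperimetric quotient in its usual Polsby & Popper form, 4πA/P², for a simple polygon. This is the textbook definition only; the function names are hypothetical, and C4's own C++ implementation works on cells and regions rather than raw vertex lists.

```python
import math

def polygon_area_perimeter(points):
    """Return (area, perimeter) of a simple polygon given as (x, y) vertices."""
    twice_area = 0.0
    perimeter = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]          # wrap around to close the ring
        twice_area += x0 * y1 - x1 * y0       # shoelace term
        perimeter += math.hypot(x1 - x0, y1 - y0)
    return abs(twice_area) / 2.0, perimeter

def isoperimetric_quotient(points):
    """4 * pi * A / P^2: equals 1 for a circle and approaches 0 for thin shapes."""
    area, perimeter = polygon_area_perimeter(points)
    return 4.0 * math.pi * area / perimeter ** 2

# Two illustrative shapes with the same area (1.0) but different compactness.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
strip = [(0, 0), (10, 0), (10, 0.1), (0, 0.1)]

print(isoperimetric_quotient(square))
print(isoperimetric_quotient(strip))
```

The square scores π/4, while the thin strip with the same area scores far lower, which is the behavior a compactness objective is meant to reward.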
To do this, C4 includes three C++ classes: (1) a universe, (2) regions, and (3) cells. In districting parlance, these correspond to states, legislative districts, and Census tracts (or block groups or blocks). The Universe class (mainly) is exposed to Python through Cython. All plotting and most data management happens in Python.
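As a rough illustration of that hierarchy, and of what "contiguity-preserving" means for a single move, here is a small pure-Python sketch. Every class, attribute, and method name below is hypothetical and chosen for illustration; it does not mirror the actual C++ or Cython interface.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Cell:
    cell_id: int
    population: int
    neighbors: set = field(default_factory=set)   # ids of adjacent cells

@dataclass
class Region:
    region_id: int
    cells: set = field(default_factory=set)       # ids of member cells

class Universe:
    """Hypothetical container: all cells plus their assignment to regions."""

    def __init__(self, cells, assignment):
        self.cells = {c.cell_id: c for c in cells}
        self.regions = {}
        for cell_id, region_id in assignment.items():
            self.regions.setdefault(region_id, Region(region_id)).cells.add(cell_id)

    def is_contiguous(self, cell_ids):
        """Breadth-first search restricted to cell_ids; empty sets count as invalid."""
        cell_ids = set(cell_ids)
        if not cell_ids:
            return False
        start = next(iter(cell_ids))
        seen = {start}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nb in self.cells[current].neighbors:
                if nb in cell_ids and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        return seen == cell_ids

    def can_move(self, cell_id, src, dst):
        """True if the cell borders dst and src stays contiguous without it."""
        remaining = self.regions[src].cells - {cell_id}
        touches_dst = any(nb in self.regions[dst].cells
                          for nb in self.cells[cell_id].neighbors)
        return touches_dst and self.is_contiguous(remaining)

# Toy layout: cells 0-1-2 in a row, with cell 3 adjacent to both 1 and 2.
cells = [Cell(0, 10, {1}), Cell(1, 10, {0, 2, 3}), Cell(2, 10, {1, 3}), Cell(3, 10, {1, 2})]
universe = Universe(cells, {0: 0, 1: 0, 2: 0, 3: 1})

print(universe.can_move(2, 0, 1))   # moving the end cell leaves {0, 1} connected
print(universe.can_move(1, 0, 1))   # moving the middle cell would split {0, 2}
```

In this sketch a move is rejected whenever it would disconnect the source region or empty it entirely.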
See AlgDesign for greater detail on the code.
To facilitate use and replication, all of the dependencies and the built software are included in a Docker container, hosted on DockerHub. The container was generated using the included Dockerfile and is about 2 GB. Running from scratch is as simple as:
docker run -v $(pwd)/res/:/C4/res/ -e STATE=pa -e SEED=2 -e METHOD=POWER --rm -it jamessaxon/c4:replication
This means:
- Run the C4 software as in this image (`docker run [...] jamessaxon/c4:replication`).
- Mount the local directory called `res/` to `/C4/res/` in the container. Results written here will be available when the job completes. You must make the `res/` directory!!
- Environment variables / arguments: simulate districts for the `STATE` of Pennsylvania (USPS code `pa`), with `SEED` of 2 (any number, but < 1000 will format better), using the `POWER` diagram `METHOD`. You can also specify which maps to draw with the `SHADING` variable (`-e SHADING=all` is all of them).
  - The possible methods are `POWER`, `DIST`, `RADII`, `IPQ`, `CIRCLES`, `HULL_P`, `HULL_A`, `INERTIA`, `AXIS`, `SPLIT`, and `PATH_FRAC`. Several of these run several methods in sequence. For instance, `CIRCLES` runs the `exchange`, `reock`, and `ehrenburg` methods, and `RADII` includes `rohrbach`, `harm_radius`, `mean_radius`, and `dyn_radius`.
  - The `SHADING` options are: `district` (just colors), `target` (ratio to target population), `density` (show population centers), `scores` (show spatial scores), `counties` (overlay county geometries), and `all` or `none`. The default is `none`.
- Remove (`--rm`) the container when it exits.
- Run interactively and allow input (`-it`).
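If you launch many of these jobs, it can help to assemble the command programmatically. The following is a minimal sketch, not part of C4: the helper name `docker_command` and its `res_dir` parameter are assumptions, while the flags, environment variables, and image name are the ones described above.

```python
import os
import shlex

METHODS = {"POWER", "DIST", "RADII", "IPQ", "CIRCLES", "HULL_P",
           "HULL_A", "INERTIA", "AXIS", "SPLIT", "PATH_FRAC"}
SHADINGS = {"district", "target", "density", "scores", "counties", "all", "none"}

def docker_command(state, seed, method, shading=None, res_dir="res"):
    """Build the `docker run` argument list (hypothetical helper, not part of C4)."""
    if method not in METHODS:
        raise ValueError(f"unknown METHOD: {method}")
    if shading is not None and shading not in SHADINGS:
        raise ValueError(f"unknown SHADING: {shading}")
    res_path = os.path.abspath(res_dir)
    cmd = ["docker", "run",
           "-v", f"{res_path}/:/C4/res/",        # mount the local results directory
           "-e", f"STATE={state}",
           "-e", f"SEED={seed}",
           "-e", f"METHOD={method}"]
    if shading is not None:                       # omitted: the container default applies
        cmd += ["-e", f"SHADING={shading}"]
    cmd += ["--rm", "-it", "jamessaxon/c4:replication"]
    return cmd

if __name__ == "__main__":
    os.makedirs("res", exist_ok=True)             # the res/ directory must exist first
    print(shlex.join(docker_command("pa", 2, "POWER", shading="all")))
```

Printing the command rather than executing it keeps the sketch safe to run anywhere; pass the list to `subprocess.run` once it looks right.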
Of course, you can also run interactively with `/bin/bash` and just `cd` to `C4` to use `run.py` with all of its arguments.
In that case, skip to Running C4, below.
For large-scale jobs, I run C4 as a Docker container on AWS.
The Dockerfile changes very slightly, with AWS keys and `s3` tools (see DockerfileAWS).
The scripts for building the container and launching jobs are also in this directory (though they won't run without dependencies or passwords!).
Some components require Armadillo, which in turn requires OpenBLAS. Note that if you create a Python environment with geopandas, OpenBLAS will come for free, so you may not need to do this.
- OpenBLAS: First download it, then `make && sudo make install`. (Yes, it's that easy!)
- Armadillo:
  - Linux: `sudo apt-get install libarmadillo-dev libarmadillo6 libarmadillo6-dbgsym`
  - Mac: download, then `cmake . && make && sudo make install`
You will also need all of the compiled and Python packages listed in the Dockerfile. Anaconda has made this much easier. On relatively modern Macs (up to a few years old), `conda create -n c4test python=3.6 geopandas pysal=1.14.4 cython` should give you everything you need.
But if you are missing something, these are required packages:
- Compiled: `libboost-all-dev`, `libgeos-dev`, `libgdal-dev`, `python3-gdal`, `gdal-bin`
- Python: `cython`, `matplotlib`, `fiona`, `pysal`, `geopandas`, `psycopg2`
Anaconda notwithstanding, GEOS and GDAL installs can be finicky. So this part of the installation is on you, gentle user!
Finally, to build C4, it's just
python setup.py build_ext --inplace
Setting up all of the shapefiles, topologies, and voting records from scratch is pretty involved, but these are all included in the package. To run C4, I suggest checking out `run.py`, which will show you its options (`-h`), or `run_iter.sh`, where you can find the default settings for any of the methods.
For example, to run power diagrams for Pennsylvania, do:
./run.py -s pa -i power:100000 -t 0.01 -x300 -l0 -c100 --power_restart --print_init -w pa/power/s300 -m power
This means: run Pennsylvania (`-s pa`) for up to 100k iterations, initialized through the power diagram method (`-i power:100000`), using a tolerance of 1% (`-t 0.01`) and a seed of 300 (`-x300`). Run this for 100 cycles (`-c100`), and restart after completion using the power restart method (`--power_restart`). Write to file (`-w pa/power/s300`). The method is power (`-m power`).
This is a little different from most methods, since power diagrams do not use the standard greedy optimizer. More typical is:
./run.py -s pa -m hull_p -t 0.01 -x300 -n10000 --conv_iter 500 -c20 --destrand_min 3 --destrand_max 50 --allow_trades
This means: use the hull population method (`-m hull_p`). Run up to 10000 iterations in total (`-n10000`), and stop after 500 iterations with no improvement (`--conv_iter 500`). Restart the search 20 times (`-c20`). Remove "strands" from the regions (larger than 3, but smaller than 50 cells; `--destrand_min 3 --destrand_max 50`).
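The stopping rule described here (an iteration cap, an early exit after a run of non-improving iterations, and several restarts) is a common pattern for greedy searches. The sketch below shows that pattern on a toy one-dimensional problem. It is a generic illustration under stated assumptions, not C4's optimizer: all function names are hypothetical, and the toy score stands in for a real compactness objective.

```python
import random

def greedy_cycle(score, propose, start, rng, max_iter=10000, conv_iter=500):
    """One greedy cycle: accept only improving moves, stop when stale or capped."""
    best, best_score = start, score(start)
    stale = 0                                  # consecutive iterations without improvement
    for _ in range(max_iter):
        candidate = propose(best, rng)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score, stale = candidate, candidate_score, 0
        else:
            stale += 1
            if stale >= conv_iter:             # converged: no improvement for conv_iter steps
                break
    return best, best_score

def run_cycles(score, propose, make_start, cycles=20, seed=300, **kwargs):
    """Restart the search `cycles` times from fresh starts; keep the best result."""
    rng = random.Random(seed)
    results = [greedy_cycle(score, propose, make_start(rng), rng, **kwargs)
               for _ in range(cycles)]
    return max(results, key=lambda result: result[1])

# Toy problem (an assumption for illustration): find the integer closest to 3.
def toy_score(x):
    return -(x - 3) ** 2

def toy_propose(x, rng):
    return x + rng.choice((-1, 1))

def toy_start(rng):
    return rng.randint(-20, 20)

best, best_score = run_cycles(toy_score, toy_propose, toy_start)
print(best, best_score)
```

The default arguments echo the values in the command above (10000 iterations, 500 stale iterations, 20 cycles, seed 300) purely for readability.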
Alternatively, you can just accept my defaults, and do
export STATE=pa; export SEED=300; export METHOD=IPQ; ./run_iter.sh
which will run Pennsylvania for seed 300 with the Isoperimeter quotient method.
This is the default script that the Docker container runs.
The possible methods are: `POWER`, `DIST`, `RADII`, `IPQ`, `CIRCLES`, `HULL_P`, `HULL_A`, `INERTIA`, `AXIS`, `SPLIT`, `PATH_FRAC`.
Several of these options run multiple methods in series.
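To sweep over several seeds with these defaults, one option is to set the three environment variables from Python and call the script once per seed. This is a hedged sketch: the helper names and the `dry_run` switch are assumptions, and only `STATE`, `SEED`, `METHOD`, and `run_iter.sh` come from the text above.

```python
import os
import subprocess

def run_iter_env(state, seed, method, base_env=None):
    """Copy the environment and set the three variables that run_iter.sh reads."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(STATE=state, SEED=str(seed), METHOD=method)
    return env

def run_seeds(state, method, seeds, script="./run_iter.sh", dry_run=True):
    """Return the planned jobs; with dry_run=False, also execute them one by one."""
    jobs = []
    for seed in seeds:
        env = run_iter_env(state, seed, method)
        jobs.append((script, env["STATE"], env["SEED"], env["METHOD"]))
        if not dry_run:
            subprocess.run([script], env=env, check=True)   # raises if the script fails
    return jobs

for job in run_seeds("pa", "IPQ", [300, 301, 302]):
    print(job)
```

With `dry_run=True` (the default here) nothing is executed, so you can inspect the job list before committing to a long run.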
Four types of files will be written to `res/` (your local directory).
Note again that, when running with Docker, this directory must be mounted to a local directory.
- JSON files containing a summary of the simulation will be written to `res/json/[state usps]_[method]_s[seed]_c[cycle].json`. These files contain a summary of the entire run: the tract-to-district assignment, the spatial parameters of the districts, the partisan voting (if available for that state), race and ethnicity, voter balance (`PopulationDeviation`), the method used, and so forth (run `jq keys file.json` to see this).
- CSV files containing simply the tract-to-district assignment. This is just a two-column assignment: row number (equivalent to county + tract geoid, though perhaps a poor technical choice) and the district. This will be written to `res/[state usps]/[method]/s[seed]/c[cycle]/final.csv`.
- GeoJSON files of the districting plans, `final.geojson`, in the same directory as the CSV files. These contain the basic plan and some of the information from the JSON summaries. They are in the EPSG 4326 coordinate reference system, and display nicely on GitHub or Gist or in Leaflet etc.
- Finally, `final_*.pdf` are static maps of the districting plans. There will be one map for each shading method used.
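For quick checks on these outputs from Python, a small sketch along the following lines may help. It rests on several assumptions that you should verify against your own `res/` directory: the seed and cycle appear in paths exactly as passed in (any zero padding is not handled), the CSV holds integer row numbers and district ids with at most a header line to skip, and the JSON summary is a single top-level object. All function names are hypothetical.

```python
import csv
import json
import os

def output_paths(state, method, seed, cycle, res_dir="res"):
    """Build the output locations following the patterns listed above."""
    run_dir = os.path.join(res_dir, state, method, f"s{seed}", f"c{cycle}")
    return {
        "json": os.path.join(res_dir, "json", f"{state}_{method}_s{seed}_c{cycle}.json"),
        "csv": os.path.join(run_dir, "final.csv"),
        "geojson": os.path.join(run_dir, "final.geojson"),
    }

def parse_assignment(lines):
    """Parse two-column rows (row number, district) from an open file or list of lines."""
    assignment = {}
    for row in csv.reader(lines):
        try:
            cell, district = int(row[0]), int(row[1])
        except (ValueError, IndexError):
            continue                      # skip a header or blank line, if present
        assignment[cell] = district
    return assignment

def district_sizes(assignment):
    """Count how many cells were assigned to each district."""
    sizes = {}
    for district in assignment.values():
        sizes[district] = sizes.get(district, 0) + 1
    return sizes

def summary_keys(json_text):
    """Sorted top-level keys of a JSON object, similar to `jq keys`."""
    return sorted(json.loads(json_text))

if __name__ == "__main__":
    paths = output_paths("pa", "power", 300, 0)
    if os.path.exists(paths["csv"]):
        with open(paths["csv"], newline="") as f:
            print(district_sizes(parse_assignment(f)))
    if os.path.exists(paths["json"]):
        with open(paths["json"]) as f:
            print(summary_keys(f.read()))
```

The cell counts per district are only a sanity check; population balance is reported separately in the JSON summary under `PopulationDeviation`.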
If you're more interested in the results, just head over to my webspace to play with the outputs in an interactive map: