With the code in this repository you can assign spatial units (blocks) to voting districts. The goal is not to fully automate the process, but to generate variations, let the user choose from them, and leave the final adjustments to be made manually.
You need NodeJS and npm installed on your system. As input data you need a GeoJSON file with the spatial units. Each unit needs to have the number of inhabitants and its current voting district. You also need a Mapbox key for the maps (you can get one for free at mapbox.com).
IMPORTANT: This tool does not do regionalization, that is, taking a number of blocks and grouping them into voting districts from scratch. Instead, it takes an existing voting district structure and checks whether inhabitants are evenly distributed across all voting districts. Nonetheless, if you just want to use the editor, you could generate some random voting districts and then do the whole work manually.
Install the dependencies...
npm install
Move your GeoJSON to public/assets/data/blocks.geojson. The keys of your properties can be set up in the .env file (see next step).
Edit the .env file:
Key | Description |
---|---|
MAPBOXKEY | The application has a lot of maps. We are using Mapbox GL JS and Mapbox base maps, so you need a Mapbox key to get started. |
SERVER | Where your server is hosted. This is important for saving and retrieving custom user variations. If you don't want users to be able to save their customisations, you need to modify src/components/views/Editor.svelte and remove the save button. |
KEY_POPULATION | All variables with a KEY_ prefix refer to keys in the GeoJSON feature properties. Population per block. Required value in GeoJSON. |
KEY_DISTRICT | Voting district of block. Required value in GeoJSON. |
KEY_ID | Block id (if blocks have no id, simply generate an integer 0,1,2...n). Required value in GeoJSON. |
KEY_NEIGHBORS | This gets generated by network.ts. List of voting districts in the direct neighborhood of the individual block. |
KEY_NEIGHBOR_BLOCKS | This gets generated by network.ts. List of neighboring blocks. |
SHOW_NETWORK | Should the network tab be shown in the app? |
IGNORE_DISTRICTS | Sometimes there are just odd voting districts in your dataset that mess everything up. You can enter a comma-separated list of voting districts, which will be ignored for the automation part. |
LIMIT | What is the desired population limit for each voting district? |
ALLOW_SAVE | Should users be allowed to save their edits? |
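Before running the tools, it can help to check that every feature carries the three required properties. The following is a minimal sketch, not code from this repository: the property names `population`, `district` and `id` stand in for whatever you configured as KEY_POPULATION, KEY_DISTRICT and KEY_ID, and the `findInvalidBlocks` helper is hypothetical.

```typescript
// Hypothetical property names; replace them with the values from your .env file.
const KEY_POPULATION = "population";
const KEY_DISTRICT = "district";
const KEY_ID = "id";

interface BlockFeature {
  type: "Feature";
  properties: Record<string, unknown> | null;
  geometry: unknown;
}

// Returns the indices of all features that lack one of the required properties.
function findInvalidBlocks(features: BlockFeature[]): number[] {
  const required = [KEY_POPULATION, KEY_DISTRICT, KEY_ID];
  const invalid: number[] = [];
  features.forEach((feature, index) => {
    const props: Record<string, unknown> = feature.properties ?? {};
    const complete = required.every(
      (key) => props[key] !== undefined && props[key] !== null
    );
    if (!complete) invalid.push(index);
  });
  return invalid;
}

const sampleFeatures: BlockFeature[] = [
  { type: "Feature", properties: { id: 0, district: 101, population: 250 }, geometry: null },
  // This feature has no population value and is therefore reported.
  { type: "Feature", properties: { id: 1, district: 101 }, geometry: null },
];

const invalidIndices = findInvalidBlocks(sampleFeatures);
console.log(invalidIndices);
```

In practice you would pass in the `features` array of your parsed blocks.geojson and fix any reported blocks before generating the network.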
If this is the first time you are setting things up...
..., you need to create the adjacency data, that is, work out which blocks are neighbours. This is important for creating continuous voting districts that do not end up with holes.
When you have placed your blocks.geojson in the public/assets/data folder, run the following command:
ts-node tools/network.ts DESTINATION.geojson
If ts-node does not work when installed as a project dependency, you might need to install it globally. DESTINATION.geojson is the name of the new file that will be generated.
Next, replace the old GeoJSON with the new one.
The automatically generated neighborhood network is a good starting point, but rarely perfect. We strongly recommend you use the app's network tab to refine your GeoJSON. Remove or add connections as you like. Afterwards you can use the export button to download the GeoJSON and replace it in your data folder.
If you want to provide your users with a variety of good variations, use the simulation script to generate some:
ts-node tools/simulation.ts DESTINATION_FOLDER COUNT ITERATIONS BEST_OF
Param | Description |
---|---|
DESTINATION_FOLDER | Where should the newly generated files be stored (folder). |
COUNT | How many variations should be generated (integer, optional, default: 1000). |
ITERATIONS | How many iterations per variation (integer, optional, default: 100). |
BEST_OF | At the end the system selects the best variations according to several selection criteria. BEST_OF defines how many of the best variations are kept per criterion (integer, optional, default: 10). |
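To make the defaults in the table concrete, here is a hedged sketch of how the four positional parameters could be read. It is not taken from tools/simulation.ts; the `parseSimulationArgs` and `intOrDefault` helpers are hypothetical, and treating DESTINATION_FOLDER as required is an assumption based on it not being marked optional above.

```typescript
interface SimulationArgs {
  destinationFolder: string;
  count: number;
  iterations: number;
  bestOf: number;
}

// Parses a positive integer and falls back to the documented default otherwise.
function intOrDefault(value: string | undefined, fallback: number): number {
  if (value === undefined) return fallback;
  const parsed = Number.parseInt(value, 10);
  return Number.isNaN(parsed) || parsed <= 0 ? fallback : parsed;
}

// argv holds only the positional parameters, e.g. process.argv.slice(2).
function parseSimulationArgs(argv: string[]): SimulationArgs {
  const [destinationFolder, count, iterations, bestOf] = argv;
  if (!destinationFolder) throw new Error("DESTINATION_FOLDER is required");
  return {
    destinationFolder,
    count: intOrDefault(count, 1000),
    iterations: intOrDefault(iterations, 100),
    bestOf: intOrDefault(bestOf, 10),
  };
}

// Only the folder and COUNT are given; ITERATIONS and BEST_OF use the defaults.
const exampleArgs = parseSimulationArgs(["./variations", "200"]);
console.log(exampleArgs);
```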
IMPORTANT: Depending on the values of COUNT and ITERATIONS this can take a while. Just let it run overnight; you only need to do this once.
When the simulation is completed you get a folder with all the variations, plus a best.csv file and a best folder with the GeoJSONs of the best variations. Move both to the public/assets/data folder.
Simply start Rollup:
npm run dev
Navigate to localhost:5000. You should see the app running.
To create an optimised version of the app:
npm run build
After building, the public folder holds everything you need. Simply copy it to your destination. You could also serve the whole thing through a service like Netlify and use the public folder as your web root. But you still need to host the server somewhere else. In this case make sure CORS is set up and both are served over HTTPS.
The app is built in Svelte, in TypeScript. The two scripts, simulation and network, are written in TypeScript as well. Be careful when applying changes, because the app and the scripts share the TypeScript files in src/libs. If you do change them, make sure to also update the corresponding files.
If you want to allow people to save their variations and others to access those variations, you need a server to store the data. We have built a very simple PHP solution for that. (WHAT? PHP? - YES. PHP is available on almost every server and it's easy to find a free hosting service for PHP. And really it's just a few lines of code. A Python/NodeJS solution should not require many more lines of code, and we are happy to accept contributions.)
So simply copy the files in the server folder to your server destination. Make sure the destination is set in the .env file. Run the setup.php file and remove it afterwards. Make sure that the script and folder have the correct permissions, so the script can create files and folders.
The script receives the block IDs and their corresponding voting district IDs, which are then stored as a JSON file (index.php?action=save). A list of uploaded user variations is also available (index.php?action=list).
IMPORTANT: This is a very simple PHP app. It does not have sophisticated security, but we did our best: 1. We parse incoming data into integers, so any malicious content is removed. 2. After 500 uploaded variations the service stops working, to limit the impact of DDoS attacks.
IMPORTANT: As part of the security measures we parse district ID and block ID as integers and then prefix them with 0s. This means that if your IDs are in a different format, you need to modify lines 49 to 52.
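The server itself is written in PHP, but the idea of the sanitisation can be illustrated in TypeScript. This is a sketch only: the `sanitizeId` helper and the pad width of 5 are assumptions for illustration, the real width is defined in the PHP script. Unlike PHP's integer conversion, this sketch rejects input that does not start with digits instead of mapping it to 0.

```typescript
// Hypothetical pad width; check the PHP script for the value actually used.
const ID_WIDTH = 5;

// Parses the input as an integer and left-pads the result with zeros.
// Anything after the leading digits is discarded by the integer parse.
// Returns null when the input does not start with a non-negative integer.
function sanitizeId(raw: string, width: number = ID_WIDTH): string | null {
  const parsed = Number.parseInt(raw, 10);
  if (Number.isNaN(parsed) || parsed < 0) return null;
  return String(parsed).padStart(width, "0");
}

console.log(sanitizeId("42"));        // "00042"
console.log(sanitizeId("7<script>")); // "00007"
console.log(sanitizeId("abc"));       // null
```

If your IDs are alphanumeric, an integer parse like this would destroy them, which is why the note above asks you to adapt the corresponding lines.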
Each block's perimeter is enlarged by a buffer of 50 meters. Afterwards the system checks where those buffers overlap. Each overlap is considered a neighbor. This also generates neighbors where only corners touch. We have experimented with different buffers and additional filters on intersection size, but could not reach a perfect solution. Results also differ considerably from dataset to dataset, as local specifics need to be taken into account, so it is hard to generalize. The network editor in the app is very easy and quick to use; the automatic output is therefore a starting point rather than a finished result.
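The buffer-and-overlap idea can be sketched in a few lines. Note the simplifications, all of which are assumptions for illustration: blocks are reduced to axis-aligned bounding boxes in a projected coordinate system with meters as units, whereas the actual tool buffers the real block polygons. The `BBox` type and the `findNeighbors` helper are hypothetical names.

```typescript
interface BBox {
  id: number;
  minX: number;
  minY: number;
  maxX: number;
  maxY: number;
}

const BUFFER_METERS = 50;

// Grows a box by the buffer distance on every side.
function expandBox(box: BBox, by: number): BBox {
  return {
    id: box.id,
    minX: box.minX - by,
    minY: box.minY - by,
    maxX: box.maxX + by,
    maxY: box.maxY + by,
  };
}

function boxesOverlap(a: BBox, b: BBox): boolean {
  return a.minX <= b.maxX && b.minX <= a.maxX && a.minY <= b.maxY && b.minY <= a.maxY;
}

// Every pair of overlapping buffered boxes is recorded as mutual neighbors.
function findNeighbors(blocks: BBox[], buffer: number = BUFFER_METERS): Map<number, number[]> {
  const buffered = blocks.map((block) => expandBox(block, buffer));
  const neighbors = new Map<number, number[]>();
  for (const block of blocks) neighbors.set(block.id, []);
  for (let i = 0; i < buffered.length; i++) {
    for (let j = i + 1; j < buffered.length; j++) {
      if (boxesOverlap(buffered[i], buffered[j])) {
        neighbors.get(buffered[i].id)!.push(buffered[j].id);
        neighbors.get(buffered[j].id)!.push(buffered[i].id);
      }
    }
  }
  return neighbors;
}

const demoBlocks: BBox[] = [
  { id: 0, minX: 0, minY: 0, maxX: 100, maxY: 100 },
  { id: 1, minX: 120, minY: 0, maxX: 220, maxY: 100 },       // 20 m east of block 0
  { id: 2, minX: 120, minY: 120, maxX: 220, maxY: 220 },     // diagonal to block 0
  { id: 3, minX: 1000, minY: 1000, maxX: 1100, maxY: 1100 }, // far away
];

const demoNeighbors = findNeighbors(demoBlocks);
console.log(demoNeighbors);
```

In this example block 2 only sits diagonally across from block 0, yet the two are still reported as neighbors. This is the corner case described above and one reason to review the result in the network editor.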
We experimented with a lot of different solutions: we tried neural networks, decision trees and a few other machine learning approaches. The problem at hand is not really suited for most machine learning approaches. The organisation of blocks into districts is similar to a multi-knapsack problem or a multi-travelling-salesman problem (fleet/delivery management). It therefore falls into the domain of combinatorial optimization. We used Google's OR-Tools and built an integer solver to find the best solution, but this was not really efficient either.
At first sight the problem looks quite simple: we have n districts, m blocks and a couple of rules we need to observe while organizing things. But looking closer, the problem is that there are a huge number of possible combinations. A further problem is that it is not a linear continuous optimization. Given state A, you move some blocks and thereby create state B, which is better than state A. This does not guarantee that from the improved state B you can reach the best solution. In some cases our final approach even creates worse states before reaching an almost perfect state. At the same time we were hoping to achieve something that is easy to use, efficient and transparent.
So the final approach is not particularly smart, but it is very transparent and easy to use, and when run many times (which takes a while, but is still faster than the approaches above) it delivers very good results. The approach runs in iterations. In each iteration, for every voting district whose population is above the limit, blocks that could be moved to another district are identified, trying to reduce the risk of overpopulating another district while also introducing randomness. Only one block per voting district is moved per iteration. In the next iteration the process starts again. This is repeated until all voting districts meet the population limit or a predefined maximum number of iterations is reached.
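The iteration described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code from src/libs: the `SimBlock` type and the `rebalance` helper are hypothetical, target districts that would exceed the limit are excluded outright (the description only speaks of reducing that risk), and the sketch does not check whether removing a block leaves a hole in its old district.

```typescript
interface SimBlock {
  id: number;
  population: number;
  district: number;
  neighborBlocks: number[]; // ids of adjacent blocks
}

function districtTotals(blocks: SimBlock[]): Map<number, number> {
  const totals = new Map<number, number>();
  for (const b of blocks) {
    totals.set(b.district, (totals.get(b.district) ?? 0) + b.population);
  }
  return totals;
}

// Mutates the blocks in place; returns true if all districts meet the limit.
function rebalance(
  blocks: SimBlock[],
  limit: number,
  maxIterations: number,
  random: () => number = Math.random
): boolean {
  const byId = new Map<number, SimBlock>();
  for (const b of blocks) byId.set(b.id, b);

  for (let iteration = 0; iteration < maxIterations; iteration++) {
    const totals = districtTotals(blocks);
    const overfull = Array.from(totals.keys()).filter(
      (d) => (totals.get(d) ?? 0) > limit
    );
    if (overfull.length === 0) return true;

    for (const district of overfull) {
      // Collect moves of a border block into an adjacent district with room.
      const moves: { block: SimBlock; target: number }[] = [];
      for (const block of blocks) {
        if (block.district !== district) continue;
        for (const neighborId of block.neighborBlocks) {
          const target = byId.get(neighborId)?.district;
          if (target === undefined || target === district) continue;
          if ((totals.get(target) ?? 0) + block.population <= limit) {
            moves.push({ block, target });
          }
        }
      }
      if (moves.length === 0) continue;
      // Randomness: pick one candidate; only one block moves per district.
      const move = moves[Math.floor(random() * moves.length)];
      totals.set(district, (totals.get(district) ?? 0) - move.block.population);
      totals.set(move.target, (totals.get(move.target) ?? 0) + move.block.population);
      move.block.district = move.target;
    }
  }
  return Array.from(districtTotals(blocks).values()).every((t) => t <= limit);
}

// District 1 holds 110 inhabitants, district 2 holds 40; the limit is 100.
const simBlocks: SimBlock[] = [
  { id: 0, population: 60, district: 1, neighborBlocks: [1] },
  { id: 1, population: 50, district: 1, neighborBlocks: [0, 2] },
  { id: 2, population: 40, district: 2, neighborBlocks: [1] },
];
const balanced = rebalance(simBlocks, 100, 10, () => 0);
console.log(balanced, simBlocks.map((b) => b.district));
```

Passing a fixed `random` function, as in the example, makes a run reproducible. Running the same procedure many times with real randomness is what produces the variations the simulation script chooses from.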
Most of the rankings do not need explanation; the only special ranking is the compactness index. A good voting district should be spatially compact. This does not only look pretty, it also makes it easier, for example, to place polling locations, and it can help prevent gerrymandering, where districts become highly distorted in order to serve political agendas. The compactness index is still tricky, because we are also trying to account for built and natural features of the city, like city boundaries, big roads or rivers. Those features can lead to districts that are far from ideal. So spatial compactness should only be an additional criterion to take into account.
The compactness index is based on a circle. The circle has the perfect ratio of area to perimeter. The index is therefore a district's deviation from this perfect ratio.
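One common way to express such a circle-based measure is the Polsby-Popper score, 4πA / P², which is 1 for a circle and approaches 0 for highly distorted shapes. Whether the app uses exactly this formula is an assumption; the sketch below only illustrates the principle, and both function names are hypothetical.

```typescript
// Polsby-Popper score: 4 * pi * area / perimeter^2.
// Area and perimeter must use matching units (e.g. square meters and meters).
function polsbyPopper(area: number, perimeter: number): number {
  if (perimeter <= 0) throw new RangeError("perimeter must be positive");
  return (4 * Math.PI * area) / (perimeter * perimeter);
}

// Deviation from the circle's ratio: 0 means perfectly compact.
function compactnessDeviation(area: number, perimeter: number): number {
  return 1 - polsbyPopper(area, perimeter);
}

// A circle with radius 10, a 10 x 10 square and a 100 x 1 strip.
const circleScore = polsbyPopper(Math.PI * 10 * 10, 2 * Math.PI * 10);
const squareScore = polsbyPopper(100, 40);
const stripScore = polsbyPopper(100, 2 * (100 + 1));

console.log(circleScore, squareScore, stripScore);
```

The square and the strip cover the same area, but the strip scores far lower, which matches the intuition that long, thin districts are less compact.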
- Include the separation of voting regions (easiest solution would be to simply not allow connections in the network between two blocks of different regions).
- Export into formats other than GeoJSON (csv, roads, shapefile)
- The map view is reinitialised on every tab; maybe this could be improved
- Currently several versions of the GeoJSON are stored in memory, while we actually only need to store the properties of the changing features. This would require always re-merging the changes into the main GeoJSON. We did not have time for this, but it should reduce the memory usage of the app.
- Allow users in the simulation screen to select any iteration for further modification. This would increase the complexity of the workflow and interface. No good solution so far.
- Would be nice if one could add/remove voting districts
- Currently the system tries to move blocks so that no voting district has too many inhabitants. In addition, it would be nice to reach the most even distribution across all districts (so also no small districts).
- Include natural/built barriers in the network generation. It is nice if districts do not cross big roads or rivers, so it would be good to include this as a penalty in the network generation. The data could be acquired from OpenStreetMap, followed by running intersections between the network edges and the barriers.
Thanks go to these wonderful people:
- Sebastian Meier
- Lisa-Stubert
- Tori Boeck
- Lucas Vogel
This project follows the all-contributors specification. Contributions of any kind welcome!