first time process 2024 data
mtmail committed Sep 25, 2024
1 parent 84fcde5 commit 564d829
Showing 4 changed files with 31 additions and 8 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -1 +1,2 @@
-*/__pycache__
+*/__pycache__
+venv-tiger
29 changes: 23 additions & 6 deletions README.md
@@ -9,28 +9,45 @@ tables are separate from OpenStreetMap tables and get queried at search time sep
 The dataset gets updated once per year. Downloading is prone to be slow (can take a full day) and converting
 them can take hours as well. There's a mirror on https://downloads.opencagedata.com/public/

-Replace '2021' with the current year throughout.
+Replace '2024' with the current year throughout.

 1. Install the GDAL library and python bindings and the unzip tool

 ```bash
 # Ubuntu:
-sudo apt-get install python3-gdal python3-pip unzip
-pip3 install -r requirements.txt
+sudo apt-get install python3
+
+python3 -m venv venv-tiger
+. venv-tiger/bin/activate
+pip install -r requirements.txt
 ```

-2. Get the TIGER 2023 data. You will need the EDGES files
+2. Get the TIGER 2024 data. You will need the EDGES files
 (3,235 zip files, 11GB total).

-wget -r ftp://ftp2.census.gov/geo/tiger/TIGER2023/EDGES/
+wget -r ftp://ftp2.census.gov/geo/tiger/TIGER2024/EDGES

+Alternatively
+
+```bash
+curl 'https://www2.census.gov/geo/tiger/TIGER2024/EDGES/' | grep -o 'tl_[^"]*.zip' | sort -u > filelist.txt
+# 3235 filelist.txt
+cat filelist.txt | sed -e 's!^!https://www2.census.gov/geo/tiger/TIGER2024/EDGES/!' | xargs -n 1 wget
+```

 3. Convert the data into CSV files. Adjust the file paths in the scripts as needed

 ```bash
 ./convert.sh <input-path> <output-path>
+cd <output-path>
+./patch.sh
 ```

 4. Maybe: package the created files

-tar -czf tiger2023-nominatim-preprocessed.csv.tar.gz tiger
+```bash
+tar -czf tiger2024-nominatim-preprocessed.csv.tar.gz *.csv
+```


 US Postcodes
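The "Alternatively" pipeline in the README lists the EDGES directory, extracts the `tl_*.zip` names, removes duplicates (presumably because an index page shows each name both in the `href` and in the link text) and prefixes the base URL. A minimal sketch of the same steps as a reusable bash function follows; `list_edge_urls` and `BASE_URL` are hypothetical names, not part of this repository. It escapes the dot in `\.zip`, since the unescaped `.` in the README's pattern matches any character.

```bash
#!/usr/bin/env bash
# Sketch: turn an HTML directory listing of the TIGER EDGES folder into
# a sorted, de-duplicated list of download URLs (one per line).
# BASE_URL and list_edge_urls are hypothetical names, not part of the repo.
set -euo pipefail

BASE_URL="${BASE_URL:-https://www2.census.gov/geo/tiger/TIGER2024/EDGES/}"

list_edge_urls() {
  # Read HTML on stdin, print every tl_*.zip name once, prefixed with BASE_URL.
  grep -o 'tl_[^"]*\.zip' | sort -u | sed -e "s!^!${BASE_URL}!"
}

# Usage (needs network access; wget -c resumes partially downloaded files,
# xargs -P 4 runs four downloads in parallel):
#   curl -s "$BASE_URL" | list_edge_urls > urls.txt
#   wc -l < urls.txt            # the README expects 3235 files for 2024
#   xargs -n 1 -P 4 wget -c -q < urls.txt
```

Resuming with `-c` and running a few downloads in parallel is one way to cope with the slow, day-long download the README warns about; the file count check mirrors the README's `3235` note.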
5 changes: 5 additions & 0 deletions patch.sh
@@ -0,0 +1,5 @@
+#!/bin/bash
+
+# The line for house numbers 1..45 on Woodfall Rd is way off
+grep -v '45;1;all;Woodfall Rd;Middlesex;MA;02478' 25017.csv > 25017.csv.tmp
+mv 25017.csv.tmp 25017.csv
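`patch.sh` drops one known-bad row (house numbers 1..45 on Woodfall Rd, Middlesex, MA) from `25017.csv` by filtering into a temporary file and moving it back. If later years need more rows removed, the same pattern could be wrapped in a helper. A minimal sketch, assuming bash; `drop_rows` is a hypothetical name. It uses `grep -F` so the pattern is matched literally, and treats grep's exit status 1 (nothing left to print) as success while still failing on status 2 (for example a missing file).

```bash
#!/usr/bin/env bash
# Sketch of a reusable form of patch.sh (drop_rows is a hypothetical helper).
set -euo pipefail

# drop_rows FILE PATTERN: delete, in place, every line of FILE that
# contains PATTERN as a literal (fixed) string.
drop_rows() {
  local file="$1" pattern="$2"
  local tmp="${file}.tmp"
  local status=0
  # grep -v prints the lines that do NOT contain the pattern.
  grep -vF -- "$pattern" "$file" > "$tmp" || status=$?
  # Status 1 only means nothing was printed (every line matched);
  # 2 or higher is a real error such as a missing file.
  if [ "$status" -gt 1 ]; then
    rm -f -- "$tmp"
    return "$status"
  fi
  mv -- "$tmp" "$file"
}

# Usage, equivalent to the two commands in patch.sh (run inside the
# output directory that holds the converted CSV files):
#   drop_rows 25017.csv '45;1;all;Woodfall Rd;Middlesex;MA;02478'
```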
2 changes: 1 addition & 1 deletion requirements.txt
@@ -1 +1 @@
-ogr==0.27.0
+gdal==3.8.4
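The requirements change replaces `ogr==0.27.0` with `gdal==3.8.4`, i.e. GDAL's own Python bindings. These are typically built from source against the system GDAL library when pip installs them, and the pinned version is generally expected to match what `gdal-config --version` reports (on Ubuntu, `gdal-config` comes with the `libgdal-dev` package). A minimal pre-flight check, assuming bash; `pinned_gdal_version` and `gdal_pin_matches` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch: check that the gdal pin in requirements.txt matches the GDAL
# library installed on the system (helper names are hypothetical).
set -euo pipefail

# Print the version pinned as "gdal==X.Y.Z" in a requirements file.
pinned_gdal_version() {
  sed -n 's/^gdal==\([0-9][0-9.]*\)[[:space:]]*$/\1/p' "${1:-requirements.txt}"
}

# gdal_pin_matches [REQUIREMENTS_FILE] [INSTALLED_VERSION]
# INSTALLED_VERSION defaults to what gdal-config reports.
gdal_pin_matches() {
  local req="${1:-requirements.txt}"
  local installed="${2:-$(gdal-config --version)}"
  local pinned
  pinned="$(pinned_gdal_version "$req")"
  # Succeed only if a pin exists and equals the installed version.
  [ -n "$pinned" ] && [ "$pinned" = "$installed" ]
}

# Usage, before "pip install -r requirements.txt":
#   gdal_pin_matches || echo "gdal pin differs from system GDAL $(gdal-config --version)" >&2
```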
