diff --git a/.gitignore b/.gitignore
index c1d3e32..f836a4e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1 +1,2 @@
-*/__pycache__
\ No newline at end of file
+*/__pycache__
+venv-tiger
diff --git a/README.md b/README.md
index 5666d8f..1e25009 100644
--- a/README.md
+++ b/README.md
@@ -9,28 +9,45 @@ tables are separate from OpenStreetMap tables and get queried at search time sep
 The dataset gets updated once per year. Downloading is prone to be slow (can take a full day)
 and converting them can take hours as well. There's a mirror on https://downloads.opencagedata.com/public/
 
-Replace '2021' with the current year throughout.
+Replace '2024' with the current year throughout.
 
   1. Install the GDAL library and python bindings and the unzip tool
 
      ```bash
      # Ubuntu:
-     sudo apt-get install python3-gdal python3-pip unzip
-     pip3 install -r requirements.txt
+     sudo apt-get install python3
+
+     python3 -m venv venv-tiger
+     . venv-tiger/bin/activate
+     pip install -r requirements.txt
      ```
 
-  2. Get the TIGER 2023 data. You will need the EDGES files
+  2. Get the TIGER 2024 data. You will need the EDGES files
      (3,235 zip files, 11GB total).
 
-        wget -r ftp://ftp2.census.gov/geo/tiger/TIGER2023/EDGES/
+        wget -r ftp://ftp2.census.gov/geo/tiger/TIGER2024/EDGES
+
+     Alternatively
+
+     ```bash
+     curl 'https://www2.census.gov/geo/tiger/TIGER2024/EDGES/' | grep -o 'tl_[^"]*.zip' | sort -u > filelist.txt
+     # 3235 filelist.txt
+     cat filelist.txt | sed -e 's!^!https://www2.census.gov/geo/tiger/TIGER2024/EDGES/!' | xargs -n 1 wget
+     ```
 
   3. Convert the data into CSV files. Adjust the file paths in the scripts as needed
 
+     ```bash
      ./convert.sh
+     cd output-path
+     ./patch.sh
+     ```
 
   4. Maybe: package the created files
 
-        tar -czf tiger2023-nominatim-preprocessed.csv.tar.gz tiger
+     ```bash
+     tar -czf tiger2024-nominatim-preprocessed.csv.tar.gz *.csv
+     ```
 
 
 US Postcodes
diff --git a/patch.sh b/patch.sh
new file mode 100755
index 0000000..d232bbb
--- /dev/null
+++ b/patch.sh
@@ -0,0 +1,5 @@
+#!/bin/bash
+
+# The 1..45 line is way off
+grep -v '45;1;all;Woodfall Rd;Middlesex;MA;02478' 25017.csv > 25017.csv.tmp
+mv 25017.csv.tmp 25017.csv
\ No newline at end of file
diff --git a/requirements.txt b/requirements.txt
index 065dd07..d144272 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1 +1 @@
-ogr==0.27.0
\ No newline at end of file
+gdal==3.8.4
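
Note on `patch.sh`: the script removes one known-bad row from `25017.csv` by filtering with `grep -v` into a temporary file and moving it back. For readers who would rather run that step from Python, the following is a minimal sketch of the same idea, not part of this change. The function name `drop_matching_lines` is hypothetical, and it assumes the CSV files are UTF-8 and that a literal substring match is enough, which holds for the row in `patch.sh` because it contains no regex metacharacters.

```python
import os


def drop_matching_lines(csv_path, needle):
    """Rewrite csv_path without the lines that contain needle.

    Hypothetical helper mirroring patch.sh: filter into "<file>.tmp",
    then move the temporary file over the original.
    Returns the number of lines that were dropped.
    """
    tmp_path = csv_path + ".tmp"
    dropped = 0
    # newline="" keeps the original line endings untouched.
    with open(csv_path, encoding="utf-8", newline="") as src, \
            open(tmp_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            if needle in line:
                dropped += 1  # literal substring match, like grep -v
            else:
                dst.write(line)
    # Replace the original only after the filtered copy is complete.
    os.replace(tmp_path, csv_path)
    return dropped
```

Run from the output directory, the equivalent of the script would be `drop_matching_lines("25017.csv", "45;1;all;Woodfall Rd;Middlesex;MA;02478")`. Returning the count makes it easy to notice when a future TIGER release no longer contains the row and the patch has become a no-op.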