
Errors in Country Database #898

Closed
heikoklein opened this issue Aug 21, 2023 · 10 comments · Fixed by #978
Labels: invalid (This doesn't seem right)

Comments

@heikoklein
Member

The following countries are problematic in the pyaerocom country database:

- MK: Macedonia -> North Macedonia
- TR: Turkey -> Türkiye
- SJ: Svalbard and Jan Mayen should be part of NO
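One way to carry these corrections is a small override layer on top of whatever code-to-name table the geocoder provides. The sketch below assumes such a base table exists; `BASE_NAMES`, `NAME_OVERRIDES`, `CODE_REMAP` and `country_for_code` are hypothetical names for illustration, not pyaerocom's actual data layout.

```python
# Hypothetical base table (ISO 3166-1 alpha-2 -> name), standing in for the upstream list.
BASE_NAMES = {
    "MK": "Macedonia",
    "TR": "Turkey",
    "SJ": "Svalbard and Jan Mayen",
    "NO": "Norway",
}

# Local name corrections requested in this issue.
NAME_OVERRIDES = {
    "MK": "North Macedonia",
    "TR": "Türkiye",
}

# Codes folded into another country (SJ treated as part of NO).
CODE_REMAP = {
    "SJ": "NO",
}


def country_for_code(iso2: str) -> tuple[str, str]:
    """Return (code, name) after applying the code remap and name overrides."""
    code = iso2.upper()
    code = CODE_REMAP.get(code, code)
    name = NAME_OVERRIDES.get(code, BASE_NAMES.get(code))
    if name is None:
        raise KeyError(f"unknown country code: {iso2!r}")
    return code, name


print(country_for_code("sj"))  # ('NO', 'Norway')
print(country_for_code("TR"))  # ('TR', 'Türkiye')
```

Keeping the overrides separate from the base table means the upstream list can be refreshed without losing local naming decisions.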

@heikoklein
Member Author

The current names seem to be taken from https://github.com/richardpenman/reverse_geocode/tree/master. It should be easier to update country names/codes in pyaerocom. Country borders should, e.g., be defined by one of the Natural Earth shapefiles (including switchable borders depending on legislation).

@heikoklein
Member Author

https://github.com/metno/pyaerocom/blob/main-dev/pyaerocom/data/country_codes.json needs updating, too.

geocode usage: the `check_set_country(self)` method.

@jgriesfeller
Member

jgriesfeller commented Aug 22, 2023

I just had a deeper look at reverse_geocode: it does not have a license, so forking is problematic.
It basically works by taking a list of known locations and using the one closest to the given point to determine that point's country. So the intelligence is in that list. Unfortunately, we need to assume that the list is copyrighted.
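To illustrate the nearest-known-location approach described above, and why its quality depends entirely on the location list, here is a minimal sketch using the haversine distance and a linear scan. `KNOWN_PLACES` holds a few approximate city coordinates chosen for illustration only; it is not reverse_geocode's data, and the real package may use a spatial index (e.g. a k-d tree) rather than a scan.

```python
import math

# Illustrative reference points (approximate coordinates); NOT reverse_geocode's data.
KNOWN_PLACES = [
    ("Oslo", "NO", 59.91, 10.75),
    ("Longyearbyen", "SJ", 78.22, 15.65),
    ("Skopje", "MK", 42.00, 21.43),
    ("Ankara", "TR", 39.93, 32.86),
]

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_country(lat: float, lon: float) -> str:
    """Return the country code of the closest known place (linear scan)."""
    _, code, _, _ = min(KNOWN_PLACES, key=lambda p: haversine_km(lat, lon, p[2], p[3]))
    return code


print(nearest_country(60.0, 11.0))    # NO
print(nearest_country(78.92, 11.93))  # SJ (Ny-Ålesund, nearest to Longyearbyen)
```

A point just across a border from a large city is attributed to that city's country, which is one motivation for polygon-based lookups. Note also that a Svalbard station comes back as SJ, the case this issue wants mapped to NO.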

There is geopy, which acts as an interface to various web services that are slow and may require an account and payment (see https://stackoverflow.com/questions/40634522/determine-country-from-its-geographical-locations-in-python).

Also keep in mind that, e.g., MEP has nearly 2500 stations, so this needs to be fast, cached, or both.

So replacing reverse_geocode is not as easy as it seems.

I also know that shapefiles can be used for this, but I have not found an easy way to do so in Python.

Any suggestions?

@jgriesfeller
Member

jgriesfeller commented Aug 22, 2023

Shapely (https://shapely.readthedocs.io/en/stable/) seems to have the infrastructure to use shapefiles to determine the country of a given point, and it is available in conda.

Some links:
https://gis.stackexchange.com/questions/282681/filter-a-geopandas-dataframe-for-points-within-a-specific-country
https://gis.stackexchange.com/questions/254869/projecting-google-maps-coordinate-to-lookup-country-in-shapefile

The second seems to give the simplest answer at first glance.
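Both answers ultimately rely on a point-in-polygon test. As a dependency-free illustration of what shapely's `contains`/`within` predicates compute for a simple polygon, here is the classic even-odd ray-casting test. It ignores holes, multipolygons and points exactly on an edge, which a library such as shapely handles properly; the square and L-shaped rings are made-up test shapes.

```python
def point_in_ring(x: float, y: float, ring: list[tuple[float, float]]) -> bool:
    """Even-odd ray casting: count edge crossings of a ray from (x, y) towards +x."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Made-up shapes in (lon, lat) order, the x/y order used by shapefiles.
square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
l_shape = [(0, 0), (6, 0), (6, 2), (2, 2), (2, 6), (0, 6)]

print(point_in_ring(5, 5, square))   # True
print(point_in_ring(15, 5, square))  # False
print(point_in_ring(4, 4, l_shape))  # False (inside the notch)
print(point_in_ring(1, 5, l_shape))  # True
```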

@heikoklein
Member Author

For the shapefile, I suggest using one of the Natural Earth shapefiles; e.g., for UN country borders I'm using admin_0_countries_deu:
https://scitools.org.uk/cartopy/docs/latest/reference/generated/cartopy.io.shapereader.natural_earth.html

I use shapely or osgeo.ogr for lookups: https://gitlab.met.no/emep/uemep-web/-/blob/master/src/kart/UEmep/GrunnkretsLookup.py (internal link only, but Jan/Alvaro have access) or https://gitlab.met.no/emep/uemep-web/-/blob/master/src/kart/UEmep/GrunnkretsLookupOgr.py

When I tested speed a few years ago, shapely was a bit slow on startup and had internal memory-allocation problems that were critical for a web service, but it might work well for pyaerocom.
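A minimal sketch combining these suggestions: load a Natural Earth countries layer through cartopy's `shapereader.natural_earth` and look points up with shapely prepared geometries, rejecting most countries first with a cheap bounding-box check. `CountryIndex` and `load_natural_earth` are hypothetical helpers, not an existing pyaerocom or pyaro API. The attribute name `ISO_A2_EH` is an assumption to verify against the downloaded file (the plain `ISO_A2` field is `-99` for some countries in some releases). The loader downloads data on first use, so the demo below uses made-up boxes.

```python
from shapely.geometry import Point, box
from shapely.prepared import prep


class CountryIndex:
    """Hypothetical helper: (lon, lat) -> country code via polygon containment."""

    def __init__(self, items):
        # items: iterable of (code, shapely geometry); prepared geometries
        # speed up repeated predicate tests against the same polygon.
        self._entries = [(code, geom.bounds, prep(geom)) for code, geom in items]

    def lookup(self, lon, lat):
        pt = Point(lon, lat)
        for code, (minx, miny, maxx, maxy), pgeom in self._entries:
            # Cheap bounding-box rejection before the exact geometric test.
            if minx <= lon <= maxx and miny <= lat <= maxy and pgeom.intersects(pt):
                return code
        return None


def load_natural_earth(name="admin_0_countries", resolution="10m", code_field="ISO_A2_EH"):
    """Build an index from a Natural Earth layer (assumed field name; downloads on first use)."""
    import cartopy.io.shapereader as shpreader  # lazy import; needs network on first call

    path = shpreader.natural_earth(resolution=resolution, category="cultural", name=name)
    reader = shpreader.Reader(path)
    return CountryIndex((rec.attributes[code_field], rec.geometry) for rec in reader.records())


# Self-contained demo with made-up boxes (lon/lat order).
demo = CountryIndex([("AA", box(0, 0, 10, 10)), ("BB", box(10, 0, 20, 10))])
print(demo.lookup(5, 5))    # AA
print(demo.lookup(15, 5))   # BB
print(demo.lookup(50, 50))  # None
```

Assuming cartopy can fetch the point-of-view layer, the German-view borders would be loaded with `load_natural_earth(name="admin_0_countries_deu")`. The index is built once and then reused for all stations, which keeps shapely's startup cost to a single load.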

lewisblake added the invalid (This doesn't seem right) label Nov 15, 2023
@lewisblake
Member

Is it envisioned that pyaro will handle this? https://github.com/metno/pyaro-readers/tree/main-dev/src/geocoder_reverse_natural_earth

@heikoklein
Member Author

It is pyaro-readers, not the pyaro interface, that comes bundled with this implementation. We intend to keep it there unless pyaerocom insists on putting it into a separate repo.

@lewisblake
Member

This is a very common problem and probably worth moving into a separate repository and using as a separate package. We propose moving it into a package and publishing it on PyPI and conda. It can then become a dependency of pyaro and pyaerocom; others outside the institute will probably use it too, but we need to maintain it for our own use anyway. There should not be any licensing issues.

@heikoklein
Member Author

There are two problems here. One is location -> country lookup, which is handled in #952. The original problem in this issue is the mapping of ISO2 codes to country names, which is technically much simpler, but we might need special names for one region or another. This should not be outsourced to an external package.

@heikoklein
Member Author

depends on: metno/geocoder_reverse_natural_earth#3

3 participants