Debug LS and S2 STAC iteration #50
Codecov Report
@@ Coverage Diff @@
## main #50 +/- ##
==========================================
- Coverage 74.03% 73.96% -0.07%
==========================================
Files 34 34
Lines 697 699 +2
==========================================
+ Hits 516 517 +1
- Misses 181 182 +1
Update: we need to narrow our queries because stac-server has some serious limitations. The error we are experiencing (the limit that index.max_result_window controls) occurs because stac-server is improperly using the Elasticsearch scroll API. This seems like a serious flaw in stac-server's implementation and a significant limitation of using stac-server. Specifically, from the Elasticsearch API docs, there is this note:
This error is indeed missing functionality / a bug in stac-server itself and has already been filed with stac-server here: stac-utils/stac-server#111. To work around this until stac-server can resolve the issue, we can partition the data using a filter on the datetime (e.g. per day) with https://api.stacspec.org/v1.0.0-beta.4/item-search/#operation/getItemSearch
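A rough sketch of the per-day partitioning described above, using only the standard library. The function name `daily_intervals` and the example date range are assumptions for illustration, not code from this PR; each yielded string is a closed interval in the form the STAC item-search `datetime` parameter accepts.

```python
from datetime import date, timedelta


def daily_intervals(start, end):
    """Yield one 'start/end' datetime interval string per day, start..end inclusive."""
    day = start
    while day <= end:
        # Note: items stamped in the final fractional second of a day
        # (after 23:59:59Z) would fall outside this interval.
        yield f"{day.isoformat()}T00:00:00Z/{day.isoformat()}T23:59:59Z"
        day += timedelta(days=1)


# Hypothetical range, chosen only to show the output shape.
intervals = list(daily_intervals(date(2013, 1, 1), date(2013, 1, 3)))
for interval in intervals:
    print(interval)
```

Each interval would then be passed as the `datetime` filter of one item-search request, so every request covers a single day and should stay well under the result window limit, assuming no single day holds more items than that limit.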
I've updated the LS and S2 ingest scripts to use pystac-client - we'll see if it's able to iterate over the whole collection in a reasonable amount of time... if not, will try to parallelize in a follow-up PR as we parallelize the POST requests to RGD
scripts/landsat.py
from watch_helpers import post_stac_items_from_server


host_url = 'https://landsatlook.usgs.gov/stac-server/collections/landsat-c2l1/items'
min_date = datetime(2013, 1, 1)  # Arbitrarily chosen
@matt-bernstein, any thoughts on a minimum ingest date for Landsat?
I'm waiting to see how long it takes to ingest. Assuming the last 8 years of data doesn't take more than an hour or so to ingest, there shouldn't be much of a limitation on our end.
stac-server's response times are proving to be slow. We're seeing ~5 seconds to retrieve a single day's items. If we want to iterate over the last 5 years (arbitrarily chosen but we'll def want more data than that), it would take over 2.5 hours (365 * 5 * 5 / 60 / 60 = 2.535) if done serially just to retrieve the items
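For reference, the estimate above as a small script. The helper name `serial_hours` is an assumption for illustration; the ~5 seconds per day and the 5-year window are the figures from the comment, and leap days are ignored as in the original arithmetic.

```python
def serial_hours(years, seconds_per_day_query=5.0, days_per_year=365):
    """Estimate hours needed to fetch one day's items at a time, serially."""
    total_seconds = years * days_per_year * seconds_per_day_query
    return total_seconds / 60 / 60


hours = serial_hours(5)
print(f"{hours:.3f} hours")  # prints "2.535 hours"
```

The same function gives roughly 4 hours for the 8-year window starting in 2013, which is the case for parallelizing retrieval if per-request latency stays near 5 seconds.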
Follow up to #44