Unify IOC/COOPS API #125

Merged: 15 commits from dev into master on Jun 26, 2024

Conversation

@pmav99 (Member) commented Jan 26, 2024

Addresses #99

@pmav99 pmav99 requested review from brey and tomsail January 26, 2024 17:51
@pmav99 pmav99 marked this pull request as draft January 26, 2024 17:52
@pmav99 (Member, Author) commented Jan 26, 2024

@SorooshMani-NOAA I pushed to the dev branch. Have a look.
I added the code in ioc.py but implemented the tests in a different file so that it is easier to run them.
Feel free to change anything. Don't worry too much about commit messages; we will rebase before we merge.

@brey & @tomsail, if you find the time to check out the dev branch and try it out, I would appreciate any feedback.

codecov bot commented Jan 29, 2024

Codecov Report

Attention: Patch coverage is 91.27182% with 35 lines in your changes missing coverage. Please review.

Project coverage is 89.89%. Comparing base (13ba6be) to head (1495264).
Report is 12 commits behind head on master.

Current head 1495264 differs from pull request most recent head 536ddbc

Please upload reports for the commit 536ddbc to get more accurate results.

Files                    Patch %   Lines
searvey/_coops_api.py    86.75%    13 Missing and 7 partials ⚠️
searvey/_ioc_api.py      93.04%    4 Missing and 4 partials ⚠️
searvey/utils.py         37.50%    5 Missing ⚠️
searvey/coops.py         96.42%    1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #125      +/-   ##
==========================================
- Coverage   90.18%   89.89%   -0.29%     
==========================================
  Files          13       16       +3     
  Lines        1049     1366     +317     
  Branches      155      190      +35     
==========================================
+ Hits          946     1228     +282     
- Misses         56       78      +22     
- Partials       47       60      +13     


[Resolved review thread on searvey/ioc.py]
@SorooshMani-NOAA (Contributor):

The COOPS web API uses redirects, so we need to make sure we always follow redirects for COOPS:
https://www.python-httpx.org/compatibility/
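
For reference, a minimal sketch of opting in to redirects with httpx (the endpoint URL and query parameters below are illustrative, not searvey's actual request code):

import httpx

# httpx does not follow redirects by default (unlike requests),
# so opt in explicitly or redirecting endpoints return a 3xx response.
response = httpx.get(
    "https://api.tidesandcurrents.noaa.gov/api/prod/datagetter",
    params={
        "product": "water_level",
        "station": "8518750",
        "date": "latest",
        "datum": "MSL",
        "units": "metric",
        "time_zone": "gmt",
        "format": "json",
    },
    follow_redirects=True,
)
response.raise_for_status()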

@SorooshMani-NOAA (Contributor):

@pmav99 do you have any suggestions for debugging issues in the multithreaded/multiprocess calls with a breakpoint?

@pmav99 (Member, Author) commented Jan 31, 2024

Off the top of my head, no, not really. Do the functions work when you run them directly?

@SorooshMani-NOAA (Contributor):

I wanted to attach the debugger and step through the code when testing different inputs. For now I've gotten by with print when I see issues, but I wanted to know if there's a better way of doing it.

@pmav99 (Member, Author) commented Feb 1, 2024

The problem with using breakpoint() or pdb.set_trace() is that the breakpoint will get executed as many times as you call the function. I doubt this is what you want.

But, TBH, this question feels a bit strange. Just write and test the functions that retrieve and parse the data without any concurrency. As soon as the retrieve_data and parse_data functions are correct, you can plug in multifutures (or whatever) to add concurrency and make the whole thing faster. Trying to integrate concurrency from the get-go is probably overcomplicating things.

That being said, what can help with debugging multiprocessing/multithreading code is using the logging module, which if configured correctly, can print information about the current thread/process. For example:

import logging
import time

import multifutures as mf

logging.basicConfig(
    level=10,  # 10 == logging.DEBUG
    style="{",
    format="{levelname:8s}; {process:6d}; {threadName:8s}; {asctime:s}; {name:<15s} {lineno:4d}; {message:s}",
    force=True,
)
logger = logging.getLogger(__name__)


def foo(a):
    time.sleep(0.3)
    logger.info("The value of a=%d", a)


if __name__ == "__main__":  # guard is needed for multiprocessing on spawn platforms
    logger.info("Start Multiprocessing")
    results = mf.multiprocess(foo, [{"a": i} for i in range(5)])
    logger.info("Start Multithreading")
    results = mf.multithread(foo, [{"a": i} for i in range(5)])
    logger.info("Done")

[Four resolved review threads on searvey/coops.py]
@pmav99 (Member, Author) left a comment:

Looks good.

If there are going to be so many functions that are shared between COOPS and IOC, we should extract them to a separate module and import them from there.
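
For instance, a hypothetical extraction might look like this (the module name _common.py and the helper are assumptions for illustration, not the actual searvey layout):

# searvey/_common.py -- hypothetical shared module
import pandas as pd

def merge_dataframes(dataframes: list[pd.DataFrame]) -> pd.DataFrame:
    """Concatenate per-request dataframes into a single, time-sorted one."""
    if not dataframes:
        return pd.DataFrame()
    return pd.concat(dataframes).sort_index()

# searvey/_ioc_api.py and searvey/_coops_api.py would then both do:
# from searvey._common import merge_dataframes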

[Seven resolved review threads on searvey/coops.py]
@SorooshMani-NOAA (Contributor):

... we should extract them ...

I agree. I tried to stay as close to the updated IOC code as possible; that's why I basically copy-pasted everything and then started modifying. Some of these can easily be put in utils to be used by all the data-source APIs, but I also don't want to start abstracting too soon.

@pmav99 (Member, Author) commented Feb 5, 2024

WRT the names mapping: I haven't tried this, but I was wondering whether it is actually simpler to do something like this:

  1. Run the code we have in order to generate the final dictionaries.
  2. Hardcode the final dictionaries in the file.
     Any time a product gets added/removed, we add/remove a line from the respective dictionary.

Yes, the code will be longer and repetitive, but we have no code generation, no .update(), no ugly comprehensions, no **, etc. Just declarative code.
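
A minimal sketch of the declarative style (the dictionary name and entries here are illustrative, not the actual generated mapping):

# Hardcoded, declarative mapping: one line per product, no generation logic.
COOPS_PRODUCT_FIELD_NAMES = {
    "water_level": {"v": "value", "s": "sigma", "f": "flags", "q": "quality"},
    "hourly_height": {"v": "value", "s": "sigma", "f": "flags"},
    "predictions": {"v": "value"},
}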

@SorooshMani-NOAA (Contributor):

@pmav99 it's a good idea, I'll do that!

@SorooshMani-NOAA (Contributor):

@pmav99 I'm trying to decide what to do with defaults for datetimes. The issue is that products differ:

  • For some products (e.g. high_low, hourly_height, and monthly_mean) the default range must start more than one month before the current time, otherwise nothing is fetched.
  • For others (e.g. one_minute_water_level) only roughly the last month of data is available, so the default range must not reach further back than that.

To resolve the first issue I can just create a COOPS-specific _resolve_start_date and set the default to 60 days. For example, as of today (February 6th, 2024), for monthly_mean I should specify 2023/12/31 to 2024/02/06 so that I get data. If instead I specify 2024/01/01, it fails!

But specifying 60 days causes problems for one_minute_water_level. The max interval for this product is 4 days, and there's no data older than about one month (at least for the station I'm testing). So if I specify 2024/01/09 to 2024/02/06, it fails because the chunk from the 9th to the 13th doesn't have data.

I'm not sure what's best. Should I just leave it to the user and say the default might not work for some products? Or should we add more dictionaries with info about the default start-date range (relative to now) for each product? I'm not even sure these limits hold for all stations! Should I first consult with COOPS?

If it's the latter, I think it makes sense to go with "the default might not work" for now and amend later.

Actually, another question: should I raise an error if there's no data, or just warn the user? COOPS gives me an error message when there's no data.

searvey/coops.py (outdated review thread):

if "error" in result.result:
    msg = json.loads(result.result)["error"]["message"]
    logger.error("Encountered an error response!")
    raise ValueError(msg)
@SorooshMani-NOAA (Contributor):

@pmav99 should I raise or ignore? What do you think?

@pmav99 (Member, Author) commented Feb 7, 2024

I think ignore is the correct approach. When dealing with historical data it's not that uncommon to have big gaps. E.g. a station might have been offline for a few months (broken instrument or whatever). This does not mean that the final dataframe will be empty; other requests/URLs might return data. Even if the final dataframe is empty, it is not difficult for the user to deal with it.

I guess we could add a raise_if_empty argument to fetch_*() but I don't think it offers that much value.
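
In code, the "ignore" approach might look something like this (a sketch based on the snippet above; the function shape and the empty-dataframe fallback are assumptions, not the merged implementation):

import json
import logging

import pandas as pd

logger = logging.getLogger(__name__)

def parse_response(raw: str) -> pd.DataFrame:
    if "error" in raw:
        msg = json.loads(raw)["error"]["message"]
        # Warn and move on: other chunks/stations may still return data,
        # and an empty dataframe is easy for the caller to handle.
        logger.warning("No data for this request: %s", msg)
        return pd.DataFrame()
    return pd.DataFrame(json.loads(raw)["data"])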

@SorooshMani-NOAA (Contributor):

@pmav99 except for the two questions above, I'm done with the COOPS API. I know that the metadata API needs some adjustment, but I'm assuming we want to keep that separate from the data API update, right?

@pmav99 (Member, Author) commented Feb 7, 2024

WRT the default dates: I have to admit that I haven't really looked into this, but couldn't we have a mapping with the default durations? If the data are not too big, we could even return the maximum interval by default. Am I missing something?
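
Something along these lines, perhaps (the dictionary, the durations, and the helper are hypothetical; _resolve_start_date is only mentioned above as an idea):

import datetime

# Hypothetical per-product default durations; the values are illustrative.
COOPS_DEFAULT_DURATION = {
    "water_level": datetime.timedelta(days=30),
    "one_minute_water_level": datetime.timedelta(days=4),
    "monthly_mean": datetime.timedelta(days=365),
}

def _resolve_start_date(product: str, end_date: datetime.datetime) -> datetime.datetime:
    return end_date - COOPS_DEFAULT_DURATION[product]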

@SorooshMani-NOAA (Contributor):

I agree that we might be able to cover most cases with dictionaries, but I don't want to add too many dicts: one for how far back I should go before data starts (e.g. for mean, hi/lo, etc.), one for how far back I can go before data disappears (old data gets removed, e.g. one-minute heights), and so on. I just don't want to add a dict for each of these limitations, especially the ones that aren't documented anywhere and that I only found by trial and error! For now I think I'll just go with letting the user figure it out; let's address this later. This PR is already a lot of changes and dictionaries!

@SorooshMani-NOAA SorooshMani-NOAA linked an issue Feb 14, 2024 that may be closed by this pull request
@SorooshMani-NOAA (Contributor) commented Jun 5, 2024

@pmav99 I lumped all of the COOPS changes together:

  • adding the new API for COOPS
  • refactoring the API (and the tests you added there)
  • adding COOPS tests for the new & refactored API

Note that in this case the tests will fail after the first and second commits, since all the COOPS-related tests are updated in the third commit. Since the refactoring requires rewriting the tests, I cannot combine the 3rd and 1st commits either; it still fails. Unless you prefer that I split the 3rd commit and add each part to the 1st and 2nd commits?

Please let me know what to do.

@SorooshMani-NOAA (Contributor):

@pmav99 please let me know if I need to change anything else

@SorooshMani-NOAA SorooshMani-NOAA mentioned this pull request Jun 12, 2024
@pmav99 pmav99 force-pushed the dev branch 2 times, most recently from 9357962 to 18b6a92, on June 19, 2024 10:17
@pmav99 (Member, Author) commented Jun 19, 2024

@SorooshMani-NOAA I disabled the progress bar by default on the new API. Let me know what you think about this.

@pmav99 pmav99 marked this pull request as ready for review June 19, 2024 10:43
@SorooshMani-NOAA (Contributor):

@pmav99 looks good to me. Please go ahead and merge if you're happy with everything as well, thank you!

@pmav99 pmav99 merged commit e29e879 into master Jun 26, 2024
8 checks passed
Development

Successfully merging this pull request may close these issues:

  • COOPS Metadata for All Product Types
  • Adapt COOPS to have an IOC-like API