57 add documentation #71
Conversation
… and test mkdocs documentation.
Note that the example notebook is still a direct copy of the operations example notebook.
…ith the ordering notebook to be added.
There are two example notebooks: one with and one without storing and reloading the reordered STM before performing the timing tests. It is to be determined which should be used.
Thanks @vanlankveldthijs for the implementation. Following our discussion, the next steps can be:
Attaching here two example notebooks I made for comparison: notebooks.zip
Hi @vanlankveldthijs, I also reviewed the other changes you implemented. In general, they are good. Just some minor comments:
- I suggested a change on .gitignore;
- The build workflows are failing because the unit tests are failing;
- The ruff workflow failed because of a linting problem. Sarah has already fixed this in Add a function to enrich STM using data from another dataset #66. You do not need to worry about this.
Co-authored-by: Ou Ku <o.ku@esciencecenter.nl>
… to show the effect on large and small chunks.
…mtools into 57_add_documentation
Consolidated demo notebooks into one. Fixed the stm.copy issue (should've been stm.copy()). Changed the demo notebook to only test the subset operation, and to test this on large and small chunks (relative to the dataset) to see the impact.
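(For context: the stm.copy fix above is the classic Python pitfall of referencing a method instead of calling it. A minimal sketch with a generic xarray Dataset, not the actual notebook code:)

```python
import numpy as np
import xarray as xr

stm = xr.Dataset({"amplitude": ("space", np.arange(5.0))})

# Bug: this binds the method object itself, not a copy of the dataset.
clone = stm.copy  # <bound method Dataset.copy of ...>

# Fix: call the method to get an actual (shallow) copy of the dataset.
clone = stm.copy()
```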
Fixed the pytests by checking for the existence of time in chunksizes. Please check whether the implementation could be nicer (better Python).
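A minimal sketch of the kind of guard this describes; the variable names are illustrative and the actual check in the tests may differ:

```python
import numpy as np
import xarray as xr

stm = xr.Dataset({"amplitude": (("space", "time"), np.ones((10, 4)))})

# An unchunked (NumPy-backed) dataset has an empty .chunksizes mapping,
# so check for "time" before reading its chunk sizes.
if "time" in stm.chunksizes:
    time_chunks = stm.chunksizes["time"]
else:
    time_chunks = None  # fall back when data is not chunked along time
```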
Hi @vanlankveldthijs, thanks for the nice work and sorry for taking so long to review. I quite like the new notebooks. I think it is good for merging now. As I mentioned in a previous comment, the ruff workflow should have been fixed by Sarah in #66. Two warnings in the notebook are documented in #74 and #73; they are not relevant for this PR and can be fixed later.
To resolve issue #57
There are a few remaining issues to resolve.
- The pull request has two example notebooks: one with and one without storing and reloading the ordered data before performing the timing tests. It is to be determined which is better. The one that is kept should be called demo_order_stm.ipynb and the other removed.
- It seems like stmat.copy does not have the desired result: when I try to run STM operations on the cloned data, some parts seem to be missing (I think it was the whole .stm part). Solved by loading the whole dataset twice, but that is a bit of an ugly fix.
- It seems that when using small chunk sizes, storing to zarr fails. Solved by storing with chunks of size 1000 and rechunking after loading. This is a bit ugly (see the sketch after this list).
- There is a possible issue with storing the ordered data to zarr when this needs to overwrite existing data. I think shutil.rmtree also sometimes gives issues (maybe when there is no directory).
- Timing tests using the prerequisite small chunk size (500 in this case) become very slow. I don't know whether we can really fix this, though.
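For context, a hedged sketch of the store/reload workaround described above. The path, dimension name, and chunk sizes are illustrative, not taken from the PR; shutil.rmtree(..., ignore_errors=True) and to_zarr(..., mode="w") are standard ways to handle the overwrite problem:

```python
import shutil

import numpy as np
import xarray as xr

path = "ordered_stm.zarr"  # illustrative path, not from the PR
stm = xr.Dataset({"amplitude": ("space", np.ones(10_000))})

# Remove any existing store first; ignore_errors avoids a failure when
# the directory does not exist yet.
shutil.rmtree(path, ignore_errors=True)

# Workaround: write with larger chunks (storing very small chunks to
# zarr failed), then rechunk to the small target size after reloading.
stm.chunk({"space": 1000}).to_zarr(path)  # mode="w" would also overwrite
reloaded = xr.open_zarr(path).chunk({"space": 500})
```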
Currently running the final timing tests on both notebooks. To be committed when done.