Replies: 5 comments 5 replies
-
Thanks for this. Improving performance is always important to us.
-
Thinking about this, I am not sure it will give useful information.
-
I will have to run gprof next to see if I can figure out what is causing this behavior.
-
I need to make sure I understand the file.
Is this correct?
-
I guess I am asking if you rebuild the test file with a different number of groups and then access one group.
-
I am currently investigating the possibility of using NCZarr to efficiently read cloud-stored data. Our NetCDF files are structured as quad trees (think map tiles), where each tile's data and attributes are contained in a NetCDF group.
As it turns out, when requesting data via `ncdump` (or Python via `xarray` and `netcdf4-python`), the time it takes to fetch the data for a single group seems to increase with the number of groups in the whole file. I used `strace` to check how many network requests are made depending on how many groups there are in the file, with these results:
The group layout of the (largest) file is currently like this:
I have also tried the following group layout and that did not seem to make a difference in performance:
My question, then, is whether these results are expected. Is there something I can do with my file structure to avoid all those network requests when trying to access a single group of data?
I should mention that I am using NetCDF v4.9.1-rc2 and AWS SDK v1.10.51.
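For a sense of scale, here is a minimal Python sketch of how quickly the number of groups grows in a quad-tree tile layout. The group layouts themselves are not reproduced in this post, so the function name `quadtree_groups` and both naming schemes below (one nested group per tree level, or one flat group per tile named by its digits) are hypothetical and only meant to illustrate the idea.

```python
from itertools import product


def quadtree_groups(max_depth, nested=True):
    """Return hypothetical group paths for every tile down to max_depth.

    Each tile at depth d is identified by d digits in 0..3.
    nested=True  -> one group per level, e.g. "/0/3/1"
    nested=False -> one root-level group per tile, e.g. "/031"
    Both naming schemes are assumptions, not the layout of the real files.
    """
    paths = []
    for depth in range(1, max_depth + 1):
        for digits in product("0123", repeat=depth):
            separator = "/" if nested else ""
            paths.append("/" + separator.join(digits))
    return paths


if __name__ == "__main__":
    # Total group count is 4 + 4^2 + ... + 4^depth.
    for depth in (1, 2, 3, 4):
        print(depth, len(quadtree_groups(depth)))
```

With four children per tile, a tree of depth 4 already has 340 groups, so any per-group metadata request made when opening the file would add up quickly.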