Suggestion for rearranging the paleo pism xarrays before writing to zarr #1

Open
jkingslake opened this issue Apr 2, 2021 · 6 comments

@jkingslake
Member

Hi @talbrecht,
I thought I would get your opinion on rearranging the xarrays slightly before you write them to zarr.

The code is here: https://gist.github.com/jkingslake/f974d22e1ea72b4a6f583581406f81d3

Essentially, it splits the data variables along the id dimension into 4 new dimensions that correspond to the four parameters in the ensemble.
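
For reference, the pattern is roughly this (a minimal sketch; the real code is in the gist above, and the parameter names other than par_ppq and par_esia are placeholders):

```python
import xarray as xr

# open the ensemble store (path is a placeholder)
ds = xr.open_zarr('present.zarr')

# the four varied parameters, stored as coordinates along the 'id' dimension
# (par_prec and par_visc are made-up names here)
params = ['par_ppq', 'par_esia', 'par_prec', 'par_visc']

# build a MultiIndex on 'id' from the parameter values, then unstack so that
# each parameter becomes its own dimension
ds_unstacked = ds.set_index(id=params).unstack('id')
```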

Let me know what you think.

@jkingslake jkingslake assigned jkingslake and talbrecht and unassigned jkingslake Apr 2, 2021
@jkingslake
Member Author

> So, should I upload the ensemble data again in the new form?

Yes, that would be great, thanks!

The other thing we might have to be careful about is chunk size. I have read that we should aim for chunks of around 100 MB to a few hundred MB to make computations on clusters most efficient. I think they are OK as they are, but maybe a little small (in present.zarr the main data arrays are chunked at ~500 kB).

I tried rechunking with xarray.DataArray.chunk, but this seemed to fail. My next attempt will be with [the rechunker package](https://rechunker.readthedocs.io/en/latest/), which is specifically designed for this. But maybe it also makes sense to see if you can write it to zarr with a better chunk size in the first place. I was trying the following chunk sizes with rechunker: {'time': 1, 'x': 381, 'y': 381, 'par_esia': 4}. Maybe you can try rechunking on your HPC before upload.
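
The rechunker workflow looks roughly like this (untested sketch; the store paths and the variable name 'thk' are placeholders, and the exact form of target_chunks depends on whether you pass an xarray Dataset or a zarr group, so check the rechunker docs):

```python
import gcsfs
import xarray as xr
from rechunker import rechunk

gcs = gcsfs.GCSFileSystem()
ds = xr.open_zarr(gcs.get_mapper('gs://ldeo-glaciology/paleo_ensemble/present.zarr'))

# desired chunks per data variable ('thk' is just an example name;
# the other parameter dimensions may need entries too)
target_chunks = {'thk': {'time': 1, 'par_esia': 4, 'y': 381, 'x': 381}}

# rechunker writes the result to a new store and needs a temporary store
plan = rechunk(
    ds,
    target_chunks=target_chunks,
    max_mem='1GB',
    target_store=gcs.get_mapper('gs://ldeo-glaciology/paleo_ensemble/present_rechunked.zarr'),
    temp_store=gcs.get_mapper('gs://ldeo-glaciology/paleo_ensemble/rechunk_tmp.zarr'),
)
plan.execute()
```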

@talbrecht
Collaborator

talbrecht commented Apr 8, 2021

I rearranged the ensemble data according to the 4 varied parameters and successfully re-chunked the arrays. But somehow the upload of the timeseries failed, see jupyter notebook 69548d1 (or see here).

@jkingslake
Member Author

jkingslake commented Apr 8, 2021

I wonder if it is related to already having zarrs with this name at this location in the bucket.

Does it work if you write to a new zarr directory, e.g. with
`mapper = gcs.get_mapper('gs://ldeo-glaciology/paleo_ensemble/'+mf+'_2.zarr')`?
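
Something like this (sketch, assuming `ds` is the rearranged dataset and the gcsfs filesystem has write access to the bucket):

```python
import gcsfs

gcs = gcsfs.GCSFileSystem()
mapper = gcs.get_mapper('gs://ldeo-glaciology/paleo_ensemble/' + mf + '_2.zarr')

# mode='w' creates/overwrites the store; consolidated=True makes re-opening faster
ds.to_zarr(mapper, mode='w', consolidated=True)
```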

@jkingslake
Member Author

...but I see that the other uploads worked ok and I can now see the data in the new 'unstacked' format, so maybe my suggestion above isn't the issue.

@talbrecht
Collaborator

talbrecht commented Apr 8, 2021

Right, I would guess so too, but I thought this repo space would then be wasted. With mode='w' it should actually overwrite existing folders. And would you say the chunk size is large enough now? I couldn't find out how to chunk in the to_zarr function.
Just give it a try!

@jkingslake
Member Author

Hi Torsten, sorry I lost track of this issue. I think the chunk size is OK now. Perhaps a little small, but that currently isn't a big deal because the whole dataset isn't very big. We might want to revisit the chunk size if we put up a fuller version of the ensemble.
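
On your question about chunking in to_zarr: as far as I know to_zarr doesn't take a chunks argument itself. The usual pattern is either to chunk the dataset before writing (each dask chunk becomes a zarr chunk) or to set per-variable chunks via the encoding argument. A rough sketch ('thk' is just an example variable name):

```python
# Option 1: chunk the dataset first; dask chunks become zarr chunks
ds = ds.chunk({'time': 1, 'x': 381, 'y': 381, 'par_esia': 4})
ds.to_zarr(mapper, mode='w', consolidated=True)

# Option 2: set the zarr chunks explicitly via encoding
# (the tuple must follow the variable's dimension order)
ds.to_zarr(mapper, mode='w',
           encoding={'thk': {'chunks': (1, 4, 381, 381)}})
```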

One small issue that I just realized is that the unstacking procedure we are using causes us to lose the attributes of par_ppq, par_esia, etc.

Saving the attributes of each of them before the unstack stage and then re-writing them once par_ppq, par_esia have become coordinates seems to work: https://gist.github.com/jkingslake/0946ae96f065f8def236867f6895b5c9
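
The core of it is just this (sketch; parameter names beyond par_ppq and par_esia are placeholders):

```python
params = ['par_ppq', 'par_esia', 'par_prec', 'par_visc']

# stash each parameter's attributes before set_index/unstack discards them
saved_attrs = {p: ds[p].attrs for p in params}

ds_unstacked = ds.set_index(id=params).unstack('id')

# restore the attributes now that the parameters are dimension coordinates
for p in params:
    ds_unstacked[p].attrs = saved_attrs[p]
```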
