-
Notifications
You must be signed in to change notification settings - Fork 206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement climate projections #929
base: main
Are you sure you want to change the base?
Implement climate projections #929
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ekatef for the great draft!!! :D
So, I think it is a nice draft and also to improve the review some ittle documentation and docstring of the functions may help.
The code seems in interesting shape.
We could think of improving the automatization of the snakemake.
In particular, we could specify that these projections start with name "projection-" or something like that.
Accordingly, we could add to the wildcard_constraints that the wildcard cutout cannot start with "projection-".
And this rule could have:
input: "cutouts/{cutout}"
output: "cutouts/projection-{cutout}"
That should make sure that if we require the workflow to use the projection-{anycutout} may fully automatize everything, but needs some check.
The wildcard constraint. may be something like "^(?!projection-).*"
I really look forward for the full implementation :D
scripts/build_climate_projections.py
Outdated
snapshots = pd.date_range(freq="h", **snakemake.params.snapshots) | ||
season_in_focus = snapshots.month.unique().to_list() | ||
|
||
cmip6_xr = xr.open_dataset(cmip6) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How to load this dataset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, the dataset is available via Copernicus and licensed as CC-BY 4.0, being IPCC product 🙏🏽
So, it can be downloaded manually or included into a dedicated databundle or loaded via Copernicus API with a request as simple as
c.retrieve(
'projections-climate-atlas',
{
'format': 'zip',
'origin': 'CMIP6',
'experiment': 'ssp2_4_5',
'domain': 'global',
'period': '2015-2100',
'variable': 'monthly_mean_of_daily_mean_temperature',
},
'download.zip')
scripts/build_climate_projections.py
Outdated
for i in range(0, len(cutout_xr.y)): | ||
for j in range(0, len(cutout_xr.x)): | ||
cutout_xr.temperature[k_time, i, j] = np.add( | ||
cutout_xr.temperature[k_time, i, j], | ||
np.array([dt_xr[i, j].item()] * k_time.sum().item()), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first sight, it feels like this code may take long time to compute.
What are you trying to do here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to modify a temperature value at each spatial point according to a projection for an increase in the monthly temperature. You are absolutely right that it may be quite time-consuming. Do you have any ideas on possible approaches to increase performance?
# TODO read from the cutout file instead of hardcoding | ||
dx_new = 0.3 | ||
|
||
newlon = np.arange( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be good to use np.max and alike but unsure performences increas significantly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the hint! Definitely worth investigation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have done some profiling, and np.min( )
, np.max( )
have been 5-10% faster. Agree that it's not breakthrough in performance, but it is consistent improvement which doesn't affect code readability. So, I'd opt for it. Thanks for the suggestion! :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for checking, by looking at the atlite code it raised few ideas:
- is it possible to load the cmip file as an atlite cutout? this doesn't mean to use the cutout for atlite but simply leverage on the utility functions it has
- if not, the cutout.py file gives quite some hints to improve the functionalities if the task is actually time consuming. For example, the minimum value may always be the first of the list while the maximum the last one (to verify though)
Link to the cutout.py file : https://github.com/PyPSA/atlite/blob/a71bca2f6f7221b70dbbc371752aef56d937b1ff/atlite/cutout.py#L314
Thank you so much @davide-f! Great idea with the Absolutely agree on your suggestions regarding the doc-strings, and will play a bit with performance. Thank you so much for your support ;) |
…atef/pypsa-earth into implement_climate_projections
Results for performance testing of Argentina, 1-month cutout1 attemptprofiling results - baseline implementation 7 attemptsprofiling results - baseline implementation 20 attemptsprofiling results - baseline implementation Silk-Road region, 1-year cutout1 attemptprofiling results - baseline implementation 5 attemptsprofiling results - baseline implementation |
Have added a stretch approach, improved docstrings, removed hardcoding in the names of parameters to be projected and revised Snakemake interfaces. From the functional perspective, I think that is good time to stop now and finalise the implementation. In particular:
|
As discussed during the weekly meeting:
|
A possible approach to improve performance may be making calculations on the whole grid instead of modifying the array point-by-point (this part, in particular). Some nice hints on that can be found here. However, that leads to a performance-memory trade-off, for which chunks magic of dask can help, like suggests this approach. |
@davide-f my feeling is that this PR may be ready for the next review round. There is still some technical work to be done: datakit preparation, adding a test and probably performance improvements. But it would be great to check that the general design and interfaces look reasonable. I'd be very grateful for your opinion on that 🙂 |
|
||
cmip6_region = cmip6_xr.sel( | ||
lat=slice( | ||
min(cutout_xr.coords["y"].values) - d_pad, | ||
max(cutout_xr.coords["y"].values) + d_pad, | ||
), | ||
lon=slice( | ||
min(cutout_xr.coords["x"].values) - d_pad, | ||
max(cutout_xr.coords["x"].values) + d_pad, | ||
), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very nice! I'm wondering, instead of doing min/max, may it be that the first element of the list is also the lowest?
so like cutout_xr.coords["y"].values[0] is the lowest and cutout_xr.coords["y"].values[-1] is the max value?
if so we could avoid the max/mins in the whole document.
Maybe we could use some utility functions for that.
I'm wondering, but is this cmip6 compatible with the object cutout in atlite?
As it is an xarray maybe?
is we load it as a cutout, we can use the bounds function of the cutout object and this facilitates a lot the whole style.
This is a guess though.
Copying here the link to the bounds function and the atlite file, that can give some ideas for the coding style :)
https://github.com/PyPSA/atlite/blob/a71bca2f6f7221b70dbbc371752aef56d937b1ff/atlite/cutout.py#L314
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! A really great idea to align with atlite
approaches.
As for the particular bounds calculation, I have investigated this idea, but would say that it doesn't feel like a clear approach to use "magic" index numbers. Agree that it's quite common. But personally, I'm not capable to keep in mind which integer means what, and that is usually quite tricky to deal with the code using such conventions. Not sure it's justifiable but performance gains, as currently performance bottlenecks seem to be in spatial calculations.
What would potentially be interesting to adopt from atlite
is work with data chunks. It may be probably a good idea to investigate it for increasing the performance. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmm maybe it would be good to at least create here an auxiliary function that, having as input an xarray, returns the boundaries like (xmin, xmax, ymin, ymax) or alike.
having that function may help clarifying the readability regardless of their implementation; although, I believe the xmin may be the first value of x and the last one the maximum.
Likewise, an auxiliary function that calculate the dx and dy may be useful although trivial: the advantage may be readability and manutentibility.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @ekatef ,
The PR is in great shape! I added few stylish comments, but the document is in great shape 🗡️
There is no automatic download here but I guess this is expected for the moment right?
# TODO read from the cutout file instead of hardcoding | ||
dx_new = 0.3 | ||
|
||
newlon = np.arange( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for checking, by looking at the atlite code it raised few ideas:
- is it possible to load the cmip file as an atlite cutout? this doesn't mean to use the cutout for atlite but simply leverage on the utility functions it has
- if not, the cutout.py file gives quite some hints to improve the functionalities if the task is actually time consuming. For example, the minimum value may always be the first of the list while the maximum the last one (to verify though)
Link to the cutout.py file : https://github.com/PyPSA/atlite/blob/a71bca2f6f7221b70dbbc371752aef56d937b1ff/atlite/cutout.py#L314
scripts/build_climate_projections.py
Outdated
CMIP6 climate projections loaded as xarray | ||
month: float | ||
A month value to be considered further for the morphing procedure | ||
year0: integer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are year 0 and 1 like the base year and the prediction year?
the name could be more explicit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, revised. Are they more clear now?
scripts/build_climate_projections.py
Outdated
month=k_month, | ||
year0=year0, | ||
year1=year1, | ||
years_window=5, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may this be a parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It must be, I have forgotten to use it there. Fixed now. Thanks :)
Snakefile
Outdated
@@ -363,6 +365,33 @@ if config["enable"].get("build_cutout", False): | |||
"scripts/build_cutout.py" | |||
|
|||
|
|||
if config["enable"].get("modify_cutout", False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could call the option climate_projection_cutout or something like that?
I was considering if adding an option different from build_cutout may be necessary, but I believe so to avoid overwriting existing "original" cutouts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, agree that it would be nice to keep the main workflow as safe as possible from interactions with these additions.
The option name revised. Have I understood you idea correctly? :)
Hello @davide-f! Thanks a lot for the review :D That is fantastic to have your support. Have revised the code, trying to address you comments. Not sure I have get all the points right (sorry for that!), and happy to discuss further in any form which is convenient for you. Also, added a number of checks to Absolutely agree that it would be perfect to apply some approaches from |
Ah, regarding the inputs: they are not addressed yet. My feeling is that it would be great to load to zenodo datasets for the major parameters which may be relevant for energy modeling. If using the simplest shift calculation approach, amount of the data needed is about 3GB for the whole globe. Adding a stretch transformation requires additional 6GB for each parameter (fields of min and max parameter values). The whole wish-list may look like:
Probably, we can start with temperature only, and adjust the approaches along the way. What do you think? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, I think the PR is in great shape :) minor comments.
P.S. there are still a lot of plots here that may be removed in the finalized version.
|
||
cmip6_region = cmip6_xr.sel( | ||
lat=slice( | ||
min(cutout_xr.coords["y"].values) - d_pad, | ||
max(cutout_xr.coords["y"].values) + d_pad, | ||
), | ||
lon=slice( | ||
min(cutout_xr.coords["x"].values) - d_pad, | ||
max(cutout_xr.coords["x"].values) + d_pad, | ||
), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmm maybe it would be good to at least create here an auxiliary function that, having as input an xarray, returns the boundaries like (xmin, xmax, ymin, ymax) or alike.
having that function may help clarifying the readability regardless of their implementation; although, I believe the xmin may be the first value of x and the last one the maximum.
Likewise, an auxiliary function that calculate the dx and dy may be useful although trivial: the advantage may be readability and manutentibility.
What do you think?
Changes proposed in this Pull Request
A preliminary version to add climate projections into cutout.
Current limitations
Checklist
envs/environment.yaml
anddoc/requirements.txt
.config.default.yaml
andconfig.tutorial.yaml
.test/
(note tests are changing the config.tutorial.yaml)doc/configtables/*.csv
and line references are adjusted indoc/configuration.rst
anddoc/tutorial.rst
.doc/release_notes.rst
is amended in the format of previous release notes, including reference to the requested PR.