diff --git a/doc/source/dev/ci.rst b/doc/source/dev/ci.rst
index a7580b5f..a13cd56e 100644
--- a/doc/source/dev/ci.rst
+++ b/doc/source/dev/ci.rst
@@ -1,15 +1,22 @@
-Continuous integration
-======================
+########################
+ Continuous integration
+########################
 
 icclim continuous integration (CI) aims to assist development by:
 
-  - Avoiding introducing bugs in the code base.
-  - Ensuring all new code follow the same code style.
-  - Measuring how much icclim code base is tested by automated unit tests. This is known as code coverage.
-  - Making sure the documentation generation is functioning well.
+   -  Avoiding the introduction of bugs in the code base.
+   -  Ensuring all new code follows the same code style.
+   -  Measuring how much of the icclim code base is tested by automated
+      unit tests. This is known as code coverage.
+   -  Making sure the documentation generation is functioning well.
 
 These goals are reached using multiple tools:
 
-  - pre-commit CI enforce the code style (Black + flake8 + isort) is followed by
-    committing changes directly on new pull request and blocking merge if necessary.
-    The relevant file is `.pre-commit-config.yaml`.
-  - readthedocs, which serve our documentation is also configured to run the documentation generation on each new pull request.
-  - github actions are used to run unit tests and report the results in each pull request.
+   -  pre-commit CI enforces the code style (Black + flake8 + isort) by
+      committing fixes directly on new pull requests and blocking the
+      merge if necessary. The relevant file is
+      `.pre-commit-config.yaml`.
+
+   -  readthedocs, which serves our documentation, is also configured
+      to run the documentation generation on each new pull request.
+
+   -  github actions are used to run unit tests and report the results
+      in each pull request.
diff --git a/doc/source/dev/index.rst b/doc/source/dev/index.rst
index 72197de8..d4337468 100644
--- a/doc/source/dev/index.rst
+++ b/doc/source/dev/index.rst
@@ -1,12 +1,14 @@
-Development
-===========
+#############
+ Development
+#############
 
-Here are some guidelines for those who which to contribute to icclim development.
+Here are some guidelines for those who wish to contribute to icclim
+development.
 
 .. toctree::
-    :maxdepth: 2
-    :caption: Contents:
+   :maxdepth: 2
+   :caption: Contents:
 
-    release_process
-    ci
-    contributing
+   release_process
+   ci
+   contributing
diff --git a/doc/source/dev/release_process.rst b/doc/source/dev/release_process.rst
index 0a51a7d2..83cab481 100644
--- a/doc/source/dev/release_process.rst
+++ b/doc/source/dev/release_process.rst
@@ -1,65 +1,88 @@
-Release process
-===============
-
-Automatic Release
-+++++++++++++++++
-
-As of icclim 6.6.0, a github action (`.github/workflows/publish-to-pypi.yml`) publishes
-icclim to pypi whenever a (github release)[https://github.com/cerfacs-globc/icclim/releases]
-is published.
-This github action requires a manual approuval.
-A dedicated `release` github environment has been created to manage the permission for this
+#################
+ Release process
+#################
+
+*******************
+ Automatic Release
+*******************
+
+As of icclim 6.6.0, a github action
+(`.github/workflows/publish-to-pypi.yml`) publishes icclim to pypi
+whenever a `github release
+<https://github.com/cerfacs-globc/icclim/releases>`_ is published. This
+github action requires a manual approval. A dedicated `release` github
+environment has been created to manage the permission for this
 github action.
 Then an automatic process on conda-forge pick the new release from pypi,
-create a pull request on icclim-feedstock and wait for our review and approval to
-publish the release to conda-forge.
+creates a pull request on icclim-feedstock and waits for our review and
+approval to publish the release to conda-forge.
 
 Hence, the process is as follow:
 
 #. Merge everything on icclim master branch
-#. Create a (github release)[https://github.com/cerfacs-globc/icclim/releases]
+#. Create a `github release
+   <https://github.com/cerfacs-globc/icclim/releases>`_
 #. Wait for the github action to build the package
 #. Approve the github action to release to pypi
 #. Wait for conda-forge to create a PR on icclim-feedstock
 #. Edit and approve PR on icclim-feedstock
 
-Manual release (outdated)
-+++++++++++++++++++++++++
+***************************
+ Manual release (outdated)
+***************************
 
 The Automatic approach
 
 #. Make sure all tests pass.
+
 #. Create and checkout a release branch.
+
 #. Update version number of icclim in ``src/icclim/__init__.py``.
+
 #. Update release notes in ``doc/source/references/release_notes.rst``.
+
 #. Merge release branch to master with a PR.
+
 #. Clean dist directory content.
+
 #. Create wheel file on master and source archive.
 
-   .. code-block:: sh
+   .. code:: sh
 
-       python3 -m build
+      python3 -m build
 
 #. Upload to pypi.
 
-   .. code-block:: sh
-
-       flit publish
-
-#. Update conda-forge feedstock at https://github.com/conda-forge/icclim-feedstock
-
-   The recipe `recipe/meta.yml` must be updated:
-   - Fork the repository in with your own account.
-   - Update icclim version number at the top.
-   - Update `source.sha256` value with the tar.gz sha256.
-     You can get the tar.gz hash from `pypi `_ using `view hashes` link.
-   - Add any new dependency in `requirements`.
-   - Create a pull request with these changes, targeting the main fork on main branch
-   - Wait for the CI feedback and correct things if needed.
-   - Merge the pull request
+   .. code:: sh
+
+      flit publish
+
+#. Update conda-forge feedstock at
+   https://github.com/conda-forge/icclim-feedstock
+
+   The recipe `recipe/meta.yml` must be updated:
+
+   -  Fork the repository with your own account.
+
+   -  Update icclim version number at the top.
+
+   -  Update `source.sha256` value with the tar.gz sha256. You can get
+      the tar.gz hash from `pypi
+      <https://pypi.org/project/icclim/#files>`_ using the `view
+      hashes` link, or compute it locally (see the sketch at the end of
+      this page).
+
+   -  Add any new dependency in `requirements`.
+
+   -  Create a pull request with these changes, targeting the main
+      fork on the main branch.
+
+   -  Wait for the CI feedback and correct things if needed.
+
+   -  Merge the pull request.
 
 #. Update `icclim github release <https://github.com/cerfacs-globc/icclim/releases>`_
 
-   - You should add a tag similar to the new version number.
-   - You should enter a short description of the changes, with a highlight on breaking changes.
-   - There is no need to fill the assets with anything as the release assets are already on conda-forge and pypi.
+   -  You should add a tag similar to the new version number.
+   -  You should enter a short description of the changes, with a
+      highlight on breaking changes.
+   -  There is no need to fill the assets with anything as the
+      release assets are already on conda-forge and pypi.
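+
+As a side note, the `source.sha256` value mentioned in the conda-forge
+step can also be computed locally from the downloaded archive, without
+relying on the pypi web page. A minimal sketch using only the standard
+library (the archive path is illustrative):
+
+.. code:: python
+
+   import hashlib
+   from pathlib import Path
+
+   # Hash the sdist archive, e.g. the one produced by ``python3 -m build``
+   # or downloaded from pypi.
+   archive = Path("dist/icclim-6.6.0.tar.gz")
+   print(hashlib.sha256(archive.read_bytes()).hexdigest())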
diff --git a/doc/source/explanation/4.2.x_installation.rst b/doc/source/explanation/4.2.x_installation.rst
index 4a42a703..87375e3e 100644
--- a/doc/source/explanation/4.2.x_installation.rst
+++ b/doc/source/explanation/4.2.x_installation.rst
@@ -1,12 +1,15 @@
-Install icclim 4.2 and under (outdated version)
-===============================================
+#################################################
+ Install icclim 4.2 and under (outdated version)
+#################################################
 
-For a version before 5.0.0, to run icclim you first need to compile our C library.
+For a version before 5.0.0, to run icclim you first need to compile our
+C library.
 
-The last version where this was needed is `4.2.20 `_
+The last version where this was needed is `4.2.20
+<https://github.com/cerfacs-globc/icclim/releases/tag/4.2.20>`_.
 
 #. Compile the C code:
 
-   - `gcc -fPIC -g -c -Wall ./icclim/libC.c -o ./icclim/libC.o`
-   - `gcc -shared -o ./icclim/libC.so ./icclim/libC.o`
+   -  `gcc -fPIC -g -c -Wall ./icclim/libC.c -o ./icclim/libC.o`
+   -  `gcc -shared -o ./icclim/libC.so ./icclim/libC.o`
 
 #. Install icclim: `python setup.py install --user`
diff --git a/doc/source/explanation/climate_indices.rst b/doc/source/explanation/climate_indices.rst
index b77fbf63..0b0efb57 100644
--- a/doc/source/explanation/climate_indices.rst
+++ b/doc/source/explanation/climate_indices.rst
@@ -1,148 +1,164 @@
-What is a climate index
-=======================
+#########################
+ What is a climate index
+#########################
 
-A climate index is a calculated value that can be used to describe the state and the changes in the climate system.
-The climate at a defined place is the average state of the atmosphere over a longer period of, for example, months or years. Changes on climate are much slower than on the weather, that can change strongly day by day.
-
-Climate indices allow a statistical study of variations of the dependent climatological aspects, such as analysis and comparison of time series, means, extremes and trends.
+A climate index is a calculated value that can be used to describe the
+state and the changes in the climate system. The climate at a defined
+place is the average state of the atmosphere over a longer period of,
+for example, months or years. Changes in climate are much slower than
+changes in the weather, which can vary strongly from day to day.
+
+Climate indices allow a statistical study of variations of the
+dependent climatological aspects, such as analysis and comparison of
+time series, means, extremes and trends.
 
 .. note::
 
-   A good introduction for climate indices is on the website of the
-   `Integrated Climate Data Center (ICDC) `_
-   of the University of Hamburg.
+   A good introduction to climate indices is on the website of the
+   `Integrated Climate Data Center (ICDC)
+   <https://icdc.cen.uni-hamburg.de/>`_ of the University of Hamburg.
 
-.. seealso::
-   - `Climate Variability and Predictability (CLIVAR) `_
-   - `Expert Team on Climate Change Detection and Indices (ETCCDI) `_
-   - `European Climate Assessment & Dataset (ECA&D) `_
-   - `ATBD ECA&D indices `_
-   - `Article about percentile-based indices `_
-   - `Sample quantiles in statistical packages `_
+.. seealso::
+
+   -  `Climate Variability and Predictability (CLIVAR)
+      <https://www.clivar.org/>`_
+   -  `Expert Team on Climate Change Detection and Indices (ETCCDI)
+      <http://etccdi.pacificclimate.org/>`_
+   -  `European Climate Assessment & Dataset (ECA&D)
+      <https://www.ecad.eu/>`_
+   -  `ATBD ECA&D indices
+      <https://knmi-ecad-assets-prd.s3.amazonaws.com/documents/atbd.pdf>`_
+   -  `Article about percentile-based indices `_
+   -  `Sample quantiles in statistical packages `_
 
-icclim capabilities
-===================
+#####################
+ icclim capabilities
+#####################
 
-Currently, the climate indices as defined by
-`European Climate Assessment & Dataset `_ based on
-air temperature and precipitation variables can be computed with icclim:
+Currently, the climate indices as defined by the `European Climate
+Assessment & Dataset <https://www.ecad.eu/>`_ based on air temperature
+and precipitation variables can be computed with icclim:
 
-   - 11 cold indices (GD4, CFD, FD, HD17, ID, CSDI, TG10p, TN10p, TX10p, TXn, TNn)
-   - 1 drought indice (CDD)
-   - 9 heat indices (SU, TR, WSDI, TG90p, TN90p, TX90p, TXx, TNx, CSU)
-   - 14 rain indices (PRCPTOT, RR1, SDII, CWD, R10mm, R20mm, RX1day, RX5day, R75p, R75pTOT, R95p, R95pTOT, R99p, R99pTOT)
-   - 4 snow indices (SD, SD1, SD5cm, SD50cm)
-   - 6 temperature indices (TG, TN, TX, DTR, ETR, vDTR)
-   - 4 compound indices (CD, CW, WD, WW)
+   -  11 cold indices (GD4, CFD, FD, HD17, ID, CSDI, TG10p, TN10p,
+      TX10p, TXn, TNn)
+   -  1 drought index (CDD)
+   -  9 heat indices (SU, TR, WSDI, TG90p, TN90p, TX90p, TXx, TNx, CSU)
+   -  14 rain indices (PRCPTOT, RR1, SDII, CWD, R10mm, R20mm, RX1day,
+      RX5day, R75p, R75pTOT, R95p, R95pTOT, R99p, R99pTOT)
+   -  4 snow indices (SD, SD1, SD5cm, SD50cm)
+   -  6 temperature indices (TG, TN, TX, DTR, ETR, vDTR)
+   -  4 compound indices (CD, CW, WD, WW)
 
-Detailed description of each indice is available at https://knmi-ecad-assets-prd.s3.amazonaws.com/documents/atbd.pdf.
-See table below for a short description of each indices.
-Most descriptions are extracted from clix-meta.
-Initially icclim was designed for online computing of climate indices through the `climate4impact portal `_.
-In spite of existence of other packages able to compute climate indices (`CDO `_, `R package `_ ),
-it was decided to develop a new software in Python.
-Python language was first of all chosen to interface with `PyWPS `_: Python implementation of Web Processing Service
-(WPS) Standard.
-Another reason was to interface eventually with the `OpenClimateGIS `_.
+A detailed description of each index is available at
+https://knmi-ecad-assets-prd.s3.amazonaws.com/documents/atbd.pdf. See
+the table below for a short description of each index. Most
+descriptions are extracted from clix-meta. Initially icclim was
+designed for online computing of climate indices through the
+`climate4impact portal <https://climate4impact.eu>`_. Despite the
+existence of other packages able to compute climate indices (`CDO
+<https://code.mpimet.mpg.de/projects/cdo>`_, `R package `_), it was
+decided to develop new software in Python. The Python language was
+chosen first of all to interface with `PyWPS <https://pywps.org/>`_,
+the Python implementation of the Web Processing Service (WPS) standard.
+Another reason was to eventually interface with `OpenClimateGIS `_.
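+
+All of these indices are computed through the same entry point,
+``icclim.index``. As a minimal sketch (the file name is illustrative,
+assuming a daily maximum temperature dataset):
+
+.. code:: python
+
+   import icclim
+
+   # Number of summer days (Tmax > 25C), aggregated per year.
+   icclim.index(
+       index_name="SU",
+       in_files="tasmax_day.nc",
+       out_file="su.nc",
+   )
+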
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | short name      | Description                                                                                                  |
 +=================+==============================================================================================================+
 | TG              | Mean of daily mean temperature                                                                               |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | TN              | Mean of daily minimum temperature                                                                            |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | TX              | Mean of daily maximum temperature                                                                            |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | DTR             | Mean Diurnal Temperature Range                                                                               |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | ETR             | Intra-period extreme temperature range                                                                       |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | vDTR            | Mean day-to-day variation in Diurnal Temperature Range                                                       |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | SU              | Number of Summer Days (Tmax > 25C)                                                                           |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | TR              | Number of Tropical Nights (Tmin > 20C)                                                                       |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | WSDI            | Warm-spell duration index                                                                                    |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | TG90p           | Percentage of days when Tmean > 90th percentile                                                              |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | TN90p           | Percentage of days when Tmin > 90th percentile                                                               |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | TX90p           | Percentage of days when Tmax > 90th percentile                                                               |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | TXx             | Maximum daily maximum temperature                                                                            |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | TNx             | Maximum daily minimum temperature                                                                            |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | CSU             | Maximum number of consecutive summer days (Tmax >25 C)                                                       |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | GD4             | Growing degree days (sum of Tmean > 4 C)                                                                     |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | FD              | Number of Frost Days (Tmin < 0C)                                                                             |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | CFD             | Maximum number of consecutive frost days (Tmin < 0 C)                                                        |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | HD17            | Heating degree days (sum of Tmean < 17 C)                                                                    |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | ID              | Number of sharp Ice Days (Tmax < 0C)                                                                         |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | TG10p           | Percentage of days when Tmean < 10th percentile                                                              |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | TN10p           | Percentage of days when Tmin < 10th percentile                                                               |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | TX10p           | Percentage of days when Tmax < 10th percentile                                                               |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | TXn             | Minimum daily maximum temperature                                                                            |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | TNn             | Minimum daily minimum temperature                                                                            |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | CSDI            | Cold-spell duration index                                                                                    |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | CDD             | Maximum consecutive dry days (Precip < 1mm)                                                                  |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | PRCPTOT         | Total precipitation during Wet Days                                                                          |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | RR1             | Number of Wet Days (precip >= 1 mm)                                                                          |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | SDII            | Average precipitation during Wet Days (SDII)                                                                 |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | CWD             | Maximum consecutive wet days (Precip >= 1mm)                                                                 |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | R10mm           | Number of heavy precipitation days (Precip >= 10mm)                                                          |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | R20mm           | Number of very heavy precipitation days (Precip >= 20mm)                                                     |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | RX1day          | Maximum 1-day precipitation                                                                                  |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | RX5day          | Maximum 5-day precipitation                                                                                  |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | R75p            | (in discussion) Days with RR > 75th percentile of daily amounts (wet days)                                   |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | R75pTOT         | (in discussion) Precipitation fraction due to very wet days (> 75th percentile)                              |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | R95p            | (in discussion) Days with RR > 95th percentile of daily amounts (very wet days)                              |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | R95pTOT         | (in discussion) Precipitation fraction due to very wet days (> 95th percentile)                              |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | R99p            | (in discussion) Days with RR > 99th percentile of daily amounts (extremely wet days)                         |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | R99pTOT         | (in discussion) Precipitation fraction due to very wet days (> 99th percentile)                              |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | SD              | Mean of daily snow depth                                                                                     |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | SD1             | Snow days (SD >= 1 cm)                                                                                       |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | SD5cm           | Number of days with snow depth >= 5 cm                                                                       |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | SD50cm          | Number of days with snow depth >= 50 cm                                                                      |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | CD              | Days with TG < 25th percentile of daily mean temperature and RR <25th percentile of daily precipitation sum |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | CW              | Days with TG < 25th percentile of daily mean temperature and RR >75th percentile of daily precipitation sum |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | WD              | Days with TG > 75th percentile of daily mean temperature and RR <25th percentile of daily precipitation sum |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
 | WW              | Days with TG > 75th percentile of daily mean temperature and RR >75th percentile of daily precipitation sum |
 +-----------------+--------------------------------------------------------------------------------------------------------------+
diff --git a/doc/source/explanation/index.rst b/doc/source/explanation/index.rst
index 3f47802d..9e1b1fd3 100644
--- a/doc/source/explanation/index.rst
+++ b/doc/source/explanation/index.rst
@@ -1,13 +1,14 @@
-Explanation
-===========
+#############
+ Explanation
+#############
 
-In here you will find information related to icclim.
-For more technical knowledge, see :ref:`references`.
+Here you will find information related to icclim. For more technical
+knowledge, see :ref:`references`.
 
 .. toctree::
-    :maxdepth: 2
-    :caption: Contents:
+   :maxdepth: 2
+   :caption: Contents:
 
-    climate_indices
-    xclim_and_icclim
-    4.2.x_installation
+   climate_indices
+   xclim_and_icclim
+   4.2.x_installation
diff --git a/doc/source/explanation/xclim_and_icclim.rst b/doc/source/explanation/xclim_and_icclim.rst
index 9b4195fa..c64070b5 100644
--- a/doc/source/explanation/xclim_and_icclim.rst
+++ b/doc/source/explanation/xclim_and_icclim.rst
@@ -1,69 +1,85 @@
 .. _clix-meta: https://github.com/clix-meta/clix-meta
+
 .. _xclim: https://github.com/Ouranosinc/xclim
 
-Disambiguation on xclim and icclim
-==================================
+####################################
+ Disambiguation on xclim and icclim
+####################################
 
-At first glance it seems `xclim`_ and icclim serve the same purpose but their differences are in the details.
-With version 5 of icclim, xclim became a building block of icclim. xclim handles some of the core features, notably
-the calculation of climate indices. icclim also make use of xclim capabilities to handle i/o units and to check
-input data and metadata validity.
+At first glance it seems xclim_ and icclim serve the same purpose, but
+the differences are in the details. With version 5 of icclim, xclim
+became a building block of icclim. xclim handles some of the core
+features, notably the calculation of climate indices. icclim also makes
+use of xclim capabilities to handle i/o units and to check input data
+and metadata validity.
 
-While developing icclim v5, we tried to integrate all relevant features directly in xclim.
-This is for example the case for the bootstrapping of percentiles, which is a key feature of icclim and
-its development was a collaborative work from both parties.
-This way, xclim gains new features and users and icclim can rely on all the efforts that was already put into xclim.
-We also benefits from the knowledge of xclim developers thanks to code reviews and discussions on their repository.
+While developing icclim v5, we tried to integrate all relevant features
+directly in xclim. This is for example the case for the bootstrapping
+of percentiles, a key feature of icclim whose development was a
+collaborative work from both parties. This way, xclim gains new
+features and users, and icclim can rely on all the effort that was
+already put into xclim. We also benefit from the knowledge of xclim
+developers thanks to code reviews and discussions on their repository.
 
-This joint effort helped tremendously to create what icclim is today and we believe it strengthens the collaborative
-ecosystem of geo sciences.
+This joint effort helped tremendously to create what icclim is today
+and we believe it strengthens the collaborative ecosystem of
+geosciences.
 
---------
+----
 
-On the other side, we had to clarify what would be the purpose of icclim without its internal climate index computation
-feature.
-Thus, icclim is now a library which wraps index calculation in a familiar API while **pre-processing** inputs and
-decorates output with **metadata**.
+On the other hand, we had to clarify what the purpose of icclim would
+be without its internal climate index computation feature. Thus, icclim
+is now a library which wraps index calculation in a familiar API while
+**pre-processing** inputs and decorating outputs with **metadata**.
 
 Pre-processing includes:
 
-- Handling of multiple input formats (netcdf files, xarray.Dataset, text file (TBD))
-- Simple automated data chunking
-- Variable detection in input (based on the work done in `clix-meta`_)
-- Wet day filtering for precipitation indices
-- Simple unit handling (rely on xclim pint registry)
-- Time sub-setting of data
-- Custom sampling frequency of output
-- Index configuration, such as threshold values or output unit
+-  Handling of multiple input formats (netcdf files, xarray.Dataset,
+   text file (TBD))
+-  Simple automated data chunking
+-  Variable detection in input (based on the work done in clix-meta_)
+-  Wet day filtering for precipitation indices
+-  Simple unit handling (relying on the xclim pint registry)
+-  Time sub-setting of data
+-  Custom sampling frequency of the output
+-  Index configuration, such as threshold values or output unit
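+
+Several of these pre-processing steps map directly to parameters of
+``icclim.index``. A minimal sketch (the file name and dates are
+illustrative):
+
+.. code:: python
+
+   import datetime
+
+   import icclim
+
+   icclim.index(
+       index_name="SU",
+       in_files="tasmax_day.nc",  # or an already opened xarray.Dataset
+       slice_mode="month",  # custom sampling frequency of the output
+       # time sub-setting of the input data
+       time_range=[datetime.datetime(1991, 1, 1), datetime.datetime(2020, 12, 31)],
+       out_file="su_monthly.nc",
+   )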
 
 Output metadata includes:
 
-- History
-- Title
-- Units
-- time_bounds
-- provenance (TBD)
-- ...
+-  History
+-  Title
+-  Units
+-  time_bounds
+-  provenance (TBD)
+-  ...
 
-Beside, icclim v5 still provides a way to write **user defined indices** using a simili JSON data structure
-(actually a python dictionary) with ``icclim.index(user_index=...)``.
-The output metadata is however not as rich as with ECA&D indices.
+Besides, icclim v5 still provides a way to write **user defined
+indices** using a JSON-like data structure (actually a python
+dictionary) with ``icclim.index(user_index=...)``. The output metadata
+is however not as rich as with ECA&D indices.
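+
+As an illustration, a user defined index counting days above a fixed
+threshold could look like the sketch below. The dictionary keys shown
+here are only illustrative and should be checked against the user index
+documentation:
+
+.. code:: python
+
+   import icclim
+
+   hot_days = {
+       "index_name": "hot_days",
+       "calc_operation": "nb_events",  # count days matching the condition
+       "logical_operation": "gt",
+       "thresh": 300.15,  # in the unit of the input data, here Kelvin
+   }
+   icclim.index(user_index=hot_days, in_files="tasmax_day.nc", out_file="out.nc")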
 
-One of the goal of icclim is also to provide an API which require zero knowledge of xarray letting user new to python
-and its ecosystem reliably compute climate indices.
-This is one of the reason why icclim exposes a single entry point for all it's features with ``icclim.index``.
-It means users new to xarray might prefer icclim while experts could find xclim more convenient.
+One of the goals of icclim is also to provide an API which requires
+zero knowledge of xarray, letting users new to python and its ecosystem
+reliably compute climate indices. This is one of the reasons why icclim
+exposes a single entry point for all its features with
+``icclim.index``. It means users new to xarray might prefer icclim
+while experts could find xclim more convenient.
 
-xclim' scope is larger than the one of icclim. xclim add metadata to the computed index in multiple languages (currently
-FR and EN), it computes many indices not part of the ECA&D specification, it provides biais adjustment and downscaling
-algorithms and is capable of detecting many common errors which can be found in netCDF files through health checks.
-All those features make `xclim`_ a very good tool by itself and we encourage the reader to check `xclim documentation
-`_ for more information on it's features.
+xclim's scope is larger than that of icclim. xclim adds metadata to the
+computed index in multiple languages (currently FR and EN), it computes
+many indices not part of the ECA&D specification, it provides bias
+adjustment and downscaling algorithms and is capable of detecting many
+common errors which can be found in netCDF files through health checks.
+All those features make xclim_ a very good tool by itself and we
+encourage the reader to check the `xclim documentation
+<https://xclim.readthedocs.io/en/stable/>`_ for more information on its
+features.
 
 .. note::
 
-   xclim provides a virtual module named ``icclim`` which exposes in a xclim style the ECA&D indices that were
-   historically provided by icclim. But this module **does not** use our library icclim directly, otherwise we would
-   have a weird circular dependency.
+   xclim provides a virtual module named ``icclim`` which exposes, in
+   an xclim style, the ECA&D indices that were historically provided by
+   icclim. But this module **does not** use our library icclim
+   directly, otherwise we would have a circular dependency.
 
-We are very grateful for the work done on xclim and we hope to continue the collaboration while both libraries grow in
-users and maturity.
+We are very grateful for the work done on xclim and we hope to continue
+the collaboration while both libraries grow in users and maturity.
diff --git a/doc/source/how_to/dask.rst b/doc/source/how_to/dask.rst
index e07bcc36..0bb98ef2 100644
--- a/doc/source/how_to/dask.rst
+++ b/doc/source/how_to/dask.rst
@@ -1,479 +1,729 @@
 .. _dask:
 
-Chunk data and parallelize computation
-======================================
-
-TL;DR
------
-icclim make use of dask to chunk and parallelize computations.
-You can configure dask to limit its memory footprint and CPU usage by instantiating a distributed Cluster and by tuning
-dask.config options.
-A configuration working well for small to medium dataset and simple climate indices could be:
-
->>> import dask
->>> from distributed import Client
->>> client = Client(memory_limit="10GB", n_workers=1, threads_per_worker=8)
->>> dask.config.set({"array.slicing.split_large_chunks": False})
->>> dask.config.set({"array.chunk-size": "100 MB"})
->>> icclim.index(in_files="data.nc", indice_name="SU", out_file="output.nc")
-
-------------------------------------------------------------------------------------------------
-
-icclim uses xarray to manipulate data and xarray provides multiple backends to handle in-memory data.
-By default, xarray uses numpy, this is overwritten in icclim to use dask whenever a path to a file is provided as input.
-Numpy is fast and reliable for small dataset but may exhaust the memory on large dataset.
-The other backend possibility of xarray is dask. dask can divide data into small chunk to minimize the memory footprint
-of computations. This chunking also enable parallel computation by running the calculation on each chunk concurrently.
-This parallelization can speed up the computation but because each parallel thread will need multiple chunks to be
-in-memory at once, it can bring back memory issues that chunking was meant to avoid.
-
-In this document we first explain some concepts around dask, parallelization and performances, then we propose multiple
-dask configurations to be used with icclim.
-You will also find a few dask pitfall dodging instructions which might help understanding why your computation doesn't run.
-Each configuration aims to answer a specific scenario. For you own data, you will likely need to customize these
-configurations to your needs.
-
-The art of chunking
--------------------
-Dask proposes a way to divide large dataset into multiple smaller chunks that fit in memory.
-This process is known as chunking and, with icclim there are 2 ways to control it.
-
-First, you can open your dataset with xarray and do your own chunking:
-
->>> import xarray
->>> import icclim
->>> ds = xarray.open_dataset("data.nc")
->>> ds = ds.chunk({"time": 10, "lat": 20, "lon": 20})
+########################################
+ Chunk data and parallelize computation
+########################################
+
+*******
+ TL;DR
+*******
+
+icclim makes use of dask to chunk data and parallelize computations.
+You can configure dask to limit its memory footprint and CPU usage by
+instantiating a distributed Cluster and by tuning dask.config options.
+A configuration working well for small to medium datasets and simple
+climate indices could be:
+
+.. code:: python
+
+   import dask
+   from distributed import Client
+
+   import icclim
+
+   client = Client(memory_limit="10GB", n_workers=1, threads_per_worker=8)
+   dask.config.set({"array.slicing.split_large_chunks": False})
+   dask.config.set({"array.chunk-size": "100 MB"})
+   icclim.index(in_files="data.nc", indice_name="SU", out_file="output.nc")
+
+----
+
+| icclim uses xarray to manipulate data and xarray provides multiple
+  backends to handle in-memory data.
+| By default, xarray uses numpy; this is overwritten in icclim to use
+  dask whenever a path to a file is provided as input.
+| Numpy is fast and reliable for small datasets but may exhaust the
+  memory on large datasets.
+| The other backend possibility of xarray is dask. dask can divide data
+  into small chunks to minimize the memory footprint of computations.
+  This chunking also enables parallel computation by running the
+  calculation on each chunk concurrently.
+| This parallelization can speed up the computation but, because each
+  parallel thread needs multiple chunks to be in memory at once, it can
+  bring back the memory issues that chunking was meant to avoid.
+
+| In this document we first explain some concepts around dask,
+  parallelization and performances, then we propose multiple dask
+  configurations to be used with icclim.
+| You will also find a few dask pitfall dodging instructions which
+  might help you understand why your computation doesn't run.
+| Each configuration aims to answer a specific scenario. For your own
+  data, you will likely need to customize these configurations to your
+  needs.
+
+*********************
+ The art of chunking
+*********************
+
+| Dask proposes a way to divide large datasets into multiple smaller
+  chunks that fit in memory.
+| This process is known as chunking and, with icclim, there are 2 ways
+  to control it.
+| First, you can open your dataset with xarray and do your own
+  chunking:
+
+.. code:: python
+
+   import xarray
+
+   import icclim
+
+   ds = xarray.open_dataset("data.nc")
+   ds = ds.chunk({"time": 10, "lat": 20, "lon": 20})
 
 And then use ``ds`` dataset object as input for icclim.
 
->>> icclim.index(in_files=ds, indice_name="SU", out_file="output.nc")
+.. code:: python
+
+   icclim.index(in_files=ds, indice_name="SU", out_file="output.nc")
 
-In that case, icclim will not re-chunk ``ds`` data. It is left to you to find the proper chunking.
-For more information on how to properly chunk see:
+In that case, icclim will not re-chunk ``ds`` data. It is left to you
+to find the proper chunking. For more information on how to properly
+chunk see:
 
-* xarray guide: https://xarray.pydata.org/en/stable/user-guide/dask.html#chunking-and-performance
-* dask best practices: https://docs.dask.org/en/stable/array-best-practices.html
-
-Another option is to leave dask find the best chunking for each dimension.
-This is the recommended way and the default behavior of icclim when using file a path in `in_files` parameter.
-In this case, chunking can still be controlled by limiting the size of each individual chunk:
-
->>> import dask
->>> dask.config.set({"array.chunk-size": "50 MB"})
->>> icclim.su(in_files="data.nc")
-
-By default, the dask chunk-size is around 100MB.
-You can also use ``with`` python keyword if you don't want this configuration to spread globally.
-Internally, icclim will ask xarray and dask to chunk using ``"auto"`` configuration.
-This usually results in a pretty good chunking. It chunks by respecting as much as possible how the data
-is stored and will make chunks as large as approximately the configured chunk-size.
-Moreover, some operations in icclim/xclim need to re-chunk intermediary results. We usually try to keep the chunk sizes
-to their initial value but re-chunking is costly and we sometimes prefer to generate larger chunks to try improving
-performances.
-If you wish to avoid this large chunking behavior, you can try the following dask configuration:
-
->>> dask.config.set({"array.slicing.split_large_chunks": True})
-
-Create an optimized chunking on disk
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Sometimes, you have to work with data that were originally chunked and stored in a way that is suboptimal.
-Often climat data are stored in a one year per file format thus the natural chunking of the dataset will be one year per chunk.
-The most efficient way to read data on disk would be to chunk data in memory the same way it is distributed on disk.
-Here it means having one chunk per file, as long as a file size does not exceed the `array.chunk-size` configuration.
-
-This scattering of the time axis in many chunks can limit the computation performances of some indices.
-Indeed, indices such as percentile based indices require a specific chunking schema to be computed.
-This means we must either rechunk in memory to have an optimized chunking. However, this generates many dask tasks and can overload dask scheduler.
-
-To tackle this issue, icclim 5.1.0 comes with a new feature to first rewrite the data on disk before starting any computation.
-We rely on the `rechunker library `_ to make this possible.
-The feature is ``icclim.create_optimized_zarr_store``. It is a context manager which allow you to rewrite an input data into a zarr store.
-Zarr stores are a modern way to store files on disk with optimized reading and writting in mind.
-In our case, it allows to rewrite files with a specific chunking schema, optimized for climat indices computation.
-
-Now, depending on the climat index you want to compute, the optimal chunking schema might differ.
-
-For most indices, if you consider chunking on time dimension, you should never chunk below the target resampling period.
-For example, with ``slice_mode="month"``, ideally each chunk should include whole month and never chunk in the middle of a month.
-But month being of variable lengths, it might actually be much easier to have one chunk per year.
-Leap years would add another difficulty to this.
-
-However, on indice where a bootstrapping of percentile is necessary (e.g Tg90p), it is actually optimal to
-have no chunk at all on time dimension. This is true only because the bootstrapping algorithm rely on `map_block `_.
-In that case, you can use ``icclim.create_optimized_zarr_store`` to first create a zarr store not chunked at all on time dimension:
-
-.. code-block:: python
-
-    import icclim
-
-    ref_period = [datetime.datetime(1980, 1, 1), datetime.datetime(2009, 12, 31)]
-    with icclim.create_optimized_zarr_store(
-        in_files="netcdf_files/tas.nc",
-        var_names="tas",
-        target_zarr_store_name="opti.zarr",
-        keep_target_store=False,
-        chunking={"time": -1, "lat": "auto", "lon": "auto"},
-    ) as opti_tas:
-        icclim.index(
-            index_name="TG90p",
-            in_files=opti_tas,
-            slice_mode="YS",
-            base_period_time_range=ref_period,
-            out_file="netcdf_files/output/tg90p.nc",
-        )
-
-Actually this `chunking={"time": -1, "lat":"auto", "lon":"auto" }`, which avoid chunking on time is the default behavior of the function.
-`chunking` parameter could be omitted in the above example.
-
-You can also control if you want to keep the optimized zarr store on disk by turning ``keep_target_store`` to True.
-This can be useful if you wish to compute other indices using the same chunking.
-
-On performances
----------------
-Computation of ECA&D indices can largely be done in parallel on spatial dimensions.
-Indeed, the ECA&D indices available in icclim are all computed on each individual pixel independently.
-In a ideal world it means we could compute each pixel concurrently.
-In reality this would result in considerable efforts necessary to chunk data that much, this would be sub-optimal
-because the smaller chunk are, the greater dask overhead is.
-
-.. note::
-
-   By overhead, we mean here the necessary python code running to move around and handle each independent chunk.
+-  xarray guide:
+   https://xarray.pydata.org/en/stable/user-guide/dask.html#chunking-and-performance
+-  dask best practices:
+   https://docs.dask.org/en/stable/array-best-practices.html
+
+Another option is to let dask find the best chunking for each
+dimension. This is the recommended way and the default behavior of
+icclim when a file path is used in the `in_files` parameter. In this
+case, chunking can still be controlled by limiting the size of each
+individual chunk:
+
+.. code:: python
+
+   import dask
+
+   import icclim
+
+   dask.config.set({"array.chunk-size": "50 MB"})
+   icclim.su(in_files="data.nc")
+
+By default, the dask chunk-size is around 100MB. You can also use the
+``with`` python keyword if you don't want this configuration to spread
+globally.
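+
+As a sketch, scoping the option with ``with`` could look like this (the
+file name is illustrative; ``dask.config.set`` is usable as a context
+manager):
+
+.. code:: python
+
+   import dask
+
+   import icclim
+
+   # The chunk-size option only applies inside the ``with`` block.
+   with dask.config.set({"array.chunk-size": "50 MB"}):
+       icclim.su(in_files="data.nc")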
+
+Internally, icclim will ask xarray and dask to chunk using the
+``"auto"`` configuration. This usually results in a pretty good
+chunking: it chunks by respecting as much as possible how the data is
+stored, and it makes chunks approximately as large as the configured
+chunk-size. Moreover, some operations in icclim/xclim need to re-chunk
+intermediary results. We usually try to keep the chunk sizes at their
+initial value, but re-chunking is costly and we sometimes prefer to
+generate larger chunks to try to improve performances. If you wish to
+avoid this large chunking behavior, you can try the following dask
+configuration:
+
+.. code:: python
+
+   dask.config.set({"array.slicing.split_large_chunks": True})
+
+Create an optimized chunking on disk
+====================================
+
+Sometimes, you have to work with data that was originally chunked and
+stored in a way that is suboptimal. Often climate data are stored in a
+one-year-per-file format, thus the natural chunking of the dataset will
+be one year per chunk. The most efficient way to read data on disk is
+to chunk data in memory the same way it is distributed on disk. Here it
+means having one chunk per file, as long as a file size does not exceed
+the `array.chunk-size` configuration.
+
+This scattering of the time axis in many chunks can limit the
+computation performances of some indices. Indeed, indices such as
+percentile-based indices require a specific chunking schema to be
+computed. This means we must rechunk in memory to obtain an optimized
+chunking. However, this generates many dask tasks and can overload the
+dask scheduler.
+
+To tackle this issue, icclim 5.1.0 comes with a new feature to first
+rewrite the data on disk before starting any computation. We rely on
+the `rechunker library <https://rechunker.readthedocs.io/en/latest/>`_
+to make this possible. The feature is
+``icclim.create_optimized_zarr_store``. It is a context manager which
+allows you to rewrite input data into a zarr store. Zarr stores are a
+modern way to store files on disk with optimized reading and writing in
+mind. In our case, it allows us to rewrite files with a specific
+chunking schema, optimized for climate index computation.
+
+Now, depending on the climate index you want to compute, the optimal
+chunking schema might differ.
+
+For most indices, if you consider chunking on the time dimension, you
+should never chunk below the target resampling period. For example,
+with ``slice_mode="month"``, ideally each chunk should include whole
+months and never split in the middle of a month. But months being of
+variable lengths, it might actually be much easier to have one chunk
+per year. Leap years would add another difficulty to this.
+
+However, for indices where a bootstrapping of percentiles is necessary
+(e.g. TG90p), it is actually optimal to have no chunk at all on the
+time dimension. This is true only because the bootstrapping algorithm
+relies on `map_blocks `_. In that case, you can use
+``icclim.create_optimized_zarr_store`` to first create a zarr store not
+chunked at all on the time dimension:
+
+.. code:: python
+
+   import datetime
+
+   import icclim
+
+   ref_period = [datetime.datetime(1980, 1, 1), datetime.datetime(2009, 12, 31)]
+   with icclim.create_optimized_zarr_store(
+       in_files="netcdf_files/tas.nc",
+       var_names="tas",
+       target_zarr_store_name="opti.zarr",
+       keep_target_store=False,
+       chunking={"time": -1, "lat": "auto", "lon": "auto"},
+   ) as opti_tas:
+       icclim.index(
+           index_name="TG90p",
+           in_files=opti_tas,
+           slice_mode="YS",
+           base_period_time_range=ref_period,
+           out_file="netcdf_files/output/tg90p.nc",
+       )
+
+Actually, this ``chunking={"time": -1, "lat": "auto", "lon": "auto"}``,
+which avoids chunking on time, is the default behavior of the function;
+the `chunking` parameter could be omitted in the above example.
+
+You can also control whether to keep the optimized zarr store on disk
+by turning ``keep_target_store`` to True. This can be useful if you
+wish to compute other indices using the same chunking.
+
+*****************
+ On performances
+*****************
+
+Computation of ECA&D indices can largely be done in parallel on spatial
+dimensions. Indeed, the ECA&D indices available in icclim are all
+computed on each individual pixel independently. In an ideal world it
+means we could compute each pixel concurrently. In reality, this would
+require a considerable effort to chunk data that much, and it would be
+sub-optimal because the smaller the chunks are, the greater the dask
+overhead is.
+
+.. note::
+
+   By overhead, we mean here the necessary python code running to move
+   around and handle each independent chunk.
+
-Another important aspect of dask to consider for performances is the generated task graph.
-Dask creates a graph of all the actions (tasks) it must accomplish to compute the calculation.
-This graph, created before the computation, shows for each chunk the route to follow in order to compute the climat index.
-This allows some nice optimizations, for example if some spatial or time selections are done within icclim/xclim, it
-will only read and load in-memory the necessary data.
-However, each task also adds some overhead and, most of the time a small graph will compute faster than a larger.
-
-In this graph each chunk has it's own route of all the intermediary transformation it goes though.
-The more there are chunks the more routes are created.
+| Another important aspect of dask to consider for performances is the
+  generated task graph.
+| Dask creates a graph of all the actions (tasks) it must accomplish to
+  compute the calculation.
+| This graph, created before the computation, shows for each chunk the
+  route to follow in order to compute the climate index.
+| This allows some nice optimizations: for example, if some spatial or
+  time selections are done within icclim/xclim, it will only read and
+  load in memory the necessary data.
+| However, each task also adds some overhead and, most of the time, a
+  small graph will compute faster than a larger one.
+
+| In this graph, each chunk has its own route of all the intermediary
+  transformations it goes through. The more chunks there are, the more
+  routes are created.
-In extreme cases, when there are a lot of chunks, the graph may take eons to create and the computation may never start.
-This means that configuring a small chunk size leads to a potentially large graph.
-
-The graph is also dependant of the actual calculation. A climate index like "SU" (count of days above 25°C)
-will obviously create a much simpler graph than WSDI (longest spell of at least 6 consecutive days where maximum daily
-temperature is above the 90th daily percentile).
-Finally the resampling may also play a role in the graph complexity. In icclim we control it with ``slice_mode`` parameter.
-A yearly slice_mode sampling result in a simpler graph than a monthly sampling.
-
-Beside, when starting dask on a limited system (e.g a laptop) it's quite easy to exhaust all available memory.
-In that case, dask has multiple safety mechanism and can even kill the computing process (a.k.a the worker) once it
-reaches a memory limit (default is 95% of memory).
-Even before this limit, the performances can deteriorate when dask measures a high memory use of a worker.
-When a worker uses around 60% of its memory, dask will ask it to write to disk the intermediary results it has computed.
-These i/o operations are much slower than in RAM manipulation, even on recent SSD disk.
+| In extreme cases, when there are a lot of chunks, the graph may take
+  eons to create and the computation may never start.
+| This means that configuring a small chunk size leads to a potentially
+  large graph.
+
+| The graph also depends on the actual calculation. A climate index
+  like "SU" (count of days above 25°C) will obviously create a much
+  simpler graph than WSDI (longest spell of at least 6 consecutive days
+  where the maximum daily temperature is above the 90th daily
+  percentile).
+| Finally, the resampling may also play a role in the graph complexity.
+  In icclim we control it with the ``slice_mode`` parameter.
+| A yearly slice_mode sampling results in a simpler graph than a
+  monthly sampling.
+
+| Besides, when starting dask on a limited system (e.g. a laptop) it's
+  quite easy to exhaust all available memory.
+| In that case, dask has multiple safety mechanisms and can even kill
+  the computing process (a.k.a the worker) once it reaches a memory
+  limit (default is 95% of memory).
+| Even before this limit, the performances can deteriorate when dask
+  measures a high memory use of a worker.
+| When a worker uses around 60% of its memory, dask will ask it to
+  write to disk the intermediary results it has computed.
+| These i/o operations are much slower than in-RAM manipulation, even
+  on a recent SSD disk.
 
 Hence, there are multiple things to consider to maximize performances.
 
-First, if your data (and the intermediary computation) fits in memory, it might be better to use Numpy backend directly.
-To do so, simply provide the opened dataset to icclim:
-
->>> ds = xarray.open_dataset("data.nc")
->>> icclim.index(in_files=ds, indice_name="SU", out_file="output.nc")
-
-There will be no parallelization but, on small dataset it's unnecessary. Numpy is natively really fast and dask overhead
-may slow it downs.
+| First, if your data (and the intermediary computation) fits in
+  memory, it might be better to use the Numpy backend directly.
+| To do so, simply provide the opened dataset to icclim:
+
+.. code:: python
+
+   ds = xarray.open_dataset("data.nc")
+   icclim.index(in_files=ds, indice_name="SU", out_file="output.nc")
+
+There will be no parallelization but, on small datasets, it's
+unnecessary. Numpy is natively really fast and dask overhead may slow
+it down.
 
 On the other hand when using dask we must:
 
-* Minimize the number of task to speed things up, thus divide data into **large enough chunks**.
-* Minimize the workload of each worker to avoid i/o operation, thus divide data into **small enough chunks**.
+-  Minimize the number of tasks to speed things up, thus divide data
+   into **large enough chunks**.
+-  Minimize the workload of each worker to avoid i/o operations, thus
+   divide data into **small enough chunks**.
 
 In the following we present a few possible configuration for dask.
 
-Small to medium dataset (a few MB) - No configuration
------------------------------------------------------
-The first approach is to use default values.
-By default icclim relies on dask's ``"auto"`` chunking and dask will be started with the threading scheduler.
+*******************************************************
+ Small to medium dataset (a few MB) - No configuration
+*******************************************************
+
+The first approach is to use default values. By default icclim relies
+on dask's ``"auto"`` chunking and dask will be started with the
+threading scheduler.
This scheduler runs everything in the existing python process +and will spawn multiple threads to concurrently compute climate indices. +You can find more information on the default scheduler here: +https://docs.dask.org/en/stable/scheduling.html#local-threads + +| This can work on most cases for small to medium datasets and may + yield the best performances. +| However some percentiles based temperature indices (T_90p and T_10p + families) may use a lot of memory even on medium datasets. +| This memory footprint is caused by the bootstrapping of percentiles, + an algorithm used to correct statistical biais. +| This bootstrapping use a Monte Carlo simulation, which inherently use + a lot of resources. +| The longer the bootstrap period is, the more resources are necessary. + The bootstrap period is the overlapping years between the period + where percentile are computed (a.k.a "in base") and the period where + the climate index is computed (a.k.a "out of base"). .. note:: - To control the "in base" period, ``icclim.index`` provides the ``base_period_time_range`` parameter. - To control the "out of base" period, ``icclim.index`` provides the ``time_range`` parameter. + - To control the "in base" period, ``icclim.index`` provides the + ``base_period_time_range`` parameter. + - To control the "out of base" period, ``icclim.index`` provides the + ``time_range`` parameter. -For these percentile based indices, we recommend to use one of the following configuration. +For these percentile based indices, we recommend to use one of the +following configuration. -Medium to large dataset (~200MB) - dask LocalCluster ----------------------------------------------------- -By default, dask will run on a default threaded scheduler. -This behavior can be overwritten by creating you own "cluster" running locally on your machine. -This LocalCluster is distributed in a separate dask package called "distributed" and is not a mandatory -dependency of icclim. +****************************************************** + Medium to large dataset (~200MB) - dask LocalCluster +****************************************************** + +By default, dask will run on a default threaded scheduler. This behavior +can be overwritten by creating you own "cluster" running locally on your +machine. This LocalCluster is distributed in a separate dask package +called "distributed" and is not a mandatory dependency of icclim. To install it run: -.. code-block:: console +.. code:: console $ conda install dask distributed -c conda-forge -See the documentation for more details: http://distributed.dask.org/en/stable/ +See the documentation for more details: +http://distributed.dask.org/en/stable/ + +Once installed, you can delegate the ``LocalCluster`` instantiation +using `distributed.Client` class. This ``Client`` object creates both a +``LocalCluster`` and a web application to investigate how your +computation is going. This web dashboard is very powerful and helps to +understand where are the computation bottlenecks as well as to visualize +how dask is working. By default it runs on ``localhost:8787``, you can +print the client object to see on which port it runs. + +.. code:: python -Once installed, you can delegate the ``LocalCluster`` instantiation using `distributed.Client` class. -This ``Client`` object creates both a ``LocalCluster`` and a web application to investigate how your computation is going. 
-This web dashboard is very powerful and helps to understand where are the computation bottlenecks as well as to visualize how dask is working.
-By default it runs on ``localhost:8787``, you can print the client object to see on which port it runs.
+   from distributed import Client
 
->>> from distributed import Client
->>> client = Client()
->>> print(client)
+   client = Client()
+   print(client)
 
-By default dask creates a ``LocalCluster`` with 1 worker (process), CPU count threads and a memory limit up to
-the system available memory.
+By default dask creates a ``LocalCluster`` with 1 worker (process), as
+many threads as your CPU count, and a memory limit up to the available
+system memory.
 
 .. note::
-    You can see how dask counts CPU here: https://github.com/dask/dask/blob/main/dask/system.py
-    How dask measures available memory here: https://github.com/dask/distributed/blob/main/distributed/worker.py#L4237
-    Depending on your OS, these values are not exactly computed the same way.
+
+   - You can see how dask counts CPUs here:
+     https://github.com/dask/dask/blob/main/dask/system.py
+   - You can see how dask measures available memory here:
+     https://github.com/dask/distributed/blob/main/distributed/worker.py#L4237
+   - Depending on your OS, these values are not computed exactly the
+     same way.
 
 The cluster can be configured directly through Client arguments.
 
->>> client = Client(memory_limit="16GB", n_workers=1, threads_per_worker=8)
+.. code:: python
+
+   client = Client(memory_limit="16GB", n_workers=1, threads_per_worker=8)
 
 A few notes:
 
-* The CLient must be started in the same python interpreter as the computation. This is how dask know which scheduler to use.
-* If needed, the localCluster can be started independently and the Client connected to a running LocalCluster. See: http://distributed.dask.org/en/stable/client.html
-* Each worker is an independent python process and memory_limit is set for each of these processes. So, if you have 16GB of RAM don't set ``memory_limit='16GB'`` unless you run a single worker.
-* Memory sharing is much more efficient between threads than between processes (workers), see `dask doc `_
-* On a single worker, a good threads number should be a multiple of your CPU cores (usually \*2).
-* All threads of the same worker are idle whenever one of the thread is reading or writing on disk.
-* It's useless to spawn too many threads, there are hardware limits on how many of them can run concurrently and if they are too numerous, the OS will waste time orchestrating them.
-* A dask worker may write to disk some of its data even if the memory limit is not reached. This seems to be a normal behavior happening when dask knows some intermediary results will not be used soon. However, this can significantly slow down the computation due to i/o.
-* Percentiles based indices may need up to **nb_thread * chunk_size * 30** memory which is unusually high for a dask application. We are trying to reduce this memory footprint but it means some costly re-chunking in the middle of computation have to be made.
+- The Client must be started in the same python interpreter as the
+  computation. This is how dask knows which scheduler to use.
+
+- If needed, the LocalCluster can be started independently and the
+  Client connected to a running LocalCluster. See:
+  http://distributed.dask.org/en/stable/client.html
+
+- Each worker is an independent python process and memory_limit is set
+  for each of these processes. So, if you have 16GB of RAM, don't set
+  ``memory_limit='16GB'`` unless you run a single worker. 
+
+- Memory sharing is much more efficient between threads than between
+  processes (workers), see `dask doc
+  `_
+
+- On a single worker, a good thread count is a multiple of your CPU
+  cores (usually \*2).
+
+- All threads of the same worker are idle whenever one of the threads
+  is reading or writing on disk.
+
+- It's useless to spawn too many threads: there are hardware limits on
+  how many of them can run concurrently and, if they are too numerous,
+  the OS will waste time orchestrating them.
+
+- A dask worker may write to disk some of its data even if the memory
+  limit is not reached. This seems to be a normal behavior happening
+  when dask knows some intermediary results will not be used soon.
+  However, this can significantly slow down the computation due to i/o.
+
+- Percentile-based indices may need up to **nb_thread \* chunk_size \*
+  30** of memory, which is unusually high for a dask application. We
+  are trying to reduce this memory footprint but it means some costly
+  re-chunking in the middle of the computation has to be made.
 
 Knowing all these, we can consider a few scenarios.
 
 Low memory footprint
-~~~~~~~~~~~~~~~~~~~~
-Let's suppose you want to compute indices on your laptop while continue to work on other subjects.
-You should configure your local cluster to use not too many threads and processes and to limit the amount of memory
-each process (worker) has available.
-On my 4 cores, 16GB of RAM laptop I would consider:
+====================
+
+Let's suppose you want to compute indices on your laptop while
+continuing to work on other subjects. You should configure your local
+cluster to use only a few threads and processes and to limit the
+amount of memory each process (worker) has available. On my 4-core,
+16GB of RAM laptop I would consider:
+
+.. code:: python
 
->>> client = Client(memory_limit="10GB", n_workers=1, threads_per_worker=4)
+   client = Client(memory_limit="10GB", n_workers=1, threads_per_worker=4)
 
-Eventually, to reduce the amount of i/o on disk we can also increase dask memory thresholds:
+Additionally, to reduce the amount of i/o on disk, we can also increase
+dask's memory thresholds:
 
->>> dask.config.set({"distributed.worker.memory.target": "0.8"})
->>> dask.config.set({"distributed.worker.memory.spill": "0.9"})
->>> dask.config.set({"distributed.worker.memory.pause": "0.95"})
->>> dask.config.set({"distributed.worker.memory.terminate": "0.98"})
+.. code:: python
 
-These thresholds are fractions of memory_limit used by dask to take a decision.
+   dask.config.set({"distributed.worker.memory.target": "0.8"})
+   dask.config.set({"distributed.worker.memory.spill": "0.9"})
+   dask.config.set({"distributed.worker.memory.pause": "0.95"})
+   dask.config.set({"distributed.worker.memory.terminate": "0.98"})
 
-* At 80% of memory the worker will write to disk its unmanaged memory.
-* At 90%, the worker will write all its memory to disk.
+These thresholds are fractions of ``memory_limit`` used by dask to take
+a decision.
+
+- At 80% of memory, the worker will write its unmanaged memory to disk.
+- At 90%, the worker will write all its memory to disk. 
+- At 95%, the worker pauses computation to focus on writing to disk.
+- At 98%, the worker is killed to avoid reaching the memory limit.
+
+Increasing these thresholds is risky. The memory could fill up quicker
+than expected, resulting in a killed worker and thus losing all work
+done by this worker. If a single worker is running and it is killed,
+the whole computation will be restarted (and will likely reach the same
 memory limit).
 
 High resources use
-~~~~~~~~~~~~~~~~~~
-If you want to have the result as quickly as possible it's a good idea to give dask all possible resources.
-This may render your computer "laggy" thought.
-On my 4 cores (8 CPU threads), 16GB of RAM laptop I would consider:
+==================
+
+If you want to have the result as quickly as possible, it's a good idea
+to give dask all possible resources. This may render your computer
+"laggy" though. On my 4-core (8 CPU threads), 16GB of RAM laptop I
+would consider:
+
+.. code:: python
 
->>> client = Client(memory_limit="16GB", n_workers=1, threads_per_worker=8)
+   client = Client(memory_limit="16GB", n_workers=1, threads_per_worker=8)
 
-On this kind of configuration, it can be useful to add 1 or 2 workers in case a lot of i/o is necessary.
-If there are multiple workers ``memory_limit`` should be reduced accordingly.
-It can also be necessary to reduce chunk size. dask default value is around 100 MB per chunk which on some complex
-indices may result in a large memory usage.
+On this kind of configuration, it can be useful to add 1 or 2 workers
+in case a lot of i/o is necessary. If there are multiple workers,
+``memory_limit`` should be reduced accordingly. It can also be
+necessary to reduce the chunk size. dask's default value is around 100
+MB per chunk, which on some complex indices may result in a large
+memory usage.
 
 It's over 9000!
-~~~~~~~~~~~~~~~
-This configuration may put your computer to its knees, use it at your own risk.
-The idea is to bypass all memory safety implemented by dask.
-This may yield very good performances because there will be no i/o on disk by dask itself.
-However, when your OS run out of RAM, it will use your disk swap which is sort of similar to dask spilling mechanism but
-probably much slower.
-And if you run out of swap, your computer will likely crash.
-To roll the dices use the following configuration ``memory_limit='0'`` in :
+===============
+
+This configuration may bring your computer to its knees, use it at your
+own risk. The idea is to bypass all memory safety implemented by dask.
+This may yield very good performances because there will be no i/o on
+disk by dask itself. However, when your OS runs out of RAM, it will use
+your disk swap, which is sort of similar to dask's spilling mechanism
+but probably much slower. And if you run out of swap, your computer
+will likely crash. To roll the dice, use ``memory_limit='0'`` in the
+Client configuration:
+
+.. code:: python
+
+   client = Client(memory_limit="0")
+
+Dask will spawn a worker with multiple threads without any memory
+limits.
+
+***************************************
+ Large to huge dataset (1GB and above)
+***************************************
+
+If you wish to compute climate indices on a large dataset, a personal
+computer may not be appropriate. In that case you can deploy a real
+dask cluster, as opposed to the LocalCluster seen before. You can find
+more information on how to deploy a dask cluster here:
+https://docs.dask.org/en/stable/scheduling.html#dask-distributed-cluster
 
->>> client = Client(memory_limit="0")
+However, if you must run your computation on limited resources, you can
+try to:
 
-Dask will spawn a worker with multiple threads without any memory limits.
+- Use only one or two threads on a single worker. This will drastically
+  slow down the computation but very few chunks will be in memory at
+  once, letting you use quite large chunks (see the sketch after this
+  list).
 
-Large to huge dataset (1GB and above)
--------------------------------------
-If you wish to compute climate indices of a large datasets, a personal computer may not be appropriate.
-In that case you can deploy a real dask cluster as opposed to the LocalCluster seen before.
-You can find more information on how to deploy dask cluster here: https://docs.dask.org/en/stable/scheduling.html#dask-distributed-cluster
+- Use a small chunk size, but beware: the smaller the chunks are, the
+  more tasks dask creates and thus the more complex the dask graph
+  becomes.
 
-However, if you must run your computation on limited resources, you can try to:
+- Rechunk your dataset into a zarr storage to optimize file reading and
+  reduce the number of rechunking tasks needed by dask. For this, you
+  should consider the Pangeo rechunker library to ease this process:
+  https://rechunker.readthedocs.io/en/latest/. A shorthand to Pangeo
+  rechunker is available in icclim with
+  `icclim.create_optimized_zarr_store`.
 
-* Use only one or two threads on a single worker. This will drastically slow down the computation but very few chunks will be in memory at once letting you use quite large chunks.
-* Use small chunk size, but beware the smaller they are, the more dask creates tasks thus, the more complex the dask graph becomes.
-* Rechunk your dataset into a zarr storage to optimize file reading and reduce the amount of rechunking tasks needed by dask.
-  For this, you should consider the Pangeo rechunker library to ease this process: https://rechunker.readthedocs.io/en/latest/ A shorthand to Pangeo rechunker is available in icclim with `icclim.create_optimized_zarr_store`.
-* Split your data into smaller netcdf inputs and run the computation multiple times.
+- Split your data into smaller netcdf inputs and run the computation
+  multiple times.
 
-The last point is the most frustrating option because chunking is supposed to do exactly that. But, sometimes
-it can be easier to chunk "by hand" than to find the exact configuration that fit for the input dataset.
+The last point is the most frustrating option because chunking is
+supposed to do exactly that. But, sometimes it can be easier to chunk
+"by hand" than to find the exact configuration that fits the input
+dataset.
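+
+Below is a minimal sketch combining the first two options: a single
+worker with few threads, and inputs chunked on spatial dimensions only.
+The file name, dimension names, chunk sizes and memory limit are
+placeholders to adapt to your own dataset and machine:
+
+.. code:: python
+
+   import icclim
+   import xarray as xr
+   from distributed import Client
+
+   # One worker with two threads: slower, but only a few chunks are in
+   # memory at any given time.
+   client = Client(n_workers=1, threads_per_worker=2, memory_limit="8GB")
+
+   # Chunk on spatial dimensions only; -1 keeps the time dimension whole.
+   ds = xr.open_dataset("large_data.nc", chunks={"time": -1, "lat": 60, "lon": 60})
+   icclim.index(in_files=ds, indice_name="SU", out_file="su.nc")
+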
-Real example
-------------
+**************
+ Real example
+**************
 
-On CMIP6 data, when computing the percentile based indices Tx90p for 20 years and, bootstrapping on 19 years we use:
+On CMIP6 data, when computing the percentile-based index Tx90p over 20
+years, bootstrapping on 19 of them, we use:
 
->>> client = Client(memory_limit="16GB", n_workers=1, threads_per_worker=2)
->>> dask.config.set({"array.slicing.split_large_chunks": False})
->>> dask.config.set({"array.chunk-size": "100 MB"})
->>> dask.config.set({"distributed.worker.memory.target": "0.8"})
->>> dask.config.set({"distributed.worker.memory.spill": "0.9"})
->>> dask.config.set({"distributed.worker.memory.pause": "0.95"})
->>> dask.config.set({"distributed.worker.memory.terminate": "0.98"})
+.. code:: python
 
+   client = Client(memory_limit="16GB", n_workers=1, threads_per_worker=2)
+   dask.config.set({"array.slicing.split_large_chunks": False})
+   dask.config.set({"array.chunk-size": "100 MB"})
+   dask.config.set({"distributed.worker.memory.target": "0.8"})
+   dask.config.set({"distributed.worker.memory.spill": "0.9"})
+   dask.config.set({"distributed.worker.memory.pause": "0.95"})
+   dask.config.set({"distributed.worker.memory.terminate": "0.98"})
+
+****************************************
+ Troubleshooting and dashboard analysis
+****************************************
 
-Troubleshooting and dashboard analysis
---------------------------------------
 This section describe common warnings and errors that dask can raise.
 There are also some silent issues that only dask dashboard can expose.
 
-To start the dashboard, run the distributed ``Client(...)``. It should be available on ``localhost:8787``
-as a web application (type "localhost:8787" in your browser address bar to access it).
+To start the dashboard, run the distributed ``Client(...)``. It should
+be available on ``localhost:8787`` as a web application (type
+"localhost:8787" in your browser address bar to access it).
 
 Memory overload
-~~~~~~~~~~~~~~~
-The warning may be ``"distributed.nanny - WARNING - Restarting worker"`` or the error ``"KilledWorker"``.
-This means the computation uses more memory than what is available for the worker.
-Keep in mind that:
-
-* ``memory_limit`` parameter is a limit set for each individual worker.
-* Some indices, such as percentile based indices (R__p, R__pTOT, T_90p, T_10p families) may use large amount of memory.
-  This is especially true on temperature based indices where percentiles are bootstrapped.
-* You can reduce memory footprint by using smaller chunks.
-* Each thread may load multiple chunks in memory at once.
-
-To solve this issue, you must either increase available memory per worker or reduce the quantity of memory used by the computation.
-You can increase memory_limit up to your physical memory available (RAM) with ``Client(memory_limit="16GB")``.
-This increase can also speed up computation by reducing writes and reads on disk.
-You can reduce the number of concurrently running threads (and workers) in the distributed Client configuration with
-``Client(n_workers=1, threads_per_worker=1)``. This may slow down computation.
-You can reduce the size of each chunk with ``dask.config.set({"array.chunk-size": "50 MB"})``, default is around 100MB.
-This may slow down computation as well.
-Or you can combine the three solutions above. 
-You can read more on this issue here: http://distributed.dask.org/en/stable/killed.html
+===============
+
+The warning may be ``"distributed.nanny - WARNING - Restarting worker"``
+or the error ``"KilledWorker"``. This means the computation uses more
+memory than what is available for the worker. Keep in mind that:
+
+- The ``memory_limit`` parameter is a limit set for each individual
+  worker.
+
+- Some indices, such as percentile-based indices (R__p, R__pTOT, T_90p,
+  T_10p families), may use a large amount of memory. This is especially
+  true for temperature-based indices where percentiles are
+  bootstrapped.
+
+- You can reduce the memory footprint by using smaller chunks.
+
+- Each thread may load multiple chunks in memory at once.
+
+| To solve this issue, you must either increase the available memory
+  per worker or reduce the quantity of memory used by the computation.
+| You can increase ``memory_limit`` up to your available physical
+  memory (RAM) with ``Client(memory_limit="16GB")``.
+| This increase can also speed up computation by reducing writes and
+  reads on disk.
+| You can reduce the number of concurrently running threads (and
+  workers) in the distributed Client configuration with
+  ``Client(n_workers=1, threads_per_worker=1)``. This may slow down
+  computation.
+| You can reduce the size of each chunk with
+  ``dask.config.set({"array.chunk-size": "50 MB"})``; the default is
+  around 100 MB. This may slow down computation as well.
+| Or you can combine the three solutions above.
+| You can read more on this issue here:
+  http://distributed.dask.org/en/stable/killed.html
 
 Garbage collection "wasting" CPU time
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-The warning would be: ``distributed.utils_perf - WARNING - full garbage collections took xx% CPU time recently (threshold: 10%)``
-This is usually accompanied by: ``distributed.worker - WARNING - gc.collect() took 1.259s. This is usually a sign that some tasks handle too many Python objects at the same time. Rechunking the work into smaller tasks might help.``
-Python runs on a virtual machine (VM) which handles the memory allocations for us.
-This means the VM sometimes needs to cleanup garbage objects that aren't referenced anymore.
-This operation takes some CPU resource but free the RAM for other uses.
-In our dask context, the warning may be raised when icclim/xclim has created large chunks which takes longer to be
-garbage collected.
-This warning means some CPU is wasted but the computation is still running.
-It might help to re-chunk into smaller chunk.
+=====================================
+
+| The warning would be: ``distributed.utils_perf - WARNING - full
+  garbage collections took xx% CPU time recently (threshold: 10%)``
+| This is usually accompanied by: ``distributed.worker - WARNING -
+  gc.collect() took 1.259s. This is usually a sign that some tasks
+  handle too many Python objects at the same time. Rechunking the work
+  into smaller tasks might help.``
+| Python runs on a virtual machine (VM) which handles the memory
+  allocations for us.
+| This means the VM sometimes needs to clean up garbage objects that
+  aren't referenced anymore.
+| This operation takes some CPU resources but frees the RAM for other
+  uses.
+| In our dask context, the warning may be raised when icclim/xclim has
+  created large chunks, which take longer to be garbage collected.
+| This warning means some CPU is wasted but the computation is still
+  running. It might help to re-chunk into smaller chunks, as sketched
+  below. 
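+
+Below is a minimal sketch of two ways to obtain smaller chunks; the
+file name and the values are indicative and should be tuned to your
+dataset:
+
+.. code:: python
+
+   import dask
+   import xarray as xr
+
+   # Lower dask's default chunk size before opening the dataset...
+   dask.config.set({"array.chunk-size": "50 MB"})
+   ds = xr.open_dataset("data.nc", chunks="auto")
+
+   # ...or re-chunk an already opened dataset explicitly.
+   ds = ds.chunk({"lat": 20, "lon": 20})
+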
Internal re-chunking
-~~~~~~~~~~~~~~~~~~~~
-The warning would be: ``PerformanceWarning: Increasing number of chunks by factor of xx``.
-This warning is usually raised when computing percentiles.
-In percentiles calculation step, the intermediary data generated to compute percentiles is much larger than the initial data.
-First, because of the rolling window used to retrieve all values of each day, the analysed data is multiplied by
-window size (usually by 5).
-Then, on temperatures indices such as Tx90p, we compute percentiles for each day of year (doy).
-This means we must read almost all chunks of time dimension.
-To avoid consuming all RAM at once with these, icclim/xclim internally re-chunk data that's why dask warns
-that many chunks are being created.
-In that case this warning can be ignored.
-
-Computation never start
-~~~~~~~~~~~~~~~~~~~~~~~
-The error raised can be ``CancelledError``.
-We can also acknowledge this by looking at dask dashboard and not seeing any task being schedule.
-This usually means dask graph is too big and the scheduler has trouble creating it.
-If your memory allows it, you can try to increase the chunk-size with ``dask.config.set({"array.chunk-size": "200 MB"})``.
-This will reduce the amount of task created on dask graph.
-To compensate, you may need to reduce the number of running threads with ``Client(n_workers=1, threads_per_worker=2)``.
-This should help limit the memory footprint of the computation.
-
-.. Note::
-
-    Beware, if the computation is fast or if the client is not started in the same python process as icclim,
-    the dashboard may also look empty but the computation is actually running.
+====================
+
+| The warning would be: ``PerformanceWarning: Increasing number of
+  chunks by factor of xx``.
+| This warning is usually raised when computing percentiles.
+| In the percentile calculation step, the intermediary data generated
+  to compute percentiles is much larger than the initial data.
+| First, because of the rolling window used to retrieve all values of
+  each day, the analysed data is multiplied by the window size (usually
+  5).
+| Then, on temperature indices such as Tx90p, we compute percentiles
+  for each day of year (doy).
+| This means we must read almost all chunks of the time dimension.
+| To avoid consuming all RAM at once with these, icclim/xclim
+  internally re-chunks the data, which is why dask warns that many
+  chunks are being created.
+| In that case this warning can be ignored.
+
+Computation never starts
+========================
+
+| The error raised can be ``CancelledError``.
+| We can also confirm this by looking at the dask dashboard and not
+  seeing any task being scheduled.
+| This usually means the dask graph is too big and the scheduler has
+  trouble creating it.
+| If your memory allows it, you can try to increase the chunk size with
+  ``dask.config.set({"array.chunk-size": "200 MB"})``.
+| This will reduce the number of tasks created in the dask graph.
+| To compensate, you may need to reduce the number of running threads
+  with ``Client(n_workers=1, threads_per_worker=2)``.
+| This should help limit the memory footprint of the computation.
+
+.. note::
+
+   Beware, if the computation is fast or if the client is not started
+   in the same python process as icclim, the dashboard may also look
+   empty but the computation is actually running.
 
 Disk read and write analysis - Dashboard
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-When poorly configured, the computation can spend most of its CPU time reading and writing chunks on disk. 
-You can visualize this case by opening dask dashboard, usually on ``localhost:8787``.
-In the status page, you can see in the right panel each task dynamically being added.
-In these the colourful boxes, each color represent a specific task.
-I/O on disk is displayed as orange transparent boxes. You should also see all other threads of the same worker
-stopping when one thread is reading or writing on disk.
-If there is a lot of i/o you may need to reconfigure dask.
-The solution to this are similar to the memory overload described above.
-You can increase total available memory with ``Client(memory_limit="16GB")``.
-You can decrease memory pressure by reducing chunk size with ``dask.config.set({"array.chunk-size": "50 MB"})`` or
-by reducing number of threads with ``Client(n_workers=1, threads_per_worker=2)``.
-Beside, you can also benefit from using multiple worker in this case.
-Each worker is a separate non blocking process thus they are not locking each other when one of them need to write or
-read on disk. They are however slower than threads to share memory, this can result in the "chatterbox" issue presented
-below.
-
-.. Note::
-
-    - Don't instantiate multiple client with different configurations, put everything in the same Client constructor call.
-    - Beware, as of icclim 5.0.0, the bootstrapping of percentiles is known to produce **a lot** of i/o.
+========================================
+
+| When poorly configured, the computation can spend most of its CPU
+  time reading and writing chunks on disk.
+| You can visualize this case by opening the dask dashboard, usually on
+  ``localhost:8787``.
+| In the status page, you can see in the right panel each task
+  dynamically being added.
+| In these colourful boxes, each color represents a specific task.
+| I/O on disk is displayed as orange transparent boxes. You should also
+  see all other threads of the same worker stopping when one thread is
+  reading or writing on disk.
+| If there is a lot of i/o, you may need to reconfigure dask.
+| The solutions to this are similar to the memory overload described
+  above:
+
+   - You can increase the total available memory with
+     ``Client(memory_limit="16GB")``.
+
+   - You can decrease memory pressure by reducing the chunk size with
+     ``dask.config.set({"array.chunk-size": "50 MB"})`` or by reducing
+     the number of threads with ``Client(n_workers=1,
+     threads_per_worker=2)``.
+
+| Besides, you can also benefit from using multiple workers in this
+  case.
+| Each worker is a separate non-blocking process, thus workers are not
+  locking each other when one of them needs to write or read on disk.
+  They are however slower than threads at sharing memory, which can
+  result in the "chatterbox" issue presented below.
+
+.. note::
+
+   - Don't instantiate multiple clients with different configurations,
+     put everything in the same Client constructor call.
+   - Beware, as of icclim 5.0.0, the bootstrapping of percentiles is
+     known to produce **a lot** of i/o.
 
 Worker chatterbox syndrome - Dashboard
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-In all this document, we mainly recommend to use a single worker with multiple threads.
-Most of the code icclim runs is relying on dask and numpy, and both release the python GLI (More details on GIL here: https://realpython.com/python-gil/).
-This means we can benefit from multi threading and that's why we usually recommend to use a single process (worker) with
-multiple threads.
-However, some configuration can benefit from spawning multiple processes (workers). 
-In dask dashboard, you will see red transparent boxes representing the workers communicating between each other.
-If you see a lot of these and if they do not overlap much with other tasks, it means the workers are
-spending most of their CPU times exchanging data.
-This can be caused by either:
-
-# Too many workers are spawned for the amount of work.
-# The load balancer has a lot of work to do.
-
-In the first case, the solution is simply to reduce the number of workers and eventually to increase the number of
-threads per worker.
-For the second case, when a worker has been given too many task to do, the load balancer is charged of
-redistributing these task to other worker. It can happen when some task take significant time to be processed.
-In icclim/xclim this is for example the case of the ``cal_perc`` function used to compute percentiles.
-There is no easy solution for this case, letting the load balancer do its job seems necessary.
+======================================
+
+| Throughout this document, we mainly recommend using a single worker
+  with multiple threads.
+| Most of the code icclim runs relies on dask and numpy, and both
+  release the python GIL (more details on the GIL here:
+  https://realpython.com/python-gil/).
+| This means we can benefit from multi-threading, and that's why we
+  usually recommend using a single process (worker) with multiple
+  threads.
+| However, some configurations can benefit from spawning multiple
+  processes (workers).
+| In the dask dashboard, you will see red transparent boxes
+  representing the workers communicating with each other.
+| If you see a lot of these and if they do not overlap much with other
+  tasks, it means the workers are spending most of their CPU time
+  exchanging data.
+| This can be caused by either:
+
+   - Too many workers are spawned for the amount of work.
+   - The load balancer has a lot of work to do.
+
+| In the first case, the solution is simply to reduce the number of
+  workers and possibly to increase the number of threads per worker.
+| For the second case, when a worker has been given too many tasks to
+  do, the load balancer is in charge of redistributing these tasks to
+  other workers. This can happen when some tasks take significant time
+  to be processed.
+| In icclim/xclim this is for example the case of the ``cal_perc``
+  function used to compute percentiles.
+| There is no easy solution for this case; letting the load balancer do
+  its job seems necessary.
 
 Idle threads
-~~~~~~~~~~~~
-When looking at dask dashboard, the task timelines should be full of colors.
-If you see a lot of emptiness between colored boxes, it means the threads are not doing anything.
-It could be caused by a blocking operation in progress (e.g i/o on disk).
-Read `Disk read and write analysis - Dashboard`_ above in that case.
-It could also be because you have set too many threads and the work cannot be properly divided between each thread.
-In that case, you can simply reduce the number of thread in Client configuration with ``Client(n_workers=1, threads_per_worker=4)``.
-
-Conclusion
-----------
-
-We can't provide a single configuration which fits all possible datasets and climate indices.
-In this document we tried to summarize the few configurations we found useful while developing icclim.
-You still need to tailor these configuration to your own needs.
-
-.. Note::
-    This document has been rewritten in january 2022 and the whole stack under icclim is evolving rapidly.
-    Some information presented here might become outdated quickly. 
+============
+
+| When looking at the dask dashboard, the task timelines should be
+  full of colors.
+| If you see a lot of emptiness between colored boxes, it means the
+  threads are not doing anything.
+| It could be caused by a blocking operation in progress (e.g. i/o on
+  disk). Read `Disk read and write analysis - Dashboard`_ above in
+  that case.
+| It could also be because you have set too many threads and the work
+  cannot be properly divided between each thread.
+| In that case, you can simply reduce the number of threads in the
+  Client configuration with ``Client(n_workers=1,
+  threads_per_worker=4)``.
+
+************
+ Conclusion
+************
+
+| We can't provide a single configuration which fits all possible
+  datasets and climate indices.
+| In this document we tried to summarize the few configurations we
+  found useful while developing icclim.
+| You still need to tailor these configurations to your own needs.
+
+.. note::
+
+   - This document has been rewritten in January 2022 and the whole
+     stack under icclim is evolving rapidly.
+   - Some information presented here might become outdated quickly.
diff --git a/doc/source/how_to/index.rst b/doc/source/how_to/index.rst
index 01e9b9c4..b40a9a7e 100644
--- a/doc/source/how_to/index.rst
+++ b/doc/source/how_to/index.rst
@@ -1,20 +1,23 @@
 .. _how_to:
 
-How to
-======
+########
+ How to
+########
 
-In here you will find recipes on how to use icclim to solve specific problems.
+Here you will find recipes on how to use icclim to solve specific
+problems.
 
-To grasp the basic usage of icclim, you may consider following :ref:`tutorials` first.
-To find more in depth technical knowledge see :ref:`references`.
+To grasp the basic usage of icclim, you may consider following
+:ref:`tutorials` first. To find more in-depth technical knowledge, see
+:ref:`references`.
 
 .. toctree::
-   :maxdepth: 2
-   :caption: How to
+   :maxdepth: 2
+   :caption: How to
 
-   Chunk data and parallelize computation
-   Run our jupyter notebooks
-   Compute ECA&D indices
-   Compute Generic indices
-   Use icclim through OCGIS
-   Create customized indices (deprecated)
+   Chunk data and parallelize computation
+   Run our jupyter notebooks
+   Compute ECA&D indices
+   Compute Generic indices
+   Use icclim through OCGIS
+   Create customized indices (deprecated)
diff --git a/doc/source/how_to/notebooks.rst b/doc/source/how_to/notebooks.rst
index 6124ffc2..6326cbb4 100644
--- a/doc/source/how_to/notebooks.rst
+++ b/doc/source/how_to/notebooks.rst
@@ -1,13 +1,25 @@
+############################
+ Jupyter Notebooks examples
+############################
 
-Jupyter Notebooks examples
-==========================
+Within the `IS-ENES `_ project, jupyter notebooks
+were written to show real-life scenarios of icclim usage. They are
+stored within their `own repository
+`_.
 
-Within the `IS-ENES `_ project, jupyter notebooks were written to show real life scenarios of
-icclim usage.
-The are stored within their `own repository `_. 
+- `Averaged Temperature Anomaly 2081-2100 vs 1971-2000 SSP585
+  `_
 
-- `Averaged Temperature Anomaly 2081-2100 vs 1971-2000 SSP585 `_
-- `Number of Summer Days `_
-- `Percentage of days when Tmax > 90th percentil (TX90p) `_
-- `Averaged surface temperature anomaly and the same for precipitation anomaly over the period 2081-2100 compared to the period 1981-2000, and display delta-T delta-P diagram `_
-- `Number of days with maximum temperature above freezing `_
+- `Number of Summer Days
+  `_
+
+- `Percentage of days when Tmax > 90th percentile (TX90p)
+  `_
+
+- `Averaged surface temperature anomaly and the same for precipitation
+  anomaly over the period 2081-2100 compared to the period 1981-2000,
+  and display delta-T delta-P diagram
+  `_
+
+- `Number of days with maximum temperature above freezing
+  `_
diff --git a/doc/source/how_to/ocgis.rst b/doc/source/how_to/ocgis.rst
index 908c4a9c..0de73e2f 100644
--- a/doc/source/how_to/ocgis.rst
+++ b/doc/source/how_to/ocgis.rst
@@ -1,218 +1,273 @@
 .. _icclim_ocgis:
 
-icclim called from OpenClimateGIS - Examples
-==============================================
-icclim indices (`ECA&D climate indices `_) are implemented in the
-`OpenClimateGIS `_ (Version 1.1.0) Python package.
+##############################################
+ icclim called from OpenClimateGIS - Examples
+##############################################
 
+icclim indices (`ECA&D climate indices
+`_)
+are implemented in the `OpenClimateGIS `_
+(Version 1.1.0) Python package.
 
->>> import ocgis
->>> rd = ocgis.RequestDataset("tas_19800101_19891231.nc", variable="tas")
+.. code:: python
+
+   import ocgis
+
+   rd = ocgis.RequestDataset("tas_19800101_19891231.nc", variable="tas")
 
 It is also possible to pass a list of datasets:
 
->>> rd = ocgis.RequestDataset(
-... ["tas_19800101_19891231.nc", "tas_19900101_19991231.nc"], variable="tas"
-... )
+.. code:: python
+
+   rd = ocgis.RequestDataset(
+       ["tas_19800101_19891231.nc", "tas_19900101_19991231.nc"], variable="tas"
+   )
 
-Subsetting with ``time_range`` and/or ``time_region``
------------------------------------------------------
+*******************************************************
+ Subsetting with ``time_range`` and/or ``time_region``
+*******************************************************
 
 .. note::
-   See `ocgis time_range `_ doc
-   and `ocgis time_region `_ doc.
+
+   See `ocgis time_range
+   `_
+   doc and `ocgis time_region
+   `_
+   doc.
 
 For temporal subsetting we use the ``time_range`` parameter:
 
->>> import datetime
->>> dt1 = datetime.datetime(1985, 1, 1)
->>> dt2 = datetime.datetime(1995, 12, 31)
->>> rd = ocgis.RequestDataset(
-... ["tas_19800101_19891231.nc", "tas_19900101_19991231.nc"],
-... variable="tas",
-... time_range=[dt1, dt2],
-... )
+.. code:: python
+
+   import datetime
+
+   dt1 = datetime.datetime(1985, 1, 1)
+   dt2 = datetime.datetime(1995, 12, 31)
+   rd = ocgis.RequestDataset(
+       ["tas_19800101_19891231.nc", "tas_19900101_19991231.nc"],
+       variable="tas",
+       time_range=[dt1, dt2],
+   )
 
 or/and the ``time_region`` parameter:
 
->>> rd = ocgis.RequestDataset(
-... ["tas_19800101_19891231.nc", "tas_19900101_19991231.nc"],
-... variable="tas",
-... time_region={"month": [6, 7, 8]},
-... )
+.. code:: python
 
->>> rd = ocgis.RequestDataset(
-... ["tas_19800101_19891231.nc", "tas_19900101_19991231.nc"],
-... variable="tas",
-... time_region={"year": [1989, 1990, 1991], "month": [6, 7, 8]},
-... )
 
-Temporal aggregation with ``calc_grouping``
--------------------------------------------
+   rd = ocgis.RequestDataset(
+       ["tas_19800101_19891231.nc", "tas_19900101_19991231.nc"],
+       variable="tas",
+       time_region={"month": [6, 7, 8]},
+   )
 
+.. code:: python
+
+   rd = ocgis.RequestDataset(
+       ["tas_19800101_19891231.nc", "tas_19900101_19991231.nc"],
+       variable="tas",
+       time_region={"year": [1989, 1990, 1991], "month": [6, 7, 8]},
+   )
+
+*********************************************
+ Temporal aggregation with ``calc_grouping``
+*********************************************
 
 .. note::
-   See `ocgis calc_grouping `_ doc.
+
+   See `ocgis calc_grouping
+   `_
+   doc.
 
 Annual values:
 
->>> calc_grouping = ["year"]
+.. code:: python
+
+   calc_grouping = ["year"]
 
 Monthly values:
 
->>> calc_grouping = ["year", "month"] # or calc_grouping = ['month', 'year']
+.. code:: python
 
-Seasonal values:
+   calc_grouping = ["year", "month"]  # or calc_grouping = ['month', 'year']
 
->>> calc_grouping = [[3, 4, 5], "unique"] # spring season (MAM)
+Seasonal values:
 
->>> calc_grouping = [[6, 7, 8], "unique"] # summer season (JJA)
+.. code:: python
 
->>> calc_grouping = [[9, 10, 11], "unique"] # autumn season (SON)
+   spring = [[3, 4, 5], "unique"]  # spring season (MAM)
+   summer = [[6, 7, 8], "unique"]  # summer season (JJA)
+   autumn = [[9, 10, 11], "unique"]  # autumn season (SON)
+   winter = [[12, 1, 2], "unique"]  # winter season (DJF)
+   long_winter = [[10, 11, 12, 1, 2, 3], "unique"]  # winter half-year (ONDJFM)
+   long_summer = [[4, 5, 6, 7, 8, 9], "unique"]  # summer half-year (AMJJAS)
 
->>> calc_grouping = [[12, 1, 2], "unique"] # winter season (DJF)
+**************************************
+ Example 1: simple indice calculation
+**************************************
 
->>> calc_grouping = [[10, 11, 12, 1, 2, 3], "unique"] # winter half-year (ONDJFM)
+The example below will create a netCDF file "indiceTG_1985_1995.nc"
+containing the TG indice:
 
->>> calc_grouping = [[4, 5, 6, 7, 8, 9], "unique"] # summer half-year (AMJJAS)
+.. code:: python
 
+   calc_icclim = [{"func": "icclim_TG", "name": "TG"}]
+   ops = ocgis.OcgOperations(
+       dataset=rd,
+       calc=calc_icclim,
+       calc_grouping=calc_grouping,
+       prefix="indiceTG_1985_1995",
+       output_format="nc",
+       add_auxiliary_files=False,
+   )
+   ops.execute()
 
-Example 1: simple indice calculation
---------------------------------------
+*********************************************
+ Example 2: multivariable indice calculation
+*********************************************
 
-The example below will create a netCDF file "indiceTG_1985_1995.nc" containing TG indice:
+To calculate an indice based on 2 variables:
 
->>> calc_icclim = [{"func": "icclim_TG", "name": "TG"}]
->>> ops = ocgis.OcgOperations(
-... dataset=rd,
-... calc=calc_icclim,
-... calc_grouping=calc_grouping,
-... prefix="indiceTG_1985_1995",
-... output_format="nc",
-... add_auxiliary_files=False,
-... )
->>> ops.execute()
+.. code:: python
+
+   rd_tasmin = ocgis.RequestDataset("tasmin_19800101_19891231.nc", "tasmin")
+   rd_tasmax = ocgis.RequestDataset("tasmax_19800101_19891231.nc", "tasmax")
+   rds = [rd_tasmin, rd_tasmax]
+   calc_grouping = ["year", "month"]
+   calc_icclim = [
+       {
+           "func": "icclim_ETR",
+           "name": "ETR",
+           "kwds": {"tasmin": "tasmin", "tasmax": "tasmax"},
+       }
+   ]
+   ops = ocgis.OcgOperations(
+       dataset=rds,
+       calc=calc_icclim,
+       calc_grouping=calc_grouping,
+       prefix="indiceETR_1980_1989",
+       output_format="nc",
+       add_auxiliary_files=False,
+   )
+   ops.execute()
 
+.. 
_percentil_label:
 
-Example 2: multivariable indice calculation
----------------------------------------------
-To calculate an indice based on 2 variables:
+*************************************
+ Example 3: percentile-based indices
+*************************************
 
->>> rd_tasmin = ocgis.RequestDataset(tasmin_19800101_19891231.nc, "tasmin")
->>> rd_tasmax = ocgis.RequestDataset(tasmax_19800101_19891231.nc, "tasmax")
->>> rds = [rd_tasmin, rd_tasmax]
->>> calc_grouping = ["year", "month"]
->>> calc_icclim = [
-... {
-... "func": "icclim_ETR",
-... "name": "ETR",
-... "kwds": {"tasmin": "tasmin", "tasmax": "tasmax"},
-... }
-... ]
->>> ops = ocgis.OcgOperations(
-... dataset=rds,
-... calc=calc_icclim,
-... calc_grouping=calc_grouping,
-... prefix="indiceETR_1980_1989",
-... output_format="nc",
-... add_auxiliary_files=False,
-... )
->>> ops.execute()
+Calculation of percentile-based indices is more complicated. The
+example below shows how to calculate the TG10p indice.
 
-.. _percentil_label:
+.. code:: python
 
-Example 3: percentile-based indices
------------------------------------
-Calculation of percentile-based indices is more complicated.
-The example below shows how to calculate the TG10p indice.
+   dt1 = datetime.datetime(1980, 1, 1)
+   dt2 = datetime.datetime(1989, 12, 31)
+   time_range_indice = [dt1, dt2]  # we will calculate the indice for 10 years
+   rd = ocgis.RequestDataset(tas_files, "tas", time_range=time_range_indice)
+   basis_indice = rd.get()  # OCGIS data object
 
->>> dt1 = datetime.datetime(1980, 1, 1)
->>> dt2 = datetime.datetime(1989, 12, 31)
->>> time_range_indice = [dt1, dt2] # we will calculate the indice for 10 years
->>> rd = ocgis.RequestDataset(tas_files, "tas", time_range=time_range_indice)
->>> basis_indice = rd.get() # OCGIS data object
+We do the same for the reference period (usually the reference period
+is 1961-1990, i.e. 30 years):
 
-We do the same for reference period (usually the
-reference period is the 1961-1990 (30 years)):
+.. code:: python
 
->>> dt1_ref = datetime.datetime(1961, 1, 1)
->>> dt2_ref = datetime.datetime(1990, 12, 31)
->>> time_range_ref = [dt1_ref, dt2_ref]
->>> rd_ref = ocgis.RequestDataset(tas_files, "tas", time_range=time_range_ref)
->>> basis_ref = rd_ref.get() # OCGIS data object
+   dt1_ref = datetime.datetime(1961, 1, 1)
+   dt2_ref = datetime.datetime(1990, 12, 31)
+   time_range_ref = [dt1_ref, dt2_ref]
+   rd_ref = ocgis.RequestDataset(tas_files, "tas", time_range=time_range_ref)
+   basis_ref = rd_ref.get()  # OCGIS data object
 
 To get the 10th daily percentile basis of the reference period:
 
->>> values_ref = basis_ref.variables["tas"].value
->>> temporal = basis_ref.temporal.value_datetime
->>> percentile = 10
->>> width = 5 # 5-day window
->>> from ocgis.calc.library.index.dynamic_kernel_percentile import (
-... DynamicDailyKernelPercentileThreshold,
-... )
->>> daily_percentile = DynamicDailyKernelPercentileThreshold.get_daily_percentile(
-... values_ref, temporal, percentile, width
-... ) # daily_percentile.shape = 366
+.. code:: python
+
+   from ocgis.calc.library.index.dynamic_kernel_percentile import (
+       DynamicDailyKernelPercentileThreshold,
+   )
+
+   values_ref = basis_ref.variables["tas"].value
+   temporal = basis_ref.temporal.value_datetime
+   percentile = 10
+   width = 5  # 5-day window
+   daily_percentile = DynamicDailyKernelPercentileThreshold.get_daily_percentile(
+       values_ref, temporal, percentile, width
+   )  # daily_percentile.shape = 366
 
 Finally, to calculate the TG10p indice:
 
->>> calc_grouping = ["year", "month"] # or other
->>> kwds = {
-... "percentile": percentile,
-... "width": width,
-... "operation": "lt",
-... "daily_percentile": daily_percentile,
-... } # operation: lt = "less then", beacause we count the number of days < 10th percentile
->>> calc = [
-... {"func": "dynamic_kernel_percentile_threshold", "name": "TG10p", "kwds": kwds}
-... ]
->>> ops = ocgis.OcgOperations(
-... dataset=rd,
-... calc_grouping=calc_grouping,
-... calc=calc,
-... output_format="nc",
-... prefix="indiceTG10p_1980_1989",
-... add_auxiliary_files=False,
-... )
->>> ops.execute()
-
-
-Example 4: OPeNDAP dataset, big request
----------------------------------------
-If you want to process OPeNDAP datasets of total size more than for example the OPenDAP/THREDDS limit (500 Mbytes),
-use the `compute function `_ which processes data chunk-by-chunk:
-
->>> from ocgis.util.large_array import compute
-
-This function takes the *tile_dimention* parameter,
-so first we need to find an optimal tile dimention (number of pixels) to get a chunk less than the the OPenDAP/THREDDS limit:
+.. code:: python
+
+   calc_grouping = ["year", "month"]  # or other
+   kwds = {
+       "percentile": percentile,
+       "width": width,
+       "operation": "lt",
+       "daily_percentile": daily_percentile,
+   }  # operation: lt = "less than", because we count the number of days < 10th percentile
+   calc = [{"func": "dynamic_kernel_percentile_threshold", "name": "TG10p", "kwds": kwds}]
+   ops = ocgis.OcgOperations(
+       dataset=rd,
+       calc_grouping=calc_grouping,
+       calc=calc,
+       output_format="nc",
+       prefix="indiceTG10p_1980_1989",
+       add_auxiliary_files=False,
+   )
+   ops.execute()
+
+*****************************************
+ Example 4: OPeNDAP dataset, big request
+*****************************************
+
+If you want to process OPeNDAP datasets whose total size exceeds, for
+example, the OPeNDAP/THREDDS limit (500 MBytes), use the `compute
+function `_ which processes data chunk-by-chunk:
+
+.. code:: python
+
+   from ocgis.util.large_array import compute
+
+This function takes the *tile_dimension* parameter, so first we need to
+find an optimal tile dimension (number of pixels) to get a chunk
+smaller than the OPeNDAP/THREDDS limit:
+
+.. 
code:: python
+
+   from functools import reduce
+
+   import numpy as np
+
+   limit_opendap_mb = 475.0  # we reduce the limit by about 25 MBytes (don't ask me why :) )
+   size = ops.get_base_request_size()
+   nb_time_coordinates_rd = size["variables"]["tas"]["temporal"]["shape"][0]
+   element_in_kb = size["total"] / reduce(
+       lambda x, y: x * y, size["variables"]["tas"]["value"]["shape"]
+   )
+   element_in_mb = element_in_kb * 0.001
+   tile_dim = np.sqrt(
+       limit_opendap_mb / (element_in_mb * nb_time_coordinates_rd)
+   )  # maximum chunk size
 
-.. figure:: /_static/chunks.png
+.. note::
 
+   Chunks are cut along the time axis, i.e. a maximum chunk size in
+   pixels is **tile_dimension** x **tile_dimension** x
+   **number_time_steps**.
+
+.. figure:: /_static/chunks.png
 
 Now we can use the compute function:
 
->>> rd = ocgis.RequestDataset(input_files, variable="tas", time_range=[dt1, dt2])
->>> ops = ocgis.OcgOperations(
-... dataset=rd,
-... calc=calc_icclim,
-... calc_grouping=calc_grouping,
-... prefix="indiceETR_1980_1989",
-... add_auxiliary_files=False,
-... )
->>> compute(ops, tile_dimension=tile_dim)
+.. code:: python
+
+   rd = ocgis.RequestDataset(input_files, variable="tas", time_range=[dt1, dt2])
+   ops = ocgis.OcgOperations(
+       dataset=rd,
+       calc=calc_icclim,
+       calc_grouping=calc_grouping,
+       prefix="indiceETR_1980_1989",
+       add_auxiliary_files=False,
+   )
+   compute(ops, tile_dimension=tile_dim)
diff --git a/doc/source/how_to/recipes_custom.rst b/doc/source/how_to/recipes_custom.rst
index dc67e332..7fb7221d 100644
--- a/doc/source/how_to/recipes_custom.rst
+++ b/doc/source/how_to/recipes_custom.rst
@@ -1,450 +1,527 @@
 .. _custom_indices_recipes:
 
-Custom indices recipes
-----------------------
+########################
+ Custom indices recipes
+########################
 
 .. note::
-   Custom indices are deprecated. You should switch to :ref:`generic_indices_recipes` API.
 
->>> import icclim
->>> import datetime
+   Custom indices are deprecated. You should switch to the
+   :ref:`generic_indices_recipes` API.
 
-Max of tas within the year
-~~~~~~~~~~~~~~~~~~~~~~~~~~
+.. code:: python
 
-.. code-block:: python
+   import icclim
+   import datetime
 
-   from icclim.util import callback
+****************************
+ Max of tas within the year
+****************************
 
-   my_index_params = {"index_name": "my_index", "calc_operation": "max"}
+.. code:: python
 
-   file_tas = "tas_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc"
-   out_f = "my_index.nc"
+   from icclim.util import callback
 
-   icclim.index(
-       user_index=my_index_params,
-       in_files=file_tas,
-       var_name="tas",
-       slice_mode="year",
-       out_file=out_f,
-       callback=callback.defaultCallback2,
-   )
+   my_index_params = {"index_name": "my_index", "calc_operation": "max"}
 
+   file_tas = "tas_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc"
+   out_f = "my_index.nc"
 
-Min of positive values within the year and the date of this minimum
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+   icclim.index(
+       user_index=my_index_params,
+       in_files=file_tas,
+       var_name="tas",
+       slice_mode="year",
+       out_file=out_f,
+       callback=callback.defaultCallback2,
+   )
+
+*********************************************************************
+ Min of positive values within the year and the date of this minimum
+*********************************************************************
 
 Get minimum temperature which is above zero Celsius and find its date.
 
-.. warning:: If input data are in Kelvin, then ``thresh`` must be in Kelvin too.
+.. warning::
 
+   If input data are in Kelvin, then ``thresh`` must be in Kelvin too.
 
-.. note:: An additional variable will be created in output netCDF file: "date_event" with the date of the *first* occurrence of min positive.
+.. note::
 
-.. code-block:: python
+   An additional variable will be created in the output netCDF file:
+   "date_event" with the date of the *first* occurrence of the minimum
+   positive value.
 
-    my_index_params = {
-        "index_name": "my_index",
-        "calc_operation": "min",
-        "logical_operation": "gt",
-        "thresh": 0 + 273.15, ### input data in Kelvin ==> threshold in Kelvin!
-        "date_event": True,
-    }
+.. code:: python
 
-    file_tasmin = "tasmin_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc"
-    out_f = "my_index.nc"
+   my_index_params = {
+       "index_name": "my_index",
+       "calc_operation": "min",
+       "logical_operation": "gt",
+       "thresh": 0 + 273.15,  ### input data in Kelvin ==> threshold in Kelvin!
+       "date_event": True,
+   }
 
-    icclim.index(
-        user_index=my_index_params,
-        in_files=file_tasmin,
-        var_name="tasmin",
-        slice_mode="year",
-        out_file=out_f,
-        callback=callback.defaultCallback2,
-    )
+   file_tasmin = "tasmin_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc"
+   out_f = "my_index.nc"
 
+   icclim.index(
+       user_index=my_index_params,
+       in_files=file_tasmin,
+       var_name="tasmin",
+       slice_mode="year",
+       out_file=out_f,
+       callback=callback.defaultCallback2,
+   )
 
-Mean of a selected period
-~~~~~~~~~~~~~~~~~~~~~~~~~
+***************************
+ Mean of a selected period
+***************************
 
-.. note:: ``slice_mode`` must be ``None`` to apply the operation to the whole period of selected time range.
+.. note::
 
-.. code-block:: python
+   ``slice_mode`` must be ``None`` to apply the operation to the whole
+   period of the selected time range.
 
-    my_index_params = {"index_name": "my_index", "calc_operation": "mean"}
+.. code:: python
 
-    file_tas = "tas_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc"
-    out_f = "my_index.nc"
+   my_index_params = {"index_name": "my_index", "calc_operation": "mean"}
 
-    tr = [datetime.datetime(1901, 1, 1), datetime.datetime(1920, 12, 31)]
+   file_tas = "tas_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc"
+   out_f = "my_index.nc"
 
-    icclim.index(
-        user_index=my_index_params,
-        in_files=file_tas,
-        var_name="tas",
-        slice_mode=None,
-        time_range=tr,
-        out_file=out_f,
-        callback=callback.defaultCallback2,
-    )
+   tr = [datetime.datetime(1901, 1, 1), datetime.datetime(1920, 12, 31)]
 
+   icclim.index(
+       user_index=my_index_params,
+       in_files=file_tas,
+       var_name="tas",
+       slice_mode=None,
+       time_range=tr,
+       out_file=out_f,
+       callback=callback.defaultCallback2,
+   )
 
-Number of days when tas < 15 degrees Celsius of each Autumn
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+*************************************************************
+ Number of days when tas < 15 degrees Celsius of each Autumn
+*************************************************************
 
-.. note:: If 'calc_operation' is *'max_nb_consecutive_events'*, then max number of consecutive days for the same condition will be computed.
+.. note::
 
-.. code-block:: python
+   If 'calc_operation' is *'max_nb_consecutive_events'*, then the max
+   number of consecutive days for the same condition will be computed.
 
-    my_index_params = {
-        "index_name": "my_index",
-        "calc_operation": "nb_events", ### 'calc_operation': 'max_nb_consecutive_events'
-        "logical_operation": "lt",
-        "thresh": 15 + 273.15, ### input data in Kelvin ==> threshold in Kelvin!
-    }
+.. code:: python
 
-    file_tas = "tas_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc"
-    out_f = "my_index.nc"
+   my_index_params = {
+       "index_name": "my_index",
+       "calc_operation": "nb_events",  ### 'calc_operation': 'max_nb_consecutive_events'
+       "logical_operation": "lt",
+       "thresh": 15 + 273.15,  ### input data in Kelvin ==> threshold in Kelvin!
+   }
 
-    icclim.index(
-        user_index=my_index_params,
-        in_files=file_tas,
-        var_name="tas",
-        slice_mode="SON",
-        out_unit="days",
-        out_file=out_f,
-        callback=callback.defaultCallback2,
-    )
+   file_tas = "tas_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc"
+   out_f = "my_index.nc"
 
+   icclim.index(
+       user_index=my_index_params,
+       in_files=file_tas,
+       var_name="tas",
+       slice_mode="SON",
+       out_unit="days",
+       out_file=out_f,
+       callback=callback.defaultCallback2,
+   )
 
-Percentage of days when tasmax > 80th pctl and at which date it happens
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+*************************************************************************
+ Percentage of days when tasmax > 80th pctl and at which date it happens
+*************************************************************************
 
-.. note:: If 'calc_operation' is *'max_nb_consecutive_events'*, then max number of consecutive days for the same condition will be computed.
+.. note::
 
-.. note:: 80th pctl: 80th percentile of tasmax in base period
+   If 'calc_operation' is *'max_nb_consecutive_events'*, then the max
+   number of consecutive days for the same condition will be computed.
 
-.. note:: Two additional variables will be created in output netCDF file: "date_event_start" (the date of the first occurence of tasmax > 80th pctl) and "date_event_end" (the date of the last occurence of tasmax > 80th pctl).
+.. note::
 
-.. code-block:: python
+   80th pctl: 80th percentile of tasmax in base period
 
-    my_index_params = {
-        "index_name": "my_index",
-        "calc_operation": "nb_events", ### 'calc_operation': 'max_nb_consecutive_events'
-        "logical_operation": "gt",
-        "thresh": "p80",
-        "var_type": "t",
-        "date_event": True,
-    }
+.. note::
 
-    file_tasmax = "tasmax_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc"
-    out_f = "my_index.nc"
-    bp = [datetime.datetime(1960, 1, 1), datetime.datetime(1969, 12, 31)]
+   Two additional variables will be created in the output netCDF file:
+   "date_event_start" (the date of the first occurrence of tasmax >
+   80th pctl) and "date_event_end" (the date of the last occurrence of
+   tasmax > 80th pctl).
+
+.. code:: python
+
+   my_index_params = {
+       "index_name": "my_index",
+       "calc_operation": "nb_events",  ### 'calc_operation': 'max_nb_consecutive_events'
+       "logical_operation": "gt",
+       "thresh": "p80",
+       "var_type": "t",
+       "date_event": True,
+   }
+
+   file_tasmax = "tasmax_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc"
+   out_f = "my_index.nc"
+   bp = [datetime.datetime(1960, 1, 1), datetime.datetime(1969, 12, 31)]
+
+   icclim.index(
+       user_index=my_index_params,
+       in_files=file_tasmax,
+       var_name="tasmax",
+       slice_mode="year",
+       base_period_time_range=bp,
+       out_unit="%",
+       out_file=out_f,
+       callback=callback.defaultCallback2,
+   )
+
+************************************************************
+ Number of days when daily precipitation amount > 85th pctl
+************************************************************
 
-    icclim.index(
-        user_index=my_index_params,
-        in_files=file_tasmax,
-        var_name="tasmax",
-        slice_mode="year",
-        base_period_time_range=bp,
-        out_unit="%",
-        out_file=out_f,
-        callback=callback.defaultCallback2,
-    )
+.. 
note:: + If 'calc_operation' is *'max_nb_consecutive_events'*, then max number + of consecutive days for the same condition will be computed. -Number of days when daily precipitation amount > 85th pctl -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +.. note:: -.. note:: If 'calc_operation' is *'max_nb_consecutive_events'*, then max number of consecutive days for the same condition will be computed. + daily precipitation amount: daily precipitation amount on a wet day + (RR >= 1.0 mm) -.. note:: daily precipitation amount: daily precipitation amount on a wet day (RR >= 1.0 mm) +.. note:: -.. note:: 85th pctl: percentile of precipitation on wet days in base period + 85th pctl: percentile of precipitation on wet days in base period -.. code-block:: python +.. code:: python - my_index_params = { - "index_name": "my_index", - "calc_operation": "nb_events", ### 'calc_operation': 'max_nb_consecutive_events' - "logical_operation": "gt", - "thresh": "p85", - "var_type": "p", - } + my_index_params = { + "index_name": "my_index", + "calc_operation": "nb_events", ### 'calc_operation': 'max_nb_consecutive_events' + "logical_operation": "gt", + "thresh": "p85", + "var_type": "p", + } - file_pr = "pr_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc" - out_f = "my_index.nc" + file_pr = "pr_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc" + out_f = "my_index.nc" - icclim.index( - user_index=my_index_params, - in_files=file_pr, - var_name="pr", - slice_mode="year", - base_period_time_range=bp, - out_unit="days", - out_file=out_f, - callback=callback.defaultCallback2, - ) + icclim.index( + user_index=my_index_params, + in_files=file_pr, + var_name="pr", + slice_mode="year", + base_period_time_range=bp, + out_unit="days", + out_file=out_f, + callback=callback.defaultCallback2, + ) +*************************************************************************************** + Max number of consecutive days when tasmax >= 25 degrees Celsius + date of the events +*************************************************************************************** -Max number of consecutive days when tasmax >= 25 degrees Celsius + date of the events -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +.. note:: + + Two additional variables will be created in output netCDF file: + "date_event_start" (the first date of the found sequence) and + "date_event_end" (the last date of the found sequence). + +.. warning:: + + If there are several sequences of the same length, the + "date_event_start" and "date_event_end" will correspond to the + *first* sequence. + +.. code:: python + + my_index_params = { + "index_name": "my_index", + "calc_operation": "max_nb_consecutive_events", + "logical_operation": "get", + "thresh": 25 + 273.15, ### input data in Kelvin ==> threshold in Kelvin! + "date_event": True, + } + + file_tasmax = "tasmax_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc" + out_f = "my_index.nc" + + icclim.index( + user_index=my_index_params, + in_files=file_tasmax, + var_name="tasmax", + slice_mode="year", + out_file=out_f, + callback=callback.defaultCallback2, + ) + +**************************************************** + Max of sum of precipitation in 10 consecutive days +**************************************************** + +.. 
code:: python + + my_index_params = { + "index_name": "my_index", + "calc_operation": "run_sum", + "extreme_mode": "max", + "window_width": 10, + } + + file_pr = "pr_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc" + out_f = "my_index.nc" + + icclim.index( + user_index=my_index_params, + in_files=file_pr, + var_name="pr", + slice_mode=["season", [4, 5, 6, 7, 8]], + out_file=out_f, + callback=callback.defaultCallback2, + ) + +****************************************************************** + Min of mean of tasmin in 7 consecutive days + date of the events +****************************************************************** -.. note:: Two additional variables will be created in output netCDF file: "date_event_start" (the first date of the found sequence) and "date_event_end" (the last date of the found sequence). +.. note:: -.. warning:: If there are several sequences of the same length, the "date_event_start" and "date_event_end" will correspond to the *first* sequence. + Two additional variables will be created in output netCDF file: + "date_event_start" (the date corrsponding to the beggining of the + "window" satisfying the condition) and "date_event_end" (the date + corrsponding to the end of the "window" satisfying the condition). -.. code-block:: python - - my_index_params = { - "index_name": "my_index", - "calc_operation": "max_nb_consecutive_events", - "logical_operation": "get", - "thresh": 25 + 273.15, ### input data in Kelvin ==> threshold in Kelvin! - "date_event": True, - } +.. warning:: - file_tasmax = "tasmax_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc" - out_f = "my_index.nc" + If several "windows" with the same result are found, the + "date_event_start" and "date_event_end" will correspond to the + *first* one. - icclim.index( - user_index=my_index_params, - in_files=file_tasmax, - var_name="tasmax", - slice_mode="year", - out_file=out_f, - callback=callback.defaultCallback2, - ) +.. code:: python -Max of sum of precipitation in 10 consecutive days -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + my_index_params = { + "index_name": "my_index", + "calc_operation": "run_mean", + "extreme_mode": "min", + "window_width": 7, + "date_event": True, + } -.. code-block:: python + file_tasmin = "tasmin_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc" + out_f = "my_index.nc" - my_index_params = { - "index_name": "my_index", - "calc_operation": "run_sum", - "extreme_mode": "max", - "window_width": 10, - } - - file_pr = "pr_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc" - out_f = "my_index.nc" - - icclim.index( - user_index=my_index_params, - in_files=file_pr, - var_name="pr", - slice_mode=["season", [4, 5, 6, 7, 8]], - out_file=out_f, - callback=callback.defaultCallback2, - ) + icclim.index( + user_index=my_index_params, + in_files=file_tasmin, + var_name="tasmin", + slice_mode=["season", ([11, 12], [1, 2])], + out_file=out_f, + callback=callback.defaultCallback2, + ) +************************************************ + Anomaly of tasmax between 2 period of 30 years +************************************************ + +.. note:: -Min of mean of tasmin in 7 consecutive days + date of the events -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Result could be returned as percentage value relative to mean value + of reference period, if ``out_unit='%'``. -.. 
note:: Two additional variables will be created in output netCDF file: "date_event_start" (the date corrsponding to the beggining of the "window" satisfying the condition) and "date_event_end" (the date corrsponding to the end of the "window" satisfying the condition). +.. code:: python -.. warning:: If several "windows" with the same result are found, the "date_event_start" and "date_event_end" will correspond to the *first* one. + my_index_params = {"index_name": "my_index", "calc_operation": "anomaly"} + file_tasmax = "tasmax_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc" + out_f = "my_index.nc" + # studied period: future period + tr = [datetime.datetime(1971, 1, 1), datetime.datetime(2000, 12, 31)] + # reference period: past period + tr_base = [datetime.datetime(1901, 1, 1), datetime.datetime(1930, 12, 31)] + + icclim.index( + user_index=my_index_params, + in_files=file_tasmax, + var_name="tasmax", + time_range=tr, + base_period_time_range=tr_base, + out_file=out_f, + callback=callback.defaultCallback2, + ) + +********************************************************************************** + Number of days when tasmin >= 10 degrees Celsius and tasmax > 25 degrees Celsius +********************************************************************************** + +.. note:: -.. code-block:: python + If 'calc_operation' is *'max_nb_consecutive_events'*, then max number + of consecutive days for the same condition will be computed. + +.. code:: python + + my_index_params = { + "index_name": "my_index", + "calc_operation": "nb_events", ### 'calc_operation': 'max_nb_consecutive_events' + "logical_operation": ["get", "gt"], + "thresh": [ + 10 + 273.15, + 25 + 273.15, + ], ### input data in Kelvin ==> threshold in Kelvin! + "link_logical_operations": "and", + } + + file_tasmin = "tasmin_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc" + file_tasmax = "tasmax_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc" + out_f = "my_index.nc" + + icclim.index( + user_index=my_index_params, + in_files=[file_tasmin, file_tasmax], + var_name=["tasmin", "tasmax"], + slice_mode="JJA", + out_unit="days", + out_file=out_f, + callback=callback.defaultCallback2, + ) + +************************************************************************************************** + Percentage of days when tasmin >= 10 degrees Celsius and tasmax > 90th pctl + date of the events +************************************************************************************************** - my_index_params = { - "index_name": "my_index", - "calc_operation": "run_mean", - "extreme_mode": "min", - "window_width": 7, - "date_event": True, - } - - file_tasmin = "tasmin_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc" - out_f = "my_index.nc" - - icclim.index( - user_index=my_index_params, - in_files=file_tasmin, - var_name="tasmin", - slice_mode=["season", ([11, 12], [1, 2])], - out_file=out_f, - callback=callback.defaultCallback2, - ) - -Anomaly of tasmax between 2 period of 30 years -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -.. note:: Result could be returned as percentage value relative to mean value of reference period, if ``out_unit='%'``. - -.. 
code-block:: python - - my_index_params = {"index_name": "my_index", "calc_operation": "anomaly"} - - file_tasmax = "tasmax_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc" - out_f = "my_index.nc" - # studied period: future period - tr = [datetime.datetime(1971, 1, 1), datetime.datetime(2000, 12, 31)] - # reference period: past period - tr_base = [datetime.datetime(1901, 1, 1), datetime.datetime(1930, 12, 31)] +.. note:: + + If 'calc_operation' is *'max_nb_consecutive_events'*, then max number + of consecutive days for the same condition will be computed. + +.. note:: + + It is possible to use numeric and percentile threshold at the time. + +.. code:: python + + my_index_params = { + "index_name": "my_index", + "calc_operation": "nb_events", ### 'calc_operation': 'max_nb_consecutive_events' + "logical_operation": ["get", "gt"], + "thresh": [ + 10 + 273.15, + "p90", + ], ### input data in Kelvin ==> threshold in Kelvin! + "var_type": "t", ### or ['-','t'] + "link_logical_operations": "and", + "date_event": True, + } + + file_tasmin = "tasmin_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc" + file_tasmax = "tasmax_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc" + out_f = "my_index.nc" + + bp = [datetime.datetime(1960, 1, 1), datetime.datetime(1969, 12, 31)] + icclim.index( + user_index=my_index_params, + in_files=[file_tasmin, file_tasmax], + var_name=["tasmin", "tasmax"], + slice_mode="JJA", + base_period_time_range=bp, + out_unit="%", + out_file=out_f, + callback=callback.defaultCallback2, + ) + +.. _examples_cd_cw_wd_ww_label: + +************************************************************* + Number of days when tas < 25th pctl and precip. > 75th pctl +************************************************************* + +.. note:: + + If 'calc_operation' is *'max_nb_consecutive_events'*, then max number + of consecutive days for the same condition will be computed. + +4 compound indices defined in +https://knmi-ecad-assets-prd.s3.amazonaws.com/documents/atbd.pdf (see +the section 5.3.3 "Compound indices") are based on daily precipitation +(RR) and mean temperature (TG) variables: + + - CD (cold/dry days): (TG < 25th pctl) and (RR < 25th pctl) + - CW (cold/wet days): (TG < 25th pctl) and (RR > 75th pctl) + - WD (warm/dry days): (TG > 75th pctl) and (RR < 25th pctl) + - WW (warm/wet days): (TG > 75th pctl) and (RR > 75th pctl) + +.. note:: + + RR is a daily precipitation on a *wet* day, and its percentile value + is computed from set of wet days also. + +.. note:: + + Percentiles thresholds computing uses differents methods as it was + described :ref:`here `. + +.. 
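note::
+
+   The example below implements the CW pair; as a sketch (assuming the
+   other pairs follow the same pattern), only ``logical_operation`` and
+   ``thresh`` change, e.g. for CD (cold/dry days):
+
+.. code:: python
+
+   # hypothetical CD variant: TG < 25th pctl AND RR < 25th pctl
+   cd_index_params = {
+       "index_name": "my_cd_index",
+       "calc_operation": "nb_events",
+       "logical_operation": ["lt", "lt"],
+       "thresh": ["p25", "p25"],
+       "var_type": ["t", "p"],
+       "link_logical_operations": "and",
+   }
+
+.. 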
code:: python + + my_index_params = { + "index_name": "my_index", + "calc_operation": "nb_events", ### 'calc_operation': 'max_nb_consecutive_events' + "logical_operation": ["lt", "gt"], + "thresh": ["p25", "p75"], + "var_type": ["t", "p"], + "link_logical_operations": "and", + } + + file_pr = "pr_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc" + file_tas = "tas_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc" + out_f = "my_index.nc" + + bp = [datetime.datetime(1960, 1, 1), datetime.datetime(1969, 12, 31)] + icclim.index( + user_index=my_index_params, + in_files=[file_tas, file_pr], + var_name=["tas", "pr"], + slice_mode="year", + out_unit="days", + base_period_time_range=bp, + out_file=out_f, + callback=callback.defaultCallback2, + ) + +*************************************************************************************** + Number of days when tasmax > 90th pctl and tasmin >= 10 and precipitation < 30th pctl +*************************************************************************************** + +.. note:: - icclim.index( - user_index=my_index_params, - in_files=file_tasmax, - var_name="tasmax", - time_range=tr, - base_period_time_range=tr_base, - out_file=out_f, - callback=callback.defaultCallback2, - ) - - -Number of days when tasmin >= 10 degrees Celsius and tasmax > 25 degrees Celsius -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -.. note:: If 'calc_operation' is *'max_nb_consecutive_events'*, then max number of consecutive days for the same condition will be computed. - -.. code-block:: python - - my_index_params = { - "index_name": "my_index", - "calc_operation": "nb_events", ### 'calc_operation': 'max_nb_consecutive_events' - "logical_operation": ["get", "gt"], - "thresh": [ - 10 + 273.15, - 25 + 273.15, - ], ### input data in Kelvin ==> threshold in Kelvin! - "link_logical_operations": "and", - } - - file_tasmin = "tasmin_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc" - file_tasmax = "tasmax_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc" - out_f = "my_index.nc" - - icclim.index( - user_index=my_index_params, - in_files=[file_tasmin, file_tasmax], - var_name=["tasmin", "tasmax"], - slice_mode="JJA", - out_unit="days", - out_file=out_f, - callback=callback.defaultCallback2, - ) - - -Percentage of days when tasmin >= 10 degrees Celsius and tasmax > 90th pctl + date of the events -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -.. note:: If 'calc_operation' is *'max_nb_consecutive_events'*, then max number of consecutive days for the same condition will be computed. - -.. note:: It is possible to use numeric and percentile threshold at the time. - -.. code-block:: python - - my_index_params = { - "index_name": "my_index", - "calc_operation": "nb_events", ### 'calc_operation': 'max_nb_consecutive_events' - "logical_operation": ["get", "gt"], - "thresh": [ - 10 + 273.15, - "p90", - ], ### input data in Kelvin ==> threshold in Kelvin! 
- "var_type": "t", ### or ['-','t'] - "link_logical_operations": "and", - "date_event": True, - } - - file_tasmin = "tasmin_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc" - file_tasmax = "tasmax_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc" - out_f = "my_index.nc" - - bp = [datetime.datetime(1960, 1, 1), datetime.datetime(1969, 12, 31)] - icclim.index( - user_index=my_index_params, - in_files=[file_tasmin, file_tasmax], - var_name=["tasmin", "tasmax"], - slice_mode="JJA", - base_period_time_range=bp, - out_unit="%", - out_file=out_f, - callback=callback.defaultCallback2, - ) - - -.. _examples_CD_CW_WD_WW_label: - -Number of days when tas < 25th pctl and precip. > 75th pctl -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -.. note:: If 'calc_operation' is *'max_nb_consecutive_events'*, then max number of consecutive days for the same condition will be computed. - -4 compound indices defined in https://knmi-ecad-assets-prd.s3.amazonaws.com/documents/atbd.pdf (see the section 5.3.3 "Compound indices") are -based on daily precipitation (RR) and mean temperature (TG) variables: - - - CD (cold/dry days): (TG < 25th pctl) and (RR < 25th pctl) - - CW (cold/wet days): (TG < 25th pctl) and (RR > 75th pctl) - - WD (warm/dry days): (TG > 75th pctl) and (RR < 25th pctl) - - WW (warm/wet days): (TG > 75th pctl) and (RR > 75th pctl) - -.. note:: RR is a daily precipitation on a *wet* day, and its percentile value is computed from set of wet days also. - -.. note:: Percentiles thresholds computing uses differents methods as it was described :ref:`here `. - - -.. code-block:: python - - my_index_params = { - "index_name": "my_index", - "calc_operation": "nb_events", ### 'calc_operation': 'max_nb_consecutive_events' - "logical_operation": ["lt", "gt"], - "thresh": ["p25", "p75"], - "var_type": ["t", "p"], - "link_logical_operations": "and", - } - - file_pr = "pr_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc" - file_tas = "tas_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc" - out_f = "my_index.nc" - - bp = [datetime.datetime(1960, 1, 1), datetime.datetime(1969, 12, 31)] - icclim.index( - user_index=my_index_params, - in_files=[file_tas, file_pr], - var_name=["tas", "pr"], - slice_mode="year", - out_unit="days", - base_period_time_range=bp, - out_file=out_f, - callback=callback.defaultCallback2, - ) - -Number of days when tasmax > 90th pctl and tasmin >= 10 and precipitation < 30th pctl -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -.. note:: If 'calc_operation' is *'max_nb_consecutive_events'*, then max number of consecutive days for the same condition will be computed. - -.. 
code-block:: python - - my_index_params = { - "index_name": "my_index", - "calc_operation": "nb_events", ### 'calc_operation': 'max_nb_consecutive_events' - "logical_operation": ["gt", "get", "lt"], - "thresh": ["p90", 10 + 273.15, "p30"], - "var_type": ["t", "-", "p"], - "link_logical_operations": "and", - } - file_pr = "pr_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc" - file_tasmax = "tasmax_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc" - file_tasmin = "tasmin_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc" - out_f = "my_index.nc" - - bp = [datetime.datetime(1960, 1, 1), datetime.datetime(1969, 12, 31)] - icclim.index( - user_index=my_index_params, - in_files=[file_tasmax, file_tasmin, file_pr], - var_name=["tasmax", "tasmin", "pr"], - slice_mode="SON", - out_unit="days", - base_period_time_range=bp, - out_file=out_f, - callback=callback.defaultCallback2, - ) + If 'calc_operation' is *'max_nb_consecutive_events'*, then max number + of consecutive days for the same condition will be computed. + +.. code:: python + + my_index_params = { + "index_name": "my_index", + "calc_operation": "nb_events", ### 'calc_operation': 'max_nb_consecutive_events' + "logical_operation": ["gt", "get", "lt"], + "thresh": ["p90", 10 + 273.15, "p30"], + "var_type": ["t", "-", "p"], + "link_logical_operations": "and", + } + file_pr = "pr_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc" + file_tasmax = "tasmax_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc" + file_tasmin = "tasmin_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc" + out_f = "my_index.nc" + + bp = [datetime.datetime(1960, 1, 1), datetime.datetime(1969, 12, 31)] + icclim.index( + user_index=my_index_params, + in_files=[file_tasmax, file_tasmin, file_pr], + var_name=["tasmax", "tasmin", "pr"], + slice_mode="SON", + out_unit="days", + base_period_time_range=bp, + out_file=out_f, + callback=callback.defaultCallback2, + ) diff --git a/doc/source/how_to/recipes_ecad.rst b/doc/source/how_to/recipes_ecad.rst index adde9cf0..9516373a 100644 --- a/doc/source/how_to/recipes_ecad.rst +++ b/doc/source/how_to/recipes_ecad.rst @@ -1,243 +1,248 @@ -Examples ---------- - -.. code-block:: python - - import icclim - import datetime - -Example 1: index SU -~~~~~~~~~~~~~~~~~~~ - -.. code-block:: python - - files = [ - "tasmax_day_CNRM-CM5_historical_r1i1p1_19950101-19991231.nc", - "tasmax_day_CNRM-CM5_historical_r1i1p1_20000101-20041231.nc", - "tasmax_day_CNRM-CM5_historical_r1i1p1_20050101-20051231.nc", - ] - - dt1 = datetime.datetime(1998, 1, 1) - dt2 = datetime.datetime(2005, 12, 31) - - out_f = "SU_JJA_CNRM-CM5_historical_r1i1p1_1998-2005.nc" # OUTPUT FILE: summer season values of SU - - icclim.index( - index_name="SU", - in_files=files, - var_name="tasmax", - time_range=[dt1, dt2], - slice_mode="JJA", - out_file=out_f, - ) - - dt1 = datetime.datetime(1998, 1, 1) - dt2 = datetime.datetime(2005, 12, 31) - - out_f = "SU_JJA_CNRM-CM5_historical_r1i1p1_1998-2005.nc" # OUTPUT FILE: summer season values of SU - - icclim.index( - index_name="SU", - in_files=files, - var_name="tasmax", - time_range=[dt1, dt2], - slice_mode="JJA", - out_file=out_f, - ) - - -Example 2: index ETR -~~~~~~~~~~~~~~~~~~~~ - -.. 
code-block:: python - - files_tasmax = [ - "tasmax_day_CNRM-CM5_historical_r1i1p1_19300101-19341231.nc", - "tasmax_day_CNRM-CM5_historical_r1i1p1_19350101-19391231.nc", - ] - files_tasmin = [ - "tasmin_day_CNRM-CM5_historical_r1i1p1_19300101-19341231.nc", - "tasmin_day_CNRM-CM5_historical_r1i1p1_19350101-19391231.nc", - ] - - out_f = "ETR_year_CNRM-CM5_historical_r1i1p1_1930-1939.nc" # OUTPUT FILE: annual values of ETR - - icclim.index( - index_name="ETR", - in_files=[files_tasmax, files_tasmin], - var_name=["tasmax", "tasmin"], - slice_mode="year", - out_file=out_f, - ) - files_tasmin = [ - "tasmin_day_CNRM-CM5_historical_r1i1p1_19300101-19341231.nc", - "tasmin_day_CNRM-CM5_historical_r1i1p1_19350101-19391231.nc", - ] - - out_f = "ETR_year_CNRM-CM5_historical_r1i1p1_1930-1939.nc" # OUTPUT FILE: annual values of ETR - - icclim.index( - index_name="ETR", - in_files=[files_tasmax, files_tasmin], - var_name=["tasmax", "tasmin"], - slice_mode="year", - out_file=out_f, - ) - -.. warning:: The order of `var_name` must be ['tasmax', 'tasmin'] and NOT ['tasmin', 'tasmax']. The same for `in_files`. - - -Example 3: index TG90p with callback -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -.. code-block:: python - - f = "tas_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc" - - # base period - base_dt1 = datetime.datetime(1961, 1, 1) - base_dt2 = datetime.datetime(1970, 12, 31) - - # studied period - dt1 = datetime.datetime(1980, 1, 1) - dt2 = datetime.datetime(2000, 12, 31) - - out_f = "TG90p_AMJJAS_CNRM-CM5_historical_r1i1p1_1980-2000.nc" # OUTPUT FILE: summer half-year values of TG90p - - icclim.index( - index_name="TG90p", - in_files=f, - var_name="tas", - slice_mode="AMJJAS", - time_range=[dt1, dt2], - base_period_time_range=[base_dt1, base_dt2], - out_file=out_f, - out_unit="%", - ) - - f = "tas_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc" - - # base period - base_dt1 = datetime.datetime(1961, 1, 1) - base_dt2 = datetime.datetime(1970, 12, 31) - - # studied period - dt1 = datetime.datetime(1980, 1, 1) - dt2 = datetime.datetime(2000, 12, 31) - - out_f = "TG90p_AMJJAS_CNRM-CM5_historical_r1i1p1_1980-2000.nc" # OUTPUT FILE: summer half-year values of TG90p - - icclim.index( - index_name="TG90p", - in_files=f, - var_name="tas", - slice_mode="AMJJAS", - time_range=[dt1, dt2], - base_period_time_range=[base_dt1, base_dt2], - out_file=out_f, - out_unit="%", - callback=callback.defaultCallback2, - ) - - - -Example 4: multivariable indices CD, CW, WD, WW -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -.. 
code-block:: python - - f = "tas_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc" - - # base period - base_dt1 = datetime.datetime(1961, 1, 1) - base_dt2 = datetime.datetime(1970, 12, 31) - - # studied period - dt1 = datetime.datetime(1980, 1, 1) - dt2 = datetime.datetime(2000, 12, 31) - - out_f = "TG90p_AMJJAS_CNRM-CM5_historical_r1i1p1_1980-2000.nc" # OUTPUT FILE: summer half-year values of TG90p - - icclim.index( - index_name="TG90p", - in_files=f, - var_name="tas", - slice_mode="AMJJAS", - time_range=[dt1, dt2], - base_period_time_range=[base_dt1, base_dt2], - out_file=out_f, - out_unit="%", - ) - - f = "tas_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc" - - # base period - base_dt1 = datetime.datetime(1961, 1, 1) - base_dt2 = datetime.datetime(1970, 12, 31) - - # studied period - dt1 = datetime.datetime(1980, 1, 1) - dt2 = datetime.datetime(2000, 12, 31) - - out_f = "TG90p_AMJJAS_CNRM-CM5_historical_r1i1p1_1980-2000.nc" # OUTPUT FILE: summer half-year values of TG90p - - icclim.index( - index_name="TG90p", - in_files=f, - var_name="tas", - slice_mode="AMJJAS", - time_range=[dt1, dt2], - base_period_time_range=[base_dt1, base_dt2], - out_file=out_f, - out_unit="%", - callback=callback.defaultCallback2, - ) - - -Multi index computation -~~~~~~~~~~~~~~~~~~~~~~~~ +########## + Examples +########## + +.. code:: python + + import icclim + import datetime + +********************* + Example 1: index SU +********************* + +.. code:: python + + files = [ + "tasmax_day_CNRM-CM5_historical_r1i1p1_19950101-19991231.nc", + "tasmax_day_CNRM-CM5_historical_r1i1p1_20000101-20041231.nc", + "tasmax_day_CNRM-CM5_historical_r1i1p1_20050101-20051231.nc", + ] + + dt1 = datetime.datetime(1998, 1, 1) + dt2 = datetime.datetime(2005, 12, 31) + + out_f = "SU_JJA_CNRM-CM5_historical_r1i1p1_1998-2005.nc" # OUTPUT FILE: summer season values of SU + + icclim.index( + index_name="SU", + in_files=files, + var_name="tasmax", + time_range=[dt1, dt2], + slice_mode="JJA", + out_file=out_f, + ) + + dt1 = datetime.datetime(1998, 1, 1) + dt2 = datetime.datetime(2005, 12, 31) + + out_f = "SU_JJA_CNRM-CM5_historical_r1i1p1_1998-2005.nc" # OUTPUT FILE: summer season values of SU + + icclim.index( + index_name="SU", + in_files=files, + var_name="tasmax", + time_range=[dt1, dt2], + slice_mode="JJA", + out_file=out_f, + ) + +********************** + Example 2: index ETR +********************** + +.. code:: python + + files_tasmax = [ + "tasmax_day_CNRM-CM5_historical_r1i1p1_19300101-19341231.nc", + "tasmax_day_CNRM-CM5_historical_r1i1p1_19350101-19391231.nc", + ] + files_tasmin = [ + "tasmin_day_CNRM-CM5_historical_r1i1p1_19300101-19341231.nc", + "tasmin_day_CNRM-CM5_historical_r1i1p1_19350101-19391231.nc", + ] + + out_f = "ETR_year_CNRM-CM5_historical_r1i1p1_1930-1939.nc" # OUTPUT FILE: annual values of ETR + + icclim.index( + index_name="ETR", + in_files=[files_tasmax, files_tasmin], + var_name=["tasmax", "tasmin"], + slice_mode="year", + out_file=out_f, + ) + files_tasmin = [ + "tasmin_day_CNRM-CM5_historical_r1i1p1_19300101-19341231.nc", + "tasmin_day_CNRM-CM5_historical_r1i1p1_19350101-19391231.nc", + ] + + out_f = "ETR_year_CNRM-CM5_historical_r1i1p1_1930-1939.nc" # OUTPUT FILE: annual values of ETR + + icclim.index( + index_name="ETR", + in_files=[files_tasmax, files_tasmin], + var_name=["tasmax", "tasmin"], + slice_mode="year", + out_file=out_f, + ) + +.. warning:: + + The order of `var_name` must be ['tasmax', 'tasmin'] and NOT + ['tasmin', 'tasmax']. The same for `in_files`. 
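
+
+If the positional pairing feels error prone, a dictionary can make the
+mapping explicit. Below is a minimal sketch, assuming the dictionary
+form of ``in_files`` shown in the generic recipes also applies here:
+
+.. code:: python
+
+   # hypothetical: map each variable name to its files explicitly
+   icclim.index(
+       index_name="ETR",
+       in_files={"tasmax": files_tasmax, "tasmin": files_tasmin},
+       slice_mode="year",
+       out_file=out_f,
+   )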
+ +************************************** + Example 3: index TG90p with callback +************************************** + +.. code:: python + + f = "tas_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc" + + # base period + base_dt1 = datetime.datetime(1961, 1, 1) + base_dt2 = datetime.datetime(1970, 12, 31) + + # studied period + dt1 = datetime.datetime(1980, 1, 1) + dt2 = datetime.datetime(2000, 12, 31) + + out_f = "TG90p_AMJJAS_CNRM-CM5_historical_r1i1p1_1980-2000.nc" # OUTPUT FILE: summer half-year values of TG90p + + icclim.index( + index_name="TG90p", + in_files=f, + var_name="tas", + slice_mode="AMJJAS", + time_range=[dt1, dt2], + base_period_time_range=[base_dt1, base_dt2], + out_file=out_f, + out_unit="%", + ) + + f = "tas_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc" + + # base period + base_dt1 = datetime.datetime(1961, 1, 1) + base_dt2 = datetime.datetime(1970, 12, 31) + + # studied period + dt1 = datetime.datetime(1980, 1, 1) + dt2 = datetime.datetime(2000, 12, 31) + + out_f = "TG90p_AMJJAS_CNRM-CM5_historical_r1i1p1_1980-2000.nc" # OUTPUT FILE: summer half-year values of TG90p + + icclim.index( + index_name="TG90p", + in_files=f, + var_name="tas", + slice_mode="AMJJAS", + time_range=[dt1, dt2], + base_period_time_range=[base_dt1, base_dt2], + out_file=out_f, + out_unit="%", + callback=callback.defaultCallback2, + ) + +************************************************* + Example 4: multivariable indices CD, CW, WD, WW +************************************************* + +.. code:: python + + f = "tas_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc" + + # base period + base_dt1 = datetime.datetime(1961, 1, 1) + base_dt2 = datetime.datetime(1970, 12, 31) + + # studied period + dt1 = datetime.datetime(1980, 1, 1) + dt2 = datetime.datetime(2000, 12, 31) + + out_f = "TG90p_AMJJAS_CNRM-CM5_historical_r1i1p1_1980-2000.nc" # OUTPUT FILE: summer half-year values of TG90p + + icclim.index( + index_name="TG90p", + in_files=f, + var_name="tas", + slice_mode="AMJJAS", + time_range=[dt1, dt2], + base_period_time_range=[base_dt1, base_dt2], + out_file=out_f, + out_unit="%", + ) + + f = "tas_day_CNRM-CM5_historical_r1i1p1_19010101-20001231.nc" + + # base period + base_dt1 = datetime.datetime(1961, 1, 1) + base_dt2 = datetime.datetime(1970, 12, 31) + + # studied period + dt1 = datetime.datetime(1980, 1, 1) + dt2 = datetime.datetime(2000, 12, 31) + + out_f = "TG90p_AMJJAS_CNRM-CM5_historical_r1i1p1_1980-2000.nc" # OUTPUT FILE: summer half-year values of TG90p + + icclim.index( + index_name="TG90p", + in_files=f, + var_name="tas", + slice_mode="AMJJAS", + time_range=[dt1, dt2], + base_period_time_range=[base_dt1, base_dt2], + out_file=out_f, + out_unit="%", + callback=callback.defaultCallback2, + ) + +************************* + Multi index computation +************************* *New in 5.1.0.* This feature allows you to compute multiple indices at the same time. -This is just a shorthand to avoid writing your own for loop, there is no specific optimization done to group together -similar operation. +This is just a shorthand to avoid writing your own for loop, there is no +specific optimization done to group together similar operation. .. note:: - The input ``in_files`` must include all the necessary variables to compute the indices. - You can bypass this by setting `ignore_error=True`. - In that case when icclim fails to compute an index it will simply be omitted in the result. 
- -Compute every HEAT indices [SU, TR, WSDI, TG90p, TN90p, TX90p, TXx, TNx, CSU] -_____________________________________________________________________________ + The input ``in_files`` must include all the necessary variables to + compute the indices. You can bypass this by setting + `ignore_error=True`. In that case when icclim fails to compute an + index it will simply be omitted in the result. -.. code-block:: python - - bp = [datetime.datetime(1991, 1, 1), datetime.datetime(1999, 12, 31)] - tr = [datetime.datetime(1991, 1, 1), datetime.datetime(2010, 12, 31)] - # The file must include all necessary variable for HEAT indices i - file = "./netcdf_files/sampledata.1991-2010.nc" - res = icclim.indices( - index_group=IndexGroup.HEAT, - in_files=file, - base_period_time_range=bp, - time_range=tr, - out_file="heat_indices.nc", - ) +Compute every HEAT indices [SU, TR, WSDI, TG90p, TN90p, TX90p, TXx, TNx, CSU] +============================================================================= + +.. code:: python + + bp = [datetime.datetime(1991, 1, 1), datetime.datetime(1999, 12, 31)] + tr = [datetime.datetime(1991, 1, 1), datetime.datetime(2010, 12, 31)] + # The file must include all necessary variable for HEAT indices i + file = "./netcdf_files/sampledata.1991-2010.nc" + res = icclim.indices( + index_group=IndexGroup.HEAT, + in_files=file, + base_period_time_range=bp, + time_range=tr, + out_file="heat_indices.nc", + ) Compute every indices -_____________________ - -.. code-block:: python - - bp = [datetime.datetime(1991, 1, 1), datetime.datetime(1999, 12, 31)] - tr = [datetime.datetime(1991, 1, 1), datetime.datetime(2010, 12, 31)] - file = "./netcdf_files/sampledata.1991-2010.nc" - res = icclim.indices( - index_group="all", - in_files=file, - base_period_time_range=bp, - time_range=tr, - out_file="heat_indices.nc", - ) +===================== + +.. code:: python + + bp = [datetime.datetime(1991, 1, 1), datetime.datetime(1999, 12, 31)] + tr = [datetime.datetime(1991, 1, 1), datetime.datetime(2010, 12, 31)] + file = "./netcdf_files/sampledata.1991-2010.nc" + res = icclim.indices( + index_group="all", + in_files=file, + base_period_time_range=bp, + time_range=tr, + out_file="heat_indices.nc", + ) diff --git a/doc/source/how_to/recipes_generic.rst b/doc/source/how_to/recipes_generic.rst index 0cd3dfe8..e24d39c1 100644 --- a/doc/source/how_to/recipes_generic.rst +++ b/doc/source/how_to/recipes_generic.rst @@ -1,363 +1,376 @@ -.. _`generic_indices_recipes`: +.. _generic_indices_recipes: -Generic indices recipes ------------------------ +######################### + Generic indices recipes +######################### -You will find below a few example of icclim v6 :ref:`generic_functions_api`. +You will find below a few example of icclim v6 +:ref:`generic_functions_api`. -.. code-block:: python +.. code:: python - import icclim - from icclim import build_threshold + import icclim + from icclim import build_threshold - # Change `data` to your own netcdf file path. - data = "netcdf_files/gridded.1991-2010.nc" + # Change `data` to your own netcdf file path. + data = "netcdf_files/gridded.1991-2010.nc" +******************* + Count occurrences +******************* -Count occurrences -+++++++++++++++++ +Occurrences of events where tas is between 20 and 30 degree Celsius and +precipitation are above 3 mm/day. -Occurrences of events where tas is between 20 and 30 degree Celsius and precipitation are above 3 mm/day. +.. code:: python -.. 
code-block:: python

-    # Equivalent to using `reasonable_temp = "<= 30 deg_C AND >= 20 deg_C"`
-    reasonable_temp = build_threshold("<= 30 deg_C") & build_threshold(">= 20 deg_C")
-    some_rain = icclim.build_threshold("> 3 mm/day")

-    dataset = icclim.count_occurrences(
-        in_files={
-            "tmax": {"study": data, "thresholds": reasonable_temp},
-            "precip": {"study": data, "thresholds": some_rain},
-        }
-    )
-    # .compute must be called to ask dask to run the actual computation
-    conputed_data = dataset.count_occurrences.compute()

+.. code:: python
+
+   # Equivalent to using `reasonable_temp = "<= 30 deg_C AND >= 20 deg_C"`
+   reasonable_temp = build_threshold("<= 30 deg_C") & build_threshold(">= 20 deg_C")
+   some_rain = icclim.build_threshold("> 3 mm/day")
+
+   dataset = icclim.count_occurrences(
+       in_files={
+           "tmax": {"study": data, "thresholds": reasonable_temp},
+           "precip": {"study": data, "thresholds": some_rain},
+       }
+   )
+   # .compute must be called to ask dask to run the actual computation
+   computed_data = dataset.count_occurrences.compute()

-.. code-block:: python

-    tx99p_dataset = icclim.count_occurrences(
-        in_files=data, var_name="tasmax", threshold=">= 99 doy_per"
-    )
-    # .compute must be called to ask dask to run the actual computation
-    tx99p = dataset.count_occurrences.compute()

+.. code:: python
+
+   tx99p_dataset = icclim.count_occurrences(
+       in_files=data, var_name="tasmax", threshold=">= 99 doy_per"
+   )
+   # .compute must be called to ask dask to run the actual computation
+   tx99p = tx99p_dataset.count_occurrences.compute()

-Sum
-+++

+*****
+ Sum
+*****

-Sum of precipitation that are above 4 mm/day.
+Sum of precipitation values that are above 4 mm/day.

-.. code-block:: python

-    rain_sum_above_4mm = icclim.sum(
-        in_files=data, var_name="precip", threshold="> 4 mm/day"
-    ).sum.compute()

+.. code:: python
+
+   rain_sum_above_4mm = icclim.sum(
+       in_files=data, var_name="precip", threshold="> 4 mm/day"
+   ).sum.compute()

-Standard Deviation
-++++++++++++++++++

+********************
+ Standard Deviation
+********************

-Standard deviation of ``tas`` variable.
+Standard deviation of the ``tas`` variable.

-.. code-block:: python

-    tas_std = icclim.std(in_files=data, var_name="tas").std.compute()

+.. code:: python
+
+   tas_std = icclim.std(in_files=data, var_name="tas").std.compute()

-Average
-+++++++

+*********
+ Average
+*********

Average of the ``tas`` variable, per year by default.

-.. code-block:: python

-    tas_average = icclim.average(in_files=data, var_name="tas").average.compute()

+.. code:: python
+
+   tas_average = icclim.average(in_files=data, var_name="tas").average.compute()

-Average of the ``tas`` values that are above the 87th period percentile (computed on the whole period here),
-per year by default.
+Average of the ``tas`` values that are above the 87th period percentile
+(computed on the whole period here), per year by default.

-.. code-block:: python

-    tas_average_above_percentile_of_period = icclim.average(
-        in_files=data, var_name="tas", threshold="> 87 period_per"
-    ).average.compute()

+.. code:: python
+
+   tas_average_above_percentile_of_period = icclim.average(
+       in_files=data, var_name="tas", threshold="> 87 period_per"
+   ).average.compute()

-Maximum Consecutive Occurrences
-+++++++++++++++++++++++++++++++

+*********************************
+ Maximum Consecutive Occurrences
+*********************************

-Almost equivalent to ECAD's index CDD (Consecutive Dry Days, days when pr is below 1 mm/day).
+Almost equivalent to ECAD's index CDD (Consecutive Dry Days, days when
+pr is below 1 mm/day).

-.. code-block:: python

-    CDD = icclim.max_consecutive_occurrence(
-        in_files=data, var_name="precip", threshold="< 1.3 mm/day"
-    ).max_consecutive_occurrence.compute()

+.. 
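note::
+
+   The example below uses ``< 1.3 mm/day``, hence only *almost* CDD; a
+   sketch of the strict ECAD variant (days when pr is below 1 mm/day)
+   would simply change the threshold:
+
+.. code:: python
+
+   # hypothetical strict variant of the example below
+   strict_cdd = icclim.max_consecutive_occurrence(
+       in_files=data, var_name="precip", threshold="< 1 mm/day"
+   ).max_consecutive_occurrence.compute()
+
+.. 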
code:: python - CDD = icclim.max_consecutive_occurrence( - in_files=data, var_name="precip", threshold="< 1.3 mm/day" - ).max_consecutive_occurrence.compute() + CDD = icclim.max_consecutive_occurrence( + in_files=data, var_name="precip", threshold="< 1.3 mm/day" + ).max_consecutive_occurrence.compute() - -Sum of Spell Lengths -++++++++++++++++++++ +********************** + Sum of Spell Lengths +********************** Almost equivalent to ECAD's index WSDI (Warm Spell Duration Index, maximum consecutive occurrence of tasmax > 90th doy percentile) -.. code-block:: python - - custom_wsdi = icclim.sum_of_spell_lengths( - in_files=data, var_name="precip", threshold="> 90 doy_per AND > 28 degC" - ).sum_of_spell_lengths.compute() +.. code:: python -Excess -++++++ + custom_wsdi = icclim.sum_of_spell_lengths( + in_files=data, var_name="precip", threshold="> 90 doy_per AND > 28 degC" + ).sum_of_spell_lengths.compute() -Excess of minimal daily temperature above the 22 daily percentile threshold computed overs the 1991-1995 reference -period, with a focus on the June to August periods. +******** + Excess +******** -.. code-block:: python +Excess of minimal daily temperature above the 22 daily percentile +threshold computed overs the 1991-1995 reference period, with a focus on +the June to August periods. - jja_tmin_excess = ( - icclim.excess( - climp_file, - var_name=["tmin"], - threshold=icclim.build_threshold( - "22 doy_per", base_period_time_range=["1991-01-01", "1995-12-31"] - ), - slice_mode="jja", - ) - .compute() - .excess - ) +.. code:: python + jja_tmin_excess = ( + icclim.excess( + climp_file, + var_name=["tmin"], + threshold=icclim.build_threshold( + "22 doy_per", base_period_time_range=["1991-01-01", "1995-12-31"] + ), + slice_mode="jja", + ) + .compute() + .excess + ) -Deficit -+++++++ +********* + Deficit +********* Deficit of minimal daily temperature below 17 degree Celsius. -.. code-block:: python +.. code:: python - result13 = icclim.index( - climp_file, - var_name=["tmin"], - index_name="deficit", - threshold=build_threshold("17 degC"), - ).compute() + result13 = icclim.index( + climp_file, + var_name=["tmin"], + index_name="deficit", + threshold=build_threshold("17 degC"), + ).compute() -Fraction of Total -+++++++++++++++++ +******************* + Fraction of Total +******************* -Fraction of precipitations above the 75th period percentile, where percentiles are computed only on values above 1 mm/day -This is equivalent to the ECAD's index R75pTOT. +Fraction of precipitations above the 75th period percentile, where +percentiles are computed only on values above 1 mm/day This is +equivalent to the ECAD's index R75pTOT. -.. code-block:: python +.. code:: python - result14 = ( - icclim.fraction_of_total( - climp_file, - var_name=["precip"], - threshold=build_threshold( - "> 75 period_per", threshold_min_value="1 mm/day" - ), - ) - .compute() - .fraction_of_total - ) + result14 = ( + icclim.fraction_of_total( + climp_file, + var_name=["precip"], + threshold=build_threshold("> 75 period_per", threshold_min_value="1 mm/day"), + ) + .compute() + .fraction_of_total + ) -Maximum -+++++++ +********* + Maximum +********* Maximum of tas temperature per month. -.. code-block:: python +.. 
code:: python - max_of_tas = ( - icclim.maximum( - climp_file, - var_name=["tas"], - slice_mode="month", - ) - .compute() - .maximum - ) + max_of_tas = ( + icclim.maximum( + climp_file, + var_name=["tas"], + slice_mode="month", + ) + .compute() + .maximum + ) -Minimum -+++++++ +********* + Minimum +********* Minimum of tas temperature per month. -.. code-block:: python - - min_of_tas = ( - icclim.minimum( - climp_file, - var_name=["tas"], - slice_mode="month", - ) - .compute() - .minimum - ) - - -Max of Rolling Sum -++++++++++++++++++ - -Maximum of rolling sum of precipitation that are above the period median, where the median is computed for the whole -period (default behavior when there is no `base_period_time_range`) only on values above 1mm/day. - -.. code-block:: python - - max_of_rolling_sum = ( - icclim.index( - climp_file, - index_name="max_of_rolling_sum", - var_name=["precip"], - threshold=build_threshold( - ">= 50 period_per", threshold_min_value="1 mmday" - ), - ) - .compute() - .max_of_rolling_sum - ) - -Min of Rolling Sum -++++++++++++++++++ - -Minimum of rolling sum of precipitation that are above the period median, where the median is computed for the whole -period (default behavior when there is no `base_period_time_range`) only on values above 1mm/day. - -.. code-block:: python - - min_of_rolling_sum = ( - icclim.min_of_rolling_sum( - climp_file, - var_name=["precip"], - threshold=build_threshold( - ">= 50 period_per", threshold_min_value="1 mmday" - ), - ) - .compute() - .min_of_rolling_sum - ) - -Max of Rolling Average -++++++++++++++++++++++ - -Maximum of rolling average of precipitation that are above the period median, where the median is computed for the whole -period (default behavior when there is no `base_period_time_range`) only on values above 1mm/day. - -.. code-block:: python - - max_of_rolling_average = ( - icclim.index( - climp_file, - index_name="max_of_rolling_average", - var_name=["precip"], - threshold=build_threshold( - ">= 50 period_per", threshold_min_value="1 mmday" - ), - ) - .compute() - .max_of_rolling_average - ) - -Min of Rolling Average -++++++++++++++++++++++ - -Minimum of rolling average of precipitation that are above the period median, where the median is computed for the whole -period (default behavior when there is no `base_period_time_range`) only on values above 1mm/day. - -.. code-block:: python - - min_of_rolling_average = ( - icclim.min_of_rolling_average( - climp_file, - var_name=["precip"], - threshold=build_threshold( - ">= 50 period_per", threshold_min_value="1 mmday" - ), - ) - .compute() - .min_of_rolling_average - ) - -Mean of difference -++++++++++++++++++ - -Mean of the difference between tasmax in tasmin. -It's a generification of ECAD's index DTR. - -.. code-block:: python - - dtr = ( - icclim.index( - climp_file, - index_name="mean_of_difference", - var_name=["tmax", "tmin"], - ) - .compute() - .mean_of_difference - ) - -Difference of extremes -++++++++++++++++++++++ - -Difference of the maximum of tasmax and the minimum of tasmin. -It's a generification of ECAD's index ETR. - -.. code-block:: python - - dtr = ( - icclim.index( - climp_file, - index_name="difference_of_extremes", - var_name=["tmax", "tmin"], - ) - .compute() - .difference_of_extremes - ) - -Difference of means -+++++++++++++++++++ - -Difference between averaged tas and the averaged tas values of the reference period. -Also known as the ``anomaly``. - - -.. 
code-block:: python - - anomaly = ( - icclim.difference_of_means( - climp_file, - var_name=["tas"], - base_period_time_range=["1991-01-01", "1995-12-31"], - ) - .compute() - .difference_of_means - ) - - -Mean Of Absolute One Time Step Difference -+++++++++++++++++++++++++++++++++++++++++ - -Mean of absolute difference between tasmax and tasmin with a one time step lag (usually 1 day). -This is equivalent to the pseudo-code: - -.. code-block:: python - - a = tasmax[T + 1] - tasmin[T + 1] - b = tasmax[T] - tasmin[T] - average(a - b) +.. code:: python + + min_of_tas = ( + icclim.minimum( + climp_file, + var_name=["tas"], + slice_mode="month", + ) + .compute() + .minimum + ) + +******************** + Max of Rolling Sum +******************** + +Maximum of rolling sum of precipitation that are above the period +median, where the median is computed for the whole period (default +behavior when there is no `base_period_time_range`) only on values above +1mm/day. + +.. code:: python + + max_of_rolling_sum = ( + icclim.index( + climp_file, + index_name="max_of_rolling_sum", + var_name=["precip"], + threshold=build_threshold(">= 50 period_per", threshold_min_value="1 mmday"), + ) + .compute() + .max_of_rolling_sum + ) + +******************** + Min of Rolling Sum +******************** + +Minimum of rolling sum of precipitation that are above the period +median, where the median is computed for the whole period (default +behavior when there is no `base_period_time_range`) only on values above +1mm/day. + +.. code:: python + + min_of_rolling_sum = ( + icclim.min_of_rolling_sum( + climp_file, + var_name=["precip"], + threshold=build_threshold(">= 50 period_per", threshold_min_value="1 mmday"), + ) + .compute() + .min_of_rolling_sum + ) + +************************ + Max of Rolling Average +************************ + +Maximum of rolling average of precipitation that are above the period +median, where the median is computed for the whole period (default +behavior when there is no `base_period_time_range`) only on values above +1mm/day. + +.. code:: python + + max_of_rolling_average = ( + icclim.index( + climp_file, + index_name="max_of_rolling_average", + var_name=["precip"], + threshold=build_threshold(">= 50 period_per", threshold_min_value="1 mmday"), + ) + .compute() + .max_of_rolling_average + ) + +************************ + Min of Rolling Average +************************ + +Minimum of rolling average of precipitation that are above the period +median, where the median is computed for the whole period (default +behavior when there is no `base_period_time_range`) only on values above +1mm/day. + +.. code:: python + + min_of_rolling_average = ( + icclim.min_of_rolling_average( + climp_file, + var_name=["precip"], + threshold=build_threshold(">= 50 period_per", threshold_min_value="1 mmday"), + ) + .compute() + .min_of_rolling_average + ) + +******************** + Mean of difference +******************** + +Mean of the difference between tasmax in tasmin. It's a generification +of ECAD's index DTR. + +.. code:: python + + dtr = ( + icclim.index( + climp_file, + index_name="mean_of_difference", + var_name=["tmax", "tmin"], + ) + .compute() + .mean_of_difference + ) + +************************ + Difference of extremes +************************ + +Difference of the maximum of tasmax and the minimum of tasmin. It's a +generification of ECAD's index ETR. + +.. 
code:: python + + dtr = ( + icclim.index( + climp_file, + index_name="difference_of_extremes", + var_name=["tmax", "tmin"], + ) + .compute() + .difference_of_extremes + ) + +********************* + Difference of means +********************* + +Difference between averaged tas and the averaged tas values of the +reference period. Also known as the ``anomaly``. + +.. code:: python + + anomaly = ( + icclim.difference_of_means( + climp_file, + var_name=["tas"], + base_period_time_range=["1991-01-01", "1995-12-31"], + ) + .compute() + .difference_of_means + ) + +******************************************* + Mean Of Absolute One Time Step Difference +******************************************* + +Mean of absolute difference between tasmax and tasmin with a one time +step lag (usually 1 day). This is equivalent to the pseudo-code: + +.. code:: python + + a = tasmax[T + 1] - tasmin[T + 1] + b = tasmax[T] - tasmin[T] + average(a - b) It's a generification of ECAD's index vDTR. -.. code-block:: python +.. code:: python - result = ( - icclim.mean_of_absolute_one_time_step_difference( - climp_file, - var_name=["tmax", "tmin"], - ) - .compute() - .mean_of_absolute_one_time_step_difference - ) + result = ( + icclim.mean_of_absolute_one_time_step_difference( + climp_file, + var_name=["tmax", "tmin"], + ) + .compute() + .mean_of_absolute_one_time_step_difference + ) diff --git a/doc/source/index.rst b/doc/source/index.rst index 13f29d97..b206f772 100644 --- a/doc/source/index.rst +++ b/doc/source/index.rst @@ -1,24 +1,32 @@ -.. icclim documentation master file, created by +.. + icclim documentation master file, created by sphinx-quickstart on Tue Dec 14 14:42:00 2021. +.. _dask: https://dask.org/ + .. _diataxis: https://diataxis.fr/ -.. _xclim: https://xclim.readthedocs.io/en/stable/ + +.. _netcdf: http://www.unidata.ucar.edu/software/netcdf + +.. _numpy: http://www.numpy.org + .. _xarray: https://xarray.pydata.org/en/stable/ -.. _dask: https://dask.org/ -.. _NumPy: http://www.numpy.org -.. _netCDF: http://www.unidata.ucar.edu/software/netcdf -icclim documentation -==================== +.. _xclim: https://xclim.readthedocs.io/en/stable/ + +###################### + icclim documentation +###################### -Welcome to icclim documentation. -icclim (Index Calculation for CLIMate) is a Python library to compute climate indices. -It is built on a open source stack made of `xclim`_, `xarray`_, `dask`_ and of course `NumPy`_. +Welcome to icclim documentation. icclim (Index Calculation for CLIMate) +is a Python library to compute climate indices. It is built on a open +source stack made of xclim_, xarray_, dask_ and of course NumPy_. .. note:: - icclim documentation is currently under construction. We try to follow the `diataxis`_ principles to build a - comprehensive user focused documentation. + icclim documentation is currently under construction. We try to + follow the diataxis_ principles to build a comprehensive user focused + documentation. .. toctree:: :maxdepth: 2 @@ -30,30 +38,46 @@ It is built on a open source stack made of `xclim`_, `xarray`_, `dask`_ and of c explanation/index dev/index +************************** + A few notes about icclim +************************** -A few notes about icclim ------------------------- -1. Input datasets must be compliant to the `CF convention `_. -2. Currently, *icclim* doesn't support spatial subsetting, i.e. it processes whole spatial area. -3. *icclim* works with unsecured OPeNDAP datasets as well. -4. icclim developer repository can be found here: ``_ +#. 
Input datasets must be compliant to the `CF convention + `_. +#. Currently, *icclim* doesn't support spatial subsetting, i.e. it + processes whole spatial area. +#. *icclim* works with unsecured OPeNDAP datasets as well. +#. icclim developer repository can be found here: + https://github.com/cerfacs-globc/icclim -Contacts --------- -Add If you encounter a bug or an issue while using icclim, don't hesitate to open a ticket on our `github `_. +********** + Contacts +********** + +Add If you encounter a bug or an issue while using icclim, don't +hesitate to open a ticket on our `github +`_. Maintainers -~~~~~~~~~~~ -- Christian Page, `@pagecp `_ -- Abel Aoun, `@bzah `_ +=========== + +- Christian Page, `@pagecp `_ + +- Abel Aoun, `@bzah `_ + + +******** + Grants +******** -Grants ------- -This open-source project has been possible thanks to funding by the European Commission projects: +This open-source project has been possible thanks to funding by the +European Commission projects: -* FP7-CLIPC (2013-2016) -* FP7-IS-ENES2 (2013-2017) -* EUDAT2020 (2015-2018) -* H2020-IS-ENES3 (2019-2023) +- FP7-CLIPC (2013-2016) +- FP7-IS-ENES2 (2013-2017) +- EUDAT2020 (2015-2018) +- H2020-IS-ENES3 (2019-2023) -The beautiful icclim logo is a creation of `Carole Petetin `_ and has been funded by the H2020 `IS-ENES3 `_ project grant agreement No 824084 (2019-2023). +The beautiful icclim logo is a creation of `Carole Petetin +`_ and has been funded by the H2020 `IS-ENES3 +`_ project grant agreement No 824084 (2019-2023). diff --git a/doc/source/references/custom_indices.rst b/doc/source/references/custom_indices.rst index 71dd6fae..3d91479f 100644 --- a/doc/source/references/custom_indices.rst +++ b/doc/source/references/custom_indices.rst @@ -1,140 +1,178 @@ -.. _`custom_indices`: - -Create your own index with ``user_index`` ------------------------------------------ -You can calculate custom climate indices by using the ``user_index`` parameters. -It is a configuration dictionary to describe how the index should be computed. -In icclim documentation we usually call them custom indices or user indices. - -.. code-block:: python - - user_index_dict = dict( - index_name="a_custom_csdi", - calc_operation="max_nb_consecutive_events", - logical_operation="<", - thresh="5p", - window_width=5, - ) - refer_period = [datetime.datetime(1991, 1, 1), datetime.datetime(1999, 12, 31)] - study_period = [datetime.datetime(1991, 1, 1), datetime.datetime(2010, 12, 31)] - result = icclim.custom_index( - in_files="netcdf_files/tasmin.nc", - user_index=user_index_dict, - var_name="tmin", - slice_mode="YS", - base_period_time_range=refer_period, - time_range=study_period, - out_file="custom_csdi_5.nc", - ) - - - -``user_index`` dictionary -~~~~~~~~~~~~~~~~~~~~~~~~~ +.. _custom_indices: + +########################################### + Create your own index with ``user_index`` +########################################### + +You can calculate custom climate indices by using the ``user_index`` +parameters. It is a configuration dictionary to describe how the index +should be computed. In icclim documentation we usually call them custom +indices or user indices. + +.. 
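note::
+
+   As the recipes note, ``user_index`` is deprecated in favour of the
+   generic indices API. A rough, unverified sketch of a generic
+   approximation of the example below could be:
+
+.. code:: python
+
+   # hypothetical generic-API approximation (CSDI-like spells)
+   result = icclim.sum_of_spell_lengths(
+       in_files="netcdf_files/tasmin.nc",
+       var_name="tmin",
+       threshold="< 5 doy_per",
+   )
+
+.. 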
code:: python + + user_index_dict = dict( + index_name="a_custom_csdi", + calc_operation="max_nb_consecutive_events", + logical_operation="<", + thresh="5p", + window_width=5, + ) + refer_period = [datetime.datetime(1991, 1, 1), datetime.datetime(1999, 12, 31)] + study_period = [datetime.datetime(1991, 1, 1), datetime.datetime(2010, 12, 31)] + result = icclim.custom_index( + in_files="netcdf_files/tasmin.nc", + user_index=user_index_dict, + var_name="tmin", + slice_mode="YS", + base_period_time_range=refer_period, + time_range=study_period, + out_file="custom_csdi_5.nc", + ) + +*************************** + ``user_index`` dictionary +*************************** + ``user_index`` is a dictionary with possible keys: -+------------------------+-------------------------------------------+--------------------------------------------------------------------------------------+ -|Key |Type of value |Description | -+========================+===========================================+======================================================================================+ -|index_name |*str* |Name of custom index. | -+------------------------+-------------------------------------------+--------------------------------------------------------------------------------------+ -|calc_operation |*str* |Type of calculation. See below for more details. | -+------------------------+-------------------------------------------+--------------------------------------------------------------------------------------+ -|logical_operation |*str* |gt, lt, get, let or e | -+------------------------+-------------------------------------------+--------------------------------------------------------------------------------------+ -|thresh |*float* or *str* |In case of percentile-based index, must be string which starts with "p" (e.g. "p90"). | -+------------------------+-------------------------------------------+--------------------------------------------------------------------------------------+ -|link_logical_operations |*str* |and or or | -+------------------------+-------------------------------------------+--------------------------------------------------------------------------------------+ -|extreme_mode |*str* |min or max for computing min or max of running mean/sum. | -+------------------------+-------------------------------------------+--------------------------------------------------------------------------------------+ -|window_width |*int* |Used for computing running mean/sum. | -+------------------------+-------------------------------------------+--------------------------------------------------------------------------------------+ -|coef |*float* |Constant for multiplying input data array. | -+------------------------+-------------------------------------------+--------------------------------------------------------------------------------------+ -|date_event |*bool* |To keep or not the date of event. See below for more details. | -+------------------------+-------------------------------------------+--------------------------------------------------------------------------------------+ -|var_type |*str* |"t" or "p". See below for more details. | -+------------------------+-------------------------------------------+--------------------------------------------------------------------------------------+ -|ref_time_range |[*datetime.datetime*, *datetime.datetime*] |Time range of reference (baseline) period for computing anomalies. 
| -+------------------------+-------------------------------------------+--------------------------------------------------------------------------------------+ ++-------------------------+-------------------------------------------+--------------------------------------------------------------------------------------+ +| Key | Type of value | Description | ++=========================+===========================================+======================================================================================+ +| index_name | *str* | Name of custom index. | ++-------------------------+-------------------------------------------+--------------------------------------------------------------------------------------+ +| calc_operation | *str* | Type of calculation. See below for more details. | ++-------------------------+-------------------------------------------+--------------------------------------------------------------------------------------+ +| logical_operation | *str* | gt, lt, get, let or e | ++-------------------------+-------------------------------------------+--------------------------------------------------------------------------------------+ +| thresh | *float* or *str* | In case of percentile-based index, must be string which starts with "p" (e.g. | +| | | "p90"). | ++-------------------------+-------------------------------------------+--------------------------------------------------------------------------------------+ +| link_logical_operations | *str* | and or or | ++-------------------------+-------------------------------------------+--------------------------------------------------------------------------------------+ +| extreme_mode | *str* | min or max for computing min or max of running mean/sum. | ++-------------------------+-------------------------------------------+--------------------------------------------------------------------------------------+ +| window_width | *int* | Used for computing running mean/sum. | ++-------------------------+-------------------------------------------+--------------------------------------------------------------------------------------+ +| coef | *float* | Constant for multiplying input data array. | ++-------------------------+-------------------------------------------+--------------------------------------------------------------------------------------+ +| date_event | *bool* | To keep or not the date of event. See below for more details. | ++-------------------------+-------------------------------------------+--------------------------------------------------------------------------------------+ +| var_type | *str* | "t" or "p". See below for more details. | ++-------------------------+-------------------------------------------+--------------------------------------------------------------------------------------+ +| ref_time_range | [*datetime.datetime*, | Time range of reference (baseline) period for computing anomalies. | +| | *datetime.datetime*] | | ++-------------------------+-------------------------------------------+--------------------------------------------------------------------------------------+ Additional information about ``user_index`` keys are given below. 
- calc_operation key -++++++++++++++++++ - -======================================= =========================================================================== -Value Description -======================================= =========================================================================== -``max`` maximum -``min`` minimum -``sum`` sum -``mean`` mean -``nb_events`` number of relevant events fulfilling given criteria -``max_nb_consecutive_events`` maximum number of consecutive events fulfilling given criteria -``run_mean`` max or min of running mean -``run_sum`` max or min of running sum -``anomaly`` mean(future period) - mean(past period) -======================================= =========================================================================== - - -- The key ``date_event`` allows to keep date(s) of the event, it if is ``True``: - - - For simple statistics (min, max) in output netCDF file will be created "date_event" variable with numerical dates of the first occurrence of the event for each pixel. - - - For other operations in output netCDF file will be created "date_event_start" and "date_event_end" variables with numerical dates of the event for each pixel. - - .. note:: The "date_event", "date_event_start" and "date_event_end" netCDF variables have the same shape as index's one. - - .. warning:: "Date_event"/"date_event_start"/"date_event_end" has no value: - - - for certain pixels, if event is not found, - - for all pixels of "in-base" years (years in base period) for temperature percentile-based indices - it is not possible to determine the correct date of the event because of averaging of index in "in-base" year. - - -- The key ``var_type`` is used to chose the method for computing percentile thresholds. The methods are different for temperature and precipitation variables (more detailed :ref:`here `): - - - If 't' (temperature variable), percentile thresholds are computed for each calendar day, using *the bootstrapping procedure*. - - - If 'p' (precipitation variable), percentile threshold are calculated for whole set of values corresponding to wet days (i.e. days with daily precipitation amount >= 1.0 mm) in base period. 
-
+==================
+
++---------------------------------------+-----------------------------------------------------------------------------------+
+| Value                                 | Description                                                                       |
++=======================================+===================================================================================+
+| ``max``                               | maximum                                                                           |
++---------------------------------------+-----------------------------------------------------------------------------------+
+| ``min``                               | minimum                                                                           |
++---------------------------------------+-----------------------------------------------------------------------------------+
+| ``sum``                               | sum                                                                               |
++---------------------------------------+-----------------------------------------------------------------------------------+
+| ``mean``                              | mean                                                                              |
++---------------------------------------+-----------------------------------------------------------------------------------+
+| ``nb_events``                         | number of relevant events fulfilling given criteria                               |
++---------------------------------------+-----------------------------------------------------------------------------------+
+| ``max_nb_consecutive_events``         | maximum number of consecutive events fulfilling given criteria                    |
++---------------------------------------+-----------------------------------------------------------------------------------+
+| ``run_mean``                          | max or min of running mean                                                        |
++---------------------------------------+-----------------------------------------------------------------------------------+
+| ``run_sum``                           | max or min of running sum                                                         |
++---------------------------------------+-----------------------------------------------------------------------------------+
+| ``anomaly``                           | mean(future period) - mean(past period)                                           |
++---------------------------------------+-----------------------------------------------------------------------------------+
+
+- If the key ``date_event`` is ``True``, the date(s) of the event are
+  kept:
+
+   - For simple statistics (min, max), a "date_event" variable is
+     created in the output netCDF file, holding the numerical date of
+     the first occurrence of the event for each pixel.
+
+   - For other operations, "date_event_start" and "date_event_end"
+     variables are created in the output netCDF file, holding the
+     numerical dates of the event for each pixel.
+
+   .. note::
+
+      The "date_event", "date_event_start" and "date_event_end" netCDF
+      variables have the same shape as the index variable.
+
+   .. warning::
+
+      "date_event"/"date_event_start"/"date_event_end" has no value:
+
+      - for certain pixels, if the event is not found,
+
+      - for all pixels of "in-base" years (years in the base period)
+        for temperature percentile-based indices - it is not possible
+        to determine the correct date of the event because of the
+        averaging of the index in the "in-base" year.
+
+- The key ``var_type`` is used to choose the method for computing
+  percentile thresholds (a configuration sketch using this key is
+  given right after this list). The methods are different for
+  temperature and precipitation variables (more details :ref:`here
+  <pctl_methods_label>`):
+
+   - If 't' (temperature variable), percentile thresholds are computed
+     for each calendar day, using *the bootstrapping procedure*.
+
+   - If 'p' (precipitation variable), percentile thresholds are
+     calculated for the whole set of values corresponding to wet days
+     (i.e. days with daily precipitation amount >= 1.0 mm) in the base
+     period.
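
+As a rough sketch of how these keys combine, a percentile-based count
+of cold events could be configured as below; the input file, variable
+name and threshold are placeholders:
+
+.. code:: python
+
+   import datetime
+
+   import icclim
+
+   user_index_dict = dict(
+      index_name="cold_events",
+      calc_operation="nb_events",
+      logical_operation="<",
+      thresh="10p",
+      var_type="t",  # daily percentiles, bootstrapped on "in-base" years
+      date_event=True,  # also output date_event_start/date_event_end
+   )
+   result = icclim.custom_index(
+      in_files="netcdf_files/tasmin.nc",
+      user_index=user_index_dict,
+      var_name="tmin",
+      base_period_time_range=[
+         datetime.datetime(1991, 1, 1),
+         datetime.datetime(2000, 12, 31),
+      ],
+   )
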
calc_operation parameterization
-++++++++++++++++++++++++++++++
+===============================

-Correspondence table between ``cal_operation`` and required/optional parameters:
+Correspondence table between ``calc_operation`` and required/optional
+parameters:

 +-------------------------------+-------------------------------+-----------------------+
-|"calc_operation" value         | required parameters           |  optional_parameters  |
+| "calc_operation" value        | required parameters           | optional_parameters   |
 +===============================+===============================+=======================+
-|'max'/'min'                    |                               |'coef',                |
-|                               |                               |'logical_operation',   |
-|                               |                               |'thresh',              |
-|                               |                               |'date_event'           |
+| 'max'/'min'                   |                               | 'coef',               |
+|                               |                               | 'logical_operation',  |
+|                               |                               | 'thresh',             |
+|                               |                               | 'date_event'          |
 +-------------------------------+-------------------------------+-----------------------+
-|'mean'/'sum'                   |                               |'coef',                |
-|                               |                               |'logical_operation',   |
-|                               |                               |'thresh',              |
+| 'mean'/'sum'                  |                               | 'coef',               |
+|                               |                               | 'logical_operation',  |
+|                               |                               | 'thresh',             |
 +-------------------------------+-------------------------------+-----------------------+
-|'nb_events'                    |'logical_operation',           |'coef',                |
-|                               |'thresh',                      |'date_event'           |
-|                               |                               |                       |
-|                               |'link_logical_operations'      |                       |
-|                               |(if multivariable index),      |                       |
-|                               |                               |                       |
-|                               |'var_type'                     |                       |
-|                               |(if percentile-based indices)  |                       |
+| 'nb_events'                   | 'logical_operation',          | 'coef', 'date_event'  |
+|                               | 'thresh',                     |                       |
+|                               |                               |                       |
+|                               | 'link_logical_operations' (if |                       |
+|                               | multivariable index),         |                       |
+|                               |                               |                       |
+|                               | 'var_type' (if                |                       |
+|                               | percentile-based indices)     |                       |
 +-------------------------------+-------------------------------+-----------------------+
-|'max_nb_consecutive_events'    |'logical_operation',           |'coef',                |
-|                               |'thresh'                       |'date_event'           |
-|                               |                               |                       |
+| 'max_nb_consecutive_events'   | 'logical_operation', 'thresh' | 'coef', 'date_event'  |
 +-------------------------------+-------------------------------+-----------------------+
-|'run_mean'/'run_sum'           |'extreme_mode',                |'coef',                |
-|                               |'window_width'                 |'date_event'           |
+| 'run_mean'/'run_sum'          | 'extreme_mode',               | 'coef', 'date_event'  |
+|                               | 'window_width'                |                       |
 +-------------------------------+-------------------------------+-----------------------+

-.. warning:: The 'window_width' here is a parameter for calculation of statistics in running window. Do not confuse with 'window_width' of :func:`icclim.index`, which is used for computing of temperature percentiles and set to 5 as default.
+.. warning::
+
+   The 'window_width' here is a parameter for computing statistics in
+   a running window. Do not confuse it with the 'window_width' of
+   :func:`icclim.index`, which is used for computing temperature
+   percentiles and is set to 5 by default.
+
+.. note::

-.. note:: See examples for computing custom indices :ref:`here `.
+   See examples for computing custom indices :ref:`here
+   `.
diff --git a/doc/source/references/ecad_functions_api.rst b/doc/source/references/ecad_functions_api.rst
index 9841dafa..03b92634 100644
--- a/doc/source/references/ecad_functions_api.rst
+++ b/doc/source/references/ecad_functions_api.rst
@@ -1,95 +1,99 @@
 .. _ecad_functions_api:

-ECA&D indices
-=============
+###############
+ ECA&D indices
+###############

-icclim 5.2 comes with convenience functions to call each individual index from `icclim` namespace.
-These functions are autogenerated and are stored in `_generated_api.py`.
-Additionally, user custom indices can be called with the `icclim.custom_index`.
+icclim 5.2 comes with convenience functions to call each individual
+index from the `icclim` namespace. 
These functions are autogenerated and are +stored in `_generated_api.py`. Additionally, user custom indices can be +called with the `icclim.custom_index`. For example to use this new API with the index `su` you can do: -.. code-block:: python +.. code:: python - import glob - import icclim + import glob + import icclim - summer_days = icclim.su(in_files=glob.glob("netcdf_files/tasmax*.nc")) + summer_days = icclim.su(in_files=glob.glob("netcdf_files/tasmax*.nc")) -ECAD Generated Functions ------------------------- +************************** + ECAD Generated Functions +************************** .. automodule:: icclim._generated_api - :members: - :no-index: + :members: + :no-index: - .. rubric:: Functions + .. rubric:: Functions - .. autosummary:: - .. Documentation below is auto-generated with the extract-icclim-funs.py script - .. Generated API comment:Begin + .. autosummary:: - tg - tn - tx - dtr - etr - vdtr - su - tr - wsdi - tg90p - tn90p - tx90p - txx - tnx - csu - gd4 - fd - cfd - hd17 - id - tg10p - tn10p - tx10p - txn - tnn - csdi - cdd - prcptot - rr1 - sdii - cwd - rr - r10mm - r20mm - rx1day - rx5day - r75p - r75ptot - r95p - r95ptot - r99p - r99ptot - sd - sd1 - sd5cm - sd50cm - cd - cw - wd - ww - fxx - fg6bft - fgcalm - fg - ddnorth - ddeast - ddsouth - ddwest - gsl - spi6 - spi3 - custom_index + .. Documentation below is auto-generated with the extract-icclim-funs.py script + .. Generated API comment:Begin - .. Generated API comment:End + tg + tn + tx + dtr + etr + vdtr + su + tr + wsdi + tg90p + tn90p + tx90p + txx + tnx + csu + gd4 + fd + cfd + hd17 + id + tg10p + tn10p + tx10p + txn + tnn + csdi + cdd + prcptot + rr1 + sdii + cwd + rr + r10mm + r20mm + rx1day + rx5day + r75p + r75ptot + r95p + r95ptot + r99p + r99ptot + sd + sd1 + sd5cm + sd50cm + cd + cw + wd + ww + fxx + fg6bft + fgcalm + fg + ddnorth + ddeast + ddsouth + ddwest + gsl + spi6 + spi3 + custom_index + + .. Generated API comment:End diff --git a/doc/source/references/frequency.rst b/doc/source/references/frequency.rst index 85f45119..dc268c90 100644 --- a/doc/source/references/frequency.rst +++ b/doc/source/references/frequency.rst @@ -1,5 +1,6 @@ -Frequency -========= +########### + Frequency +########### .. automodule:: icclim.models.frequency - :members: Frequency + :members: Frequency diff --git a/doc/source/references/generic_functions_api.rst b/doc/source/references/generic_functions_api.rst index 50d1b9dd..b496c84e 100644 --- a/doc/source/references/generic_functions_api.rst +++ b/doc/source/references/generic_functions_api.rst @@ -1,59 +1,65 @@ .. _generic_functions_api: -Generic indices/indicators -========================== +############################ + Generic indices/indicators +############################ -icclim 6.0 introduced the concept of generic indices. -This document present the auto-generated functions that were built base on :py:class:`icclim.generic_indices.GenericIndicatorRegistry`. -The are accessible directly from `icclim` namespace. +icclim 6.0 introduced the concept of generic indices. This document +present the auto-generated functions that were built base on +:py:class:`icclim.generic_indices.GenericIndicatorRegistry`. The are +accessible directly from `icclim` namespace. -As an example, you can compute the number of days where a threshold is reached with: +As an example, you can compute the number of days where a threshold is +reached with: -.. code-block:: python +.. 
code:: python

-    import glob
-    import icclim
+   import glob
+   import icclim

-    hot_days_ds = icclim.count_occurrences(
-        in_files="netcdf_files/data*.nc",
-        var_name=["tmax"],
-        threshold="> 27 degree_Celsius",
-    )
+   hot_days_ds = icclim.count_occurrences(
+      in_files="netcdf_files/data*.nc",
+      var_name=["tmax"],
+      threshold="> 27 degree_Celsius",
+   )

-For more details on threshold and how to personalize them, see :ref:`threshold` documentation.
-We also prepared a few examples on :ref:`generic_indices_recipes` so that you get an idea of the capabilities of
-these generic indices.
+For more details on thresholds and how to personalize them, see the
+:ref:`threshold` documentation. We also prepared a few examples in
+:ref:`generic_indices_recipes` so that you get an idea of the
+capabilities of these generic indices.

-Generated API
--------------
+***************
+ Generated API
+***************

 .. automodule:: icclim._generated_api
-    :members:
-
-    .. rubric:: Functions
-
-    .. autosummary::
-        .. Documentation below is auto-generated with the extract-icclim-funs.py script
-        .. Generated API comment:Begin
-
-        count_occurrences
-        max_consecutive_occurrence
-        sum_of_spell_lengths
-        excess
-        deficit
-        fraction_of_total
-        maximum
-        minimum
-        average
-        sum
-        standard_deviation
-        max_of_rolling_sum
-        min_of_rolling_sum
-        max_of_rolling_average
-        min_of_rolling_average
-        mean_of_difference
-        difference_of_extremes
-        mean_of_absolute_one_time_step_difference
-        difference_of_means
-
-        .. Generated API comment:End
+   :members:
+
+   .. rubric:: Functions
+
+   .. autosummary::
+
+      .. Documentation below is auto-generated with the extract-icclim-funs.py script
+      .. Generated API comment:Begin
+
+      count_occurrences
+      max_consecutive_occurrence
+      sum_of_spell_lengths
+      excess
+      deficit
+      fraction_of_total
+      maximum
+      minimum
+      average
+      sum
+      standard_deviation
+      max_of_rolling_sum
+      min_of_rolling_sum
+      max_of_rolling_average
+      min_of_rolling_average
+      mean_of_difference
+      difference_of_extremes
+      mean_of_absolute_one_time_step_difference
+      difference_of_means
+
+      .. Generated API comment:End
diff --git a/doc/source/references/icclim_index_api.rst b/doc/source/references/icclim_index_api.rst
index bfa836f9..56ef3203 100644
--- a/doc/source/references/icclim_index_api.rst
+++ b/doc/source/references/icclim_index_api.rst
@@ -1,390 +1,423 @@
-icclim.index()
-==============
+################
+ icclim.index()
+################

-icclim exposes a main entry point with :func:`icclim.index`.
-It is used to compute both ECA&D indices and user defined indices.
-There are quite a lot of options, but only a few of them are needed to compute simple indices.
-Our :ref:`how_to` recipes are also a good start to have an idea on how to use `icclim.index`.
+icclim exposes a main entry point with :func:`icclim.index`. It is used
+to compute both ECA&D indices and user-defined indices. There are quite
+a lot of options, but only a few of them are needed to compute simple
+indices. Our :ref:`how_to` recipes are also a good place to get an idea
+of how to use `icclim.index`.

 .. note::

-    With version 5.2.0, icclim API now includes each individual index as a standalone function.
-    Check :ref:`ecad_functions_api` to see how to call them.
+   With version 5.2.0, the icclim API includes each individual index as
+   a standalone function. Check :ref:`ecad_functions_api` to see how to
+   call them.

-Compute climat indices
-----------------------
+*************************
+ Compute climate indices
+*************************

 .. 
autofunction:: icclim.index(**kwargs)

-.. note:: For the variable names see the :ref:`correspondence table "index - source variable" `
+.. note::
+
+   For the variable names, see the :ref:`correspondence table "index -
+   source variable" <table_index_sourcevar_label>`.

 Below is some additional information about input parameters.

 ``in_files`` and ``var_name``
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+=============================

 The ``in_files`` parameter can be

-    - A *string* path to a netCDF file or a zarr store
-    - A *list of strings* to represent multiple netCDF files to combine
-    - A *xarray.Dataset*
-    - A *xarray.DataArray*
-    - A python dictionary (new in 5.3)
+   - A *string* path to a netCDF file or a zarr store
+   - A *list of strings* to represent multiple netCDF files to combine
+   - An *xarray.Dataset*
+   - An *xarray.DataArray*
+   - A Python dictionary (new in 5.3)

-+---------------------------------+----------------------------------------------------------+---------------------------------------------------------------------------------------------------------+
-|                                 | single input file per variable                           | several input files per variable                                                                        |
-+=================================+==========================================================+=========================================================================================================+
-| simple index                    | ``var_name`` = 'tasmax'                                  | ``var_name`` = 'tasmax'                                                                                 |
-| (based on a single variable)    +----------------------------------------------------------+---------------------------------------------------------------------------------------------------------+
-|                                 | ``in_files`` = 'tasmax_1990-2010.nc'                     | ``in_files`` = ['tasmax_1990-2000.nc', 'tasmax_2000-2010.nc']                                           |
-+---------------------------------+----------------------------------------------------------+---------------------------------------------------------------------------------------------------------+
-| multivariable index             | ``var_name`` = ['tas', 'pr']                             | ``var_name`` = ['tas', 'pr']                                                                            |
-| (based on several variables)    +----------------------------------------------------------+---------------------------------------------------------------------------------------------------------+
-|                                 | ``in_files`` = ['tas_1990-2010.nc', 'pr_1990-2010.nc']   | ``in_files`` = ['tas_1990-2000.nc', 'tas_2000-2010.nc', 'pr_1990-2000.nc', 'pr_2000-2010.nc']           |
-+---------------------------------+----------------------------------------------------------+---------------------------------------------------------------------------------------------------------+
++---------------------------------+----------------------------------------------------------+---------------------------------------------------------------------------------------------------------+
+|                                 | single input file per variable                           | several input files per variable                                                                        |
++=================================+==========================================================+=========================================================================================================+
+| simple index                    | ``var_name`` = 'tasmax'                                  | ``var_name`` = 'tasmax'                                                                                 |
+| (based on a single variable)    +----------------------------------------------------------+---------------------------------------------------------------------------------------------------------+
+|                                 | ``in_files`` = 'tasmax_1990-2010.nc'                     | ``in_files`` = ['tasmax_1990-2000.nc', 'tasmax_2000-2010.nc']                                           |
++---------------------------------+----------------------------------------------------------+---------------------------------------------------------------------------------------------------------+
+| multivariable index             | ``var_name`` = ['tas', 'pr']                             | ``var_name`` = ['tas', 'pr']                                                                            |
+| (based on several variables)    +----------------------------------------------------------+---------------------------------------------------------------------------------------------------------+
+|                                 | ``in_files`` = ['tas_1990-2010.nc', 'pr_1990-2010.nc']   | ``in_files`` = ['tas_1990-2000.nc', 'tas_2000-2010.nc', 'pr_1990-2000.nc', 'pr_2000-2010.nc']           |
++---------------------------------+----------------------------------------------------------+---------------------------------------------------------------------------------------------------------+
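
+Following the same pattern as the table above, a multivariable index
+such as DTR (which reads both tasmax and tasmin, per the correspondence
+table below) can be computed from several files per variable; the file
+names here are hypothetical:
+
+.. code:: python
+
+   import icclim
+
+   dtr = icclim.index(
+      index_name="DTR",
+      in_files=[
+         "tasmax_1990-2000.nc",
+         "tasmax_2000-2010.nc",
+         "tasmin_1990-2000.nc",
+         "tasmin_2000-2010.nc",
+      ],
+      var_name=["tasmax", "tasmin"],
+   )
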
 New in 5.3
-++++++++++
+----------
+
+Starting with icclim 5.3, ``in_files`` can describe variable names,
+formerly set in ``var_name``, as a dictionary. The dictionary keys are
+variable names and values are the usual in_files types (netCDF, zarr,
+Dataset, DataArray).
+
+.. code:: python

-Starting with icclim 5.3, ``in_files`` can describe variable names, formerly set in ``var_name``, as dictionary format.
-The dictionary keys are variable names and values are the usual in_files types (netCDF, zarr, Dataset, DataArray).
+   in_files = {"tasmax": "tasmax.nc", "pr": "precip.zarr"}

->>> in_files = {"tasmax": "tasmax.nc", "pr": "precip.zarr"}
+Moreover, this new dictionary syntax can be used to specify a different
+set of files for percentiles.

-Moreover, this new dictionary syntax can be used to specify a different set of files for percentiles.
+.. code:: python

->>> in_files = {"tasmax": "tasmax.nc", "thresholds": "tasmax-90p.zarr"}
+   in_files = {"tasmax": "tasmax.nc", "thresholds": "tasmax-90p.zarr"}

-The ``thresholds`` input should contain percentile thresholds that will be used will be used in place of computing them.
-It allow to reuse percentiles computed and stored elsewhere easily.
-For the record, you can generate percentiles with ``save_percentile`` parameter of icclim.index.
+The ``thresholds`` input should contain percentile thresholds that will
+be used in place of computing them. It allows easily reusing
+percentiles computed and stored elsewhere. For the record, you can
+generate percentiles with the ``save_percentile`` parameter of
+icclim.index.

 .. note::

-    Be aware that percentiles will **not** be bootstrapped. Thus, the result could be biased if the period on which percentiles
-    were computed partially overlap the index studied period.
-    See ``_ for more information on this topic.
+   Be aware that percentiles will **not** be bootstrapped. Thus, the
+   result could be biased if the period on which percentiles were
+   computed partially overlaps the studied period of the index. See
+   :ref:`Computing percentile thresholds <pctl_methods_label>` for more
+   information on this topic.

 .. _slice_mode:

 ``slice_mode``
-~~~~~~~~~~~~~~
-The ``slice_mode`` parameter defines a desired temporal aggregation. Thus, each index can be calculated at annual, winter half-year, summer half-year, winter, spring,
-summer, autumn and monthly frequency:
+==============
+
+The ``slice_mode`` parameter defines a desired temporal aggregation. 
+Thus, each index can be calculated at annual, winter half-year, summer +half-year, winter, spring, summer, autumn and monthly frequency: +----------------------------------+-------------------------------------------+ -| Value (string) | Description | +| Value (string) | Description | +==================================+===========================================+ -| ``year`` (default) | annual | +| ``year`` (default) | annual | +----------------------------------+-------------------------------------------+ -| ``month`` | monthly (all months) | +| ``month`` | monthly (all months) | +----------------------------------+-------------------------------------------+ -| ``ONDJFM`` | winter half-year | +| ``ONDJFM`` | winter half-year | +----------------------------------+-------------------------------------------+ -| ``AMJJAS`` | summer half-year | +| ``AMJJAS`` | summer half-year | +----------------------------------+-------------------------------------------+ -| ``DJF`` | winter | +| ``DJF`` | winter | +----------------------------------+-------------------------------------------+ -| ``MAM`` | spring | +| ``MAM`` | spring | +----------------------------------+-------------------------------------------+ -| ``JJA`` | summer | +| ``JJA`` | summer | +----------------------------------+-------------------------------------------+ -| ``SON`` | autumn | +| ``SON`` | autumn | +----------------------------------+-------------------------------------------+ -| ``['month', [4,5,11]]`` | monthly sampling filtered | +| ``['month', [4,5,11]]`` | monthly sampling filtered | +----------------------------------+-------------------------------------------+ -| ``['season', [4,5,6]]`` | seasonal (1 value per season) | +| ``['season', [4,5,6]]`` | seasonal (1 value per season) | +----------------------------------+-------------------------------------------+ -| | seasonal (1 value per season) | -| ``['clipped_season', [4,5,6]]`` | spells starting before season | -| | start are not accounted | +| ``['clipped_season', [4,5,6]]`` | seasonal (1 value per season) spells | +| | starting before season start are not | +| | accounted | +----------------------------------+-------------------------------------------+ -| ``3W`` | A valid pandas frequency (3 weeks here) | +| ``3W`` | A valid pandas frequency (3 weeks here) | +----------------------------------+-------------------------------------------+ -| The winter season (``DJF``) of 2000 is composed of December 2000, January 2001 and February 2001. -| Likewise, the winter half-year (``ONDJFM``) of 2000 includes October 2000, November 2000, December 2000, January 2001, February 2001 and March 2001. +| The winter season (``DJF``) of 2000 is composed of December 2000, + January 2001 and February 2001. +| Likewise, the winter half-year (``ONDJFM``) of 2000 includes October + 2000, November 2000, December 2000, January 2001, February 2001 and + March 2001. Monthly time series filter -++++++++++++++++++++++++++ -Monthly time series with months selected by user (the keyword can be either `month` or `months`): +-------------------------- + +Monthly time series with months selected by user (the keyword can be +either `month` or `months`): ->>> slice_mode = [ -... "month", -... [4, 5, 11], -... ] # index will be computed only for April, May and November +.. 
code:: python
+
+   # index will be computed only for April, May and November
+   slice_mode = ["month", [4, 5, 11]]
+
+   # index will be computed only for April
+   slice_mode = ["month", [4]]

 User defined seasons
-++++++++++++++++++++
-You can either defined seasons aware of data outside their bounds (keyword `season`) or
-seasons which clip all data outside their bounds (keyword `clipped_season`).
-The later is most useful on indices computing spells, if you want to totally ignore spells that could
-have started before your custom season.
+--------------------

->>> slice_mode = ["season", [4, 5, 6, 7]] # March to July un-clipped
->>> slice_mode = ["clipped_season", [4, 5, 6, 7]] # March to July clipped
+| You can either define seasons aware of data outside their bounds
+  (keyword `season`) or seasons which clip all data outside their
+  bounds (keyword `clipped_season`).
+| The latter is most useful for indices computing spells, if you want
+  to totally ignore spells that could have started before your custom
+  season.

->>> slice_mode = ["season", [11, 12, 1]] # November to January un-clipped
->>> slice_mode = ["clipped_season", ([11, 12, 1])] # November to January clipped
+.. code:: python
+
+   slice_mode = ["season", [4, 5, 6, 7]]  # April to July, un-clipped
+   slice_mode = ["clipped_season", [4, 5, 6, 7]]  # April to July, clipped
+
+   slice_mode = ["season", [11, 12, 1]]  # November to January, un-clipped
+   slice_mode = ["clipped_season", [11, 12, 1]]  # November to January, clipped

 Additionally, you can define a season between two exact dates:

->>> slice_mode = ["season", ["07-19", "08-14"]]
+.. code:: python
+
+   slice_mode = ["season", ["07-19", "08-14"]]

->>> slice_mode = ["clipped_season", ["07-19", "08-14"]]
+   slice_mode = ["clipped_season", ["07-19", "08-14"]]

 .. note::

-    With 5.3.0 icclim now accepts pandas string frequency for slice_mode to resample the output data to a given frequency
-    There are multiple combinations possible such as: "2AS-FEB" to resample data on two (2) years (A) starting (S) in February (FEB).
-    For further information, refer to pandas `offset aliases `_.
+   With 5.3.0, icclim now accepts a pandas string frequency for
+   slice_mode, to resample the output data to a given frequency. There
+   are multiple combinations possible, such as: "2AS-FEB" to resample
+   data on two (2) years (A) starting (S) in February (FEB). For
+   further information, refer to pandas `offset aliases
+   `_.

``threshold``
-~~~~~~~~~~~~~
-It is possible to set a user define threshold for the following indices:
+=============
+
+It is possible to set a user-defined threshold for the following indices:

-* SU (default threshold: 25°C)
-* CSU (default threshold: 25°C)
-* TR (default threshold: 20°C)
-* CSDI (default 10th percentile)
-* WSDI (default 90th percentile)
-* TX90p (default 90th percentile)
-* TG90p (default 90th percentile)
-* TN90p (default 90th percentile)
-* TX10p (default 10th percentile)
-* TG10p (default 10th percentile)
-* TN10p (default 10th percentile)
+- SU (default threshold: 25°C)
+- CSU (default threshold: 25°C)
+- TR (default threshold: 20°C)
+- CSDI (default 10th percentile)
+- WSDI (default 90th percentile)
+- TX90p (default 90th percentile)
+- TG90p (default 90th percentile)
+- TN90p (default 90th percentile)
+- TX10p (default 10th percentile)
+- TG10p (default 10th percentile)
+- TN10p (default 10th percentile)

 The threshold could be one value:

->>> threshold = 30
+.. code:: python
+
+   threshold = 30

 or a list of values:

->>> threshold = [20, 25, 30]
+.. code:: python
+
+   threshold = [20, 25, 30]
+
+.. note::

-.. note:: thresholds should be a float, the unit is expected to be in degrees Celsius or a unit-less for percentiles.
+   Thresholds should be floats; the unit is expected to be degrees
+   Celsius, or unit-less for percentiles.
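
+As a usage sketch, such user-defined thresholds can be passed directly
+to one of the indices listed above (the input file name below is a
+placeholder):
+
+.. code:: python
+
+   import icclim
+
+   # SU computed with thresholds of 20, 25 and 30 degrees Celsius
+   summer_days = icclim.su(
+      in_files="netcdf_files/tasmax.nc",
+      threshold=[20, 25, 30],
+   )
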
``transfer_limit_Mbytes``
-~~~~~~~~~~~~~~~~~~~~~~~~~
+=========================
+
 !Deprecated

-``transfer_limit_Mbytes`` is now ignored and will be deleted in a futur version.
-See :ref:`how to chunk data and parallelize computation ` to configure dask chunking.
+``transfer_limit_Mbytes`` is now ignored and will be deleted in a
+future version. See :ref:`how to chunk data and parallelize computation
+` to configure dask chunking.

``callback``
-~~~~~~~~~~~~
+============
+
 !Deprecated

-Callback can used to output a estimated progress of the calculus.
-However, when using dask, the calculus are done lazily at the very end of icclim's process.
-Thus the values transmitted to ``callback`` are irrelevant with dask.
+The callback can be used to output an estimated progress of the
+computation. However, when using dask, the computation is done lazily
+at the very end of icclim's process. Thus, the values transmitted to
+``callback`` are irrelevant with dask.

-.. _ignore_Feb29th_label:
+.. _ignore_feb29th_label:

``ignore_Feb29th``
-~~~~~~~~~~~~~~~~~~
+==================
+
 If it is ``True``, February 29th is removed from the data.
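
+For example (a sketch with a placeholder input file), February 29th
+can be excluded when computing a percentile-based index:
+
+.. code:: python
+
+   import icclim
+
+   tx90p = icclim.index(
+      index_name="TX90p",
+      in_files="netcdf_files/tasmax.nc",
+      base_period_time_range=["1961-01-01", "1990-12-31"],
+      ignore_Feb29th=True,
+   )
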
 .. _pctl_methods_label:

-Computing percentile thresholds
--------------------------------
-
-Percentile thresholds are used as thresholds for calculation of percentile-based indices
-and are computed from values inside a reference period, named *base period* which is usually 30 years (``base_period_time_range`` parameter).
+*********************************
+ Computing percentile thresholds
+*********************************
+
+Percentile thresholds are used as thresholds for the calculation of
+percentile-based indices and are computed from values inside a
+reference period, named *base period*, which is usually 30 years
+(``base_period_time_range`` parameter).

 There are two methods for calculation of percentile thresholds:

-**1. For temperature indices**, theses thresholds are computed *for each calendar day* from samples (5-day window centred on the
-calendar day in the base period) which depend on :ref:`window_width `, :ref:`only_leap_years `
-and :ref:`ignore_Feb29th ` parameters.
-
-In *icclim* these thresholds represent a dictionary with 365 (if :ref:`ignore_Feb29th ` is True)
-or 366 (if :ref:`ignore_Feb29th ` is False) calendar days as keys, and 2D arrays as values.
+**1. For temperature indices**, these thresholds are computed *for each
+calendar day* from samples (5-day window centred on the calendar day in
+the base period) which depend on the :ref:`window_width
+<window_width_label>`, :ref:`only_leap_years <only_leap_years_label>`
+and :ref:`ignore_Feb29th <ignore_feb29th_label>` parameters.
+
+In *icclim* these thresholds represent a dictionary with 365 (if
+:ref:`ignore_Feb29th <ignore_feb29th_label>` is True) or 366 (if
+:ref:`ignore_Feb29th <ignore_feb29th_label>` is False) calendar days as
+keys, and 2D arrays as values.

-.. note:: A calendar day key of the dictionary is composed from the corresponding month and day, separated by a comma. For example, getting of the 2D array with percentiles for April 13th, looks like *my_perc_dict[4,13]*.
+.. note::
+
+   A calendar day key of the dictionary is composed of the
+   corresponding month and day, separated by a comma. For example,
+   getting the 2D array with percentiles for April 13th looks like
+   *my_perc_dict[4,13]*.

-The percentile thresholds are different for "in-base" years (years inside the base period) and "out-of-base" years.
-For "in-base" years, *icclim* uses the *bootstrapping procedure*, which is
-explained in this article: `Avoiding Inhomogeneity in Percentile-Based Indices of
-Temperature Extremes (Zhang et al.) `_ - see
-the resampling algorithm in the section **4. Removing the "jump"**.
+The percentile thresholds are different for "in-base" years (years
+inside the base period) and "out-of-base" years. For "in-base" years,
+*icclim* uses the *bootstrapping procedure*, which is explained in this
+article: `Avoiding Inhomogeneity in Percentile-Based Indices of
+Temperature Extremes (Zhang et al.)
+`_ - see
+the resampling algorithm in the section **4. Removing the "jump"**.

-.. warning:: Computing of percentile thresholds with the bootstrapping procedure may take some time! For example, a 30-yr base period requires (30-1) times of computing percentiles for each "in-base" year!, i.e. 30*(30-1) times in total (+1 time without bootstrapping for "out-of-base" years).
+.. warning::
+
+   Computing percentile thresholds with the bootstrapping procedure may
+   take some time! For example, a 30-yr base period requires computing
+   the percentiles (30-1) times for each "in-base" year, i.e. 30*(30-1)
+   times in total (+1 time, without bootstrapping, for "out-of-base"
+   years).

-**2. For precipitation indices**, the thresholds are computed from the set of wet days (i.e. days when daily precipitation amount >= 1.0 mm) in the base period. In *icclim* these thresholds represent an 2D array.
+**2. For precipitation indices**, the thresholds are computed from the
+set of wet days (i.e. days when the daily precipitation amount >= 1.0
+mm) in the base period. In *icclim* these thresholds represent a 2D
+array.

-Both methods could use 2 types of :ref:`interpolation `.
+Both methods can use two types of :ref:`interpolation
+<interpolation_label>`.

-The `calc_percentiles.py `_ module contains *get_percentile_dict* and *get_percentile_arr* functions for the described methods.
+The `calc_percentiles.py
+`_
+module contains the *get_percentile_dict* and *get_percentile_arr*
+functions for the described methods.

 .. 
_window_width_label: + 1984-02-27, 1984-02-28, **1984-02-29**, 1984-03-01, 1984-03-02, -``window_width`` -~~~~~~~~~~~~~~~~~ + 1988-02-27, 1988-02-28, **1988-02-29**, 1988-03-01, 1988-03-02 -The ``window width`` is used for getting samples for percentiles computing and is set to 5: percentiles-based indices use 5-day window. -The window is centred on a certain calendar day, for example: -- **April 13th**, we take the values for *April 11th*, *April 12th*, *April 13th*, *April 14th* and *April 15th* of each year of the base period. -- **January 1st**, we take all days of *December 30th*, *December 31st*, *January 1st*, *January 2nd* and *January 3rd*. + - if ``False``, for the same base period and window width, we have: -Hence, for a base period of 30 years and 5-day window width for each calendar day (except February 29th), there are 150 values ( 30 * 5 ) -to compute its percentile value. + 1980-02-27, 1980-02-28, **1980-02-29**, 1980-03-01, 1980-03-02, + 1981-02-27, 1981-02-28, 1981-03-01, 1981-03-02, -.. _only_leap_years_label: + 1982-02-27, 1982-02-28, 1982-03-01, 1982-03-02, + 1983-02-27, 1983-02-28, 1983-03-01, 1983-03-02, -``only_leap_years`` -~~~~~~~~~~~~~~~~~~~ - -The ``only_leap_years`` parameter selects which of two methods to use for calculating a percentile value -for the calendar day of **February 29th**: - - - if ``True``, we take only leap years, i.e. for example for the base period of 1980-1990 and 5-day window width, we take the values corresponding to the following dates: - - 1980-02-27, - 1980-02-28, - **1980-02-29**, - 1980-03-01, - 1980-03-02, - - 1984-02-27, - 1984-02-28, - **1984-02-29**, - 1984-03-01, - 1984-03-02, - - 1988-02-27, - 1988-02-28, - **1988-02-29**, - 1988-03-01, - 1988-03-02 - - - - if ``False``, for the same base period and window width, we have: - - 1980-02-27, - 1980-02-28, - **1980-02-29**, - 1980-03-01, - 1980-03-02, - - 1981-02-27, - 1981-02-28, - 1981-03-01, - 1981-03-02, - - 1982-02-27, - 1982-02-28, - 1982-03-01, - 1982-03-02, - - 1983-02-27, - 1983-02-28, - 1983-03-01, - 1983-03-02, - - 1984-02-27, - 1984-02-28, - **1984-02-29**, - 1984-03-01, - 1984-03-02, - - 1985-02-27, - 1985-02-28, - 1985-03-01, - 1985-03-02, - - 1986-02-27, - 1986-02-28, - 1986-03-01, - 1986-03-02, - - 1987-02-27, - 1987-02-28, - 1987-03-01, - 1987-03-02, - - 1988-02-27, - 1988-02-28, - **1988-02-29**, - 1988-03-01, - 1988-03-02 - - 1989-02-27, - 1989-02-28, - 1989-03-01, - 1989-03-02, - - 1990-02-27, - 1990-02-28, - 1990-03-01, - 1990-03-02 - - The second way is preferable, because we have more samples. - -.. warning:: If :ref:`ignore_Feb29th ` is True, the ``only_leap_years`` does not make sense! + 1984-02-27, 1984-02-28, **1984-02-29**, 1984-03-01, 1984-03-02, + + 1985-02-27, 1985-02-28, 1985-03-01, 1985-03-02, + + 1986-02-27, 1986-02-28, 1986-03-01, 1986-03-02, + + 1987-02-27, 1987-02-28, 1987-03-01, 1987-03-02, + + 1988-02-27, 1988-02-28, **1988-02-29**, 1988-03-01, 1988-03-02 + + 1989-02-27, 1989-02-28, 1989-03-01, 1989-03-02, + + 1990-02-27, 1990-02-28, 1990-03-01, 1990-03-02 + + The second way is preferable, because we have more samples. + +.. warning:: + + If :ref:`ignore_Feb29th ` is True, the + ``only_leap_years`` does not make sense! .. _interpolation_label: ``interpolation`` -~~~~~~~~~~~~~~~~~ -Computing of a percentile value could use ``linear``, also known as type 7 in other software or the interpolation proposed -by `Hyndman and Fan (1996) `_, named -in *icclim* as ``hyndman_fan`` interpolation, also known as type 8. 
+=================
+
+Computing a percentile value can use ``linear`` interpolation, also
+known as type 7 in other software, or the interpolation proposed by
+`Hyndman and Fan (1996)
+`_,
+named in *icclim* as ``hyndman_fan`` interpolation, also known as type
+8.

``out_unit``
-~~~~~~~~~~~~~~~
-Percentile-based indices (TX10p, TX90p, TN10p, TN90p, TG10p, TG90p, R75p, R95p and R99p) could be returned as number of days (default)
-or as percentage of days (``out_unit`` = "%").
+============
+
+Percentile-based indices (TX10p, TX90p, TN10p, TN90p, TG10p, TG90p,
+R75p, R95p and R99p) can be returned as a number of days (default) or
+as a percentage of days (``out_unit`` = "%").
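
+For instance (a sketch, with a placeholder input file), the same index
+can be expressed as a percentage of days instead of a number of days:
+
+.. code:: python
+
+   import icclim
+
+   tx90p_percent = icclim.index(
+      index_name="TX90p",
+      in_files="netcdf_files/tasmax.nc",
+      out_unit="%",
+   )
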
 .. _custom_indices_old:

-Custom indices
---------------
-
-Custom indices are now described in their own chapter: :ref:`custom_indices`
+****************
+ Custom indices
+****************
+
+Custom indices are now described in their own chapter:
+:ref:`custom_indices`

-.. _table_index_sourceVar_label:
+.. _table_index_sourcevar_label:

-Correspondence table "index - source variable"
-----------------------------------------------
+************************************************
+ Correspondence table "index - source variable"
+************************************************

-Using common names for the source variable, icclim is able to lookup the proper variable in the given input to compute an index.
+Using common names for the source variable, icclim is able to look up
+the proper variable in the given input to compute an index.

 +------------------------------------------------------------+---------------------------------------------+
-| index                                                      | Source variable                             |
+| index                                                      | Source variable                             |
 +============================================================+=============================================+
-|TG, GD4, HD17, TG10p, TG90p                                 | daily mean temperature                      |
+| TG, GD4, HD17, TG10p, TG90p                                | daily mean temperature                      |
 +------------------------------------------------------------+---------------------------------------------+
-|TN, TNx, TNn, TR, FD, CFD, TN10p, TN90p, CSDI               | daily minimum temperature                   |
+| TN, TNx, TNn, TR, FD, CFD, TN10p, TN90p, CSDI              | daily minimum temperature                   |
 +------------------------------------------------------------+---------------------------------------------+
-|TX, TXx, TXn, SU, CSU, ID, TX10p, TX90p, WSDI               | daily maximum temperature                   |
+| TX, TXx, TXn, SU, CSU, ID, TX10p, TX90p, WSDI              | daily maximum temperature                   |
 +------------------------------------------------------------+---------------------------------------------+
-|DTR, ETR, vDTR                                              | daily maximum + daily minimum temperature   |
+| DTR, ETR, vDTR                                             | daily maximum + daily minimum temperature   |
 +------------------------------------------------------------+---------------------------------------------+
-|PRCPTOT, RR1, SDII, CWD, CDD, R10mm, R20mm, RX1day, RX5day, | daily precipitation flux (liquide phase)    |
-|R75p, R75pTOT, R95p, R95pTOT, R99p, R99pTOT                 |                                             |
+| PRCPTOT, RR1, SDII, CWD, CDD, R10mm, R20mm, RX1day,        | daily precipitation flux (liquid phase)     |
+| RX5day, R75p, R75pTOT, R95p, R95pTOT, R99p, R99pTOT        |                                             |
 +------------------------------------------------------------+---------------------------------------------+
-|SD, SD1, SD5cm, SD50cm                                      | daily snowfall flux (solid phase)           |
+| SD, SD1, SD5cm, SD50cm                                     | daily snowfall flux (solid phase)           |
 +------------------------------------------------------------+---------------------------------------------+
-|                                                            | daily mean temperature +                    |
-|CD, CW, WD, WW                                              | daily precipitation flux (liquide phase)    |
+| CD, CW, WD, WW                                             | daily mean temperature + daily              |
+|                                                            | precipitation flux (liquid phase)           |
 +------------------------------------------------------------+---------------------------------------------+
diff --git a/doc/source/references/index.rst b/doc/source/references/index.rst
index 3412c38c..3ca89417 100644
--- a/doc/source/references/index.rst
+++ b/doc/source/references/index.rst
@@ -1,20 +1,23 @@
-References
-==========
+############
+ References
+############

-This is the technical documentation of icclim. You should find here an overview of the public API as well as the description of internal mechanisms of icclim.
+This is the technical documentation of icclim. You should find here an
+overview of the public API as well as a description of the internal
+mechanisms of icclim.

-| For beginner, it is recommended to start with :ref:`tutorials`.
-| For more abstract discussion about icclim, see :ref:`explanation`.
+| For beginners, it is recommended to start with :ref:`tutorials`.
+| For a more abstract discussion about icclim, see :ref:`explanation`.

 .. toctree::
-    :maxdepth: 2
-    :caption: References
+   :maxdepth: 2
+   :caption: References

-    icclim_index_api
-    ecad_functions_api
-    generic_functions_api
-    custom_indices
-    frequency
-    threshold
-    output_metadata
-    release_notes
+   icclim_index_api
+   ecad_functions_api
+   generic_functions_api
+   custom_indices
+   frequency
+   threshold
+   output_metadata
+   release_notes
diff --git a/doc/source/references/output_metadata.rst b/doc/source/references/output_metadata.rst
index dd9c227d..e1ec7481 100644
--- a/doc/source/references/output_metadata.rst
+++ b/doc/source/references/output_metadata.rst
@@ -1,177 +1,177 @@
-Output metadata
-================
+#################
+ Output metadata
+#################

 Output metadata contains at least the following variables:

-- lat
-- lat_bnds
-- lon_bnds
-- lon
-- time
-- time_bnds
-- index
+- lat
+- lat_bnds
+- lon_bnds
+- lon
+- time
+- time_bnds
+- index

-lat, lon, lat_bnds, lon_bnds
---------------------------------
-They are copied from source file with all their attributes.
+******************************
+ lat, lon, lat_bnds, lon_bnds
+******************************
+
+They are copied from the source file with all their attributes.
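
+A quick way to check these variables is to open the output with xarray
+(the file name below is a placeholder, taken from the examples further
+down):
+
+.. code:: python
+
+   import xarray as xr
+
+   ds = xr.open_dataset("indice_FD_year_1950-1955.nc")
+   # lists lat, lon, time, their *_bnds variables and the index variable
+   print(ds)
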
-time and time_bnds
--------------------
+********************
+ time and time_bnds
+********************

-+----------------------+-----------------------+------------------------------------+
-| Slice_mode           |  *time*               | *time_bnds*                        |
++----------------------+-----------------------+-----------------+------------------+
+| Slice_mode           | *time*                | *time_bnds*                        |
 +======================+=======================+=================+==================+
-| ``year``             | YYYY-07-01            | YYYY-01-01      | (YYYY+1)-01-01   |
+| ``year``             | YYYY-07-01            | YYYY-01-01      | (YYYY+1)-01-01   |
 +----------------------+-----------------------+-----------------+------------------+
-| ``month``            | YYYY-MM-16            | YYYY-MM-01      | YYYY-(MM+1)-01   |
+| ``month``            | YYYY-MM-16            | YYYY-MM-01      | YYYY-(MM+1)-01   |
 +----------------------+-----------------------+-----------------+------------------+
-| ``ONDJFM``           | YYYY-01-01            | YYYY-10-01      | (YYYY+1)-04-01   |
+| ``ONDJFM``           | YYYY-01-01            | YYYY-10-01      | (YYYY+1)-04-01   |
 +----------------------+-----------------------+-----------------+------------------+
-| ``AMJJAS``           | YYYY-07-01            | YYYY-04-01      | YYYY-10-01       |
+| ``AMJJAS``           | YYYY-07-01            | YYYY-04-01      | YYYY-10-01       |
 +----------------------+-----------------------+-----------------+------------------+
-| ``DJF``              | YYYY-01-16            | YYYY-12-01      | (YYYY+1)-03-01   |
+| ``DJF``              | YYYY-01-16            | YYYY-12-01      | (YYYY+1)-03-01   |
 +----------------------+-----------------------+-----------------+------------------+
-| ``MAM``              | YYYY-04-16            | YYYY-03-01      | YYYY-06-01       |
+| ``MAM``              | YYYY-04-16            | YYYY-03-01      | YYYY-06-01       |
 +----------------------+-----------------------+-----------------+------------------+
-| ``JJA``              | YYYY-07-16            | YYYY-06-01      | YYYY-09-01       |
+| ``JJA``              | YYYY-07-16            | YYYY-06-01      | YYYY-09-01       |
 +----------------------+-----------------------+-----------------+------------------+
-| ``SON``              | YYYY-10-16            | YYYY-09-01      | YYYY-12-01       |
+| ``SON``              | YYYY-10-16            | YYYY-09-01      | YYYY-12-01       |
 +----------------------+-----------------------+-----------------+------------------+

-.. note:: The second bound in time_bnds is excluded!
+.. note::

-Example: annual time steps
-~~~~~~~~~~~~~~~~~~~~~~~~~~
+   The second bound in time_bnds is excluded!

-.. code-block:: rest
+Example: annual time steps
+==========================

-    $ ncdump -v time indice_FD_year_1950-1955.nc -t
+.. code:: rest

-    time = "1950-07-01", "1951-07-01", "1952-07-01", "1953-07-01",
-    "1954-07-01", "1955-07-01" ;
+   $ ncdump -v time indice_FD_year_1950-1955.nc -t

-    $ ncdump -v time_bnds indice_FD_year_1950-1955.nc -t
+   time = "1950-07-01", "1951-07-01", "1952-07-01", "1953-07-01",
+   "1954-07-01", "1955-07-01" ;

-    time_bnds =
-    "1950-01-01 12", "1951-01-01 12",
-    "1951-01-01 12", "1952-01-01 12",
-    "1952-01-01 12", "1953-01-01 12",
-    "1953-01-01 12", "1954-01-01 12",
-    "1954-01-01 12", "1955-01-01 12",
-    "1955-01-01 12", "1956-01-01 12" ;
+   $ ncdump -v time_bnds indice_FD_year_1950-1955.nc -t

+   time_bnds =
+   "1950-01-01 12", "1951-01-01 12",
+   "1951-01-01 12", "1952-01-01 12",
+   "1952-01-01 12", "1953-01-01 12",
+   "1953-01-01 12", "1954-01-01 12",
+   "1954-01-01 12", "1955-01-01 12",
+   "1955-01-01 12", "1956-01-01 12" ;

 Example: monthly time steps
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+===========================

-.. code-block:: rest
+.. 
code:: rest - $ ncdump -v time indice_FD_month_1950-1955.nc -t + $ ncdump -v time indice_FD_month_1950-1955.nc -t - time = "1950-01-16", "1950-02-16", "1950-03-16", "1950-04-16", - "1950-05-16", "1950-06-16", "1950-07-16", "1950-08-16", - "1950-05-16", "1950-06-16", "1950-07-16", "1950-08-16", - [...] + time = "1950-01-16", "1950-02-16", "1950-03-16", "1950-04-16", + "1950-05-16", "1950-06-16", "1950-07-16", "1950-08-16", + "1950-05-16", "1950-06-16", "1950-07-16", "1950-08-16", + [...] - $ ncdump -v time_bnds indice_FD_month_1950-1955.nc -t + $ ncdump -v time_bnds indice_FD_month_1950-1955.nc -t - time_bnds = - "1950-01-01 12", "1950-02-01 12", - "1950-02-01 12", "1950-03-01 12", - "1950-03-01 12", "1950-04-01 12", - "1950-04-01 12", "1950-05-01 12", - [...] + time_bnds = + "1950-01-01 12", "1950-02-01 12", + "1950-02-01 12", "1950-03-01 12", + "1950-03-01 12", "1950-04-01 12", + "1950-04-01 12", "1950-05-01 12", + [...] Example: seasonal time steps -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -.. code-block:: rest +============================ - $ ncdump -v time indice_FD_DJF_1950-1955.nc -t +.. code:: rest - time = "1951-01-16", "1952-01-16", "1953-01-16", "1954-01-16", - "1955-01-16" ; + $ ncdump -v time indice_FD_DJF_1950-1955.nc -t - $ ncdump -v time_bnds indice_FD_DJF_1950-1955.nc -t + time = "1951-01-16", "1952-01-16", "1953-01-16", "1954-01-16", + "1955-01-16" ; - time_bnds = - "1950-12-01 12", "1951-03-01 12", - "1951-12-01 12", "1952-03-01 12", - "1952-12-01 12", "1953-03-01 12", - "1953-12-01 12", "1954-03-01 12", - "1954-12-01 12", "1955-03-01 12" ; + $ ncdump -v time_bnds indice_FD_DJF_1950-1955.nc -t + time_bnds = + "1950-12-01 12", "1951-03-01 12", + "1951-12-01 12", "1952-03-01 12", + "1952-12-01 12", "1953-03-01 12", + "1953-12-01 12", "1954-03-01 12", + "1954-12-01 12", "1955-03-01 12" ; +.. code:: rest -.. code-block:: rest + $ ncdump -v time indice_FD_SON_1950-1955.nc -t - $ ncdump -v time indice_FD_SON_1950-1955.nc -t + time = "1950-10-16", "1951-10-16", "1952-10-16", "1953-10-16", + "1954-10-16", "1955-10-16" ; - time = "1950-10-16", "1951-10-16", "1952-10-16", "1953-10-16", - "1954-10-16", "1955-10-16" ; + $ ncdump -v time_bnds indice_FD_SON_1950-1955.nc -t - $ ncdump -v time_bnds indice_FD_SON_1950-1955.nc -t + time_bnds = + "1950-09-01 12", "1950-12-01 12", + "1951-09-01 12", "1951-12-01 12", + "1952-09-01 12", "1952-12-01 12", + "1953-09-01 12", "1953-12-01 12", + "1954-09-01 12", "1954-12-01 12", + "1955-09-01 12", "1955-12-01 12" ; - time_bnds = - "1950-09-01 12", "1950-12-01 12", - "1951-09-01 12", "1951-12-01 12", - "1952-09-01 12", "1952-12-01 12", - "1953-09-01 12", "1953-12-01 12", - "1954-09-01 12", "1954-12-01 12", - "1955-09-01 12", "1955-12-01 12" ; +******* + index +******* +The *index* variable has the same name as index_name parameter (e.g. +"FD"). It has the following attributes: - - -index -------- - -The *index* variable has the same name as index_name parameter (e.g. "FD"). -It has the following attributes: - - - long_name - - units - - _FillValue - - missing_value - - ( grid_mapping ) + - long_name + - units + - _FillValue + - missing_value + - ( grid_mapping ) Example: -.. code-block:: rest - - float FD(time, lat, lon) ; - FD:_FillValue = 1.e+20f ; - FD:long_name = "Frost days (minimum temperature < 0 degrees)" ; - FD:units = "days" ; - FD:missing_value = 1.e+20f ; - FD:standard_name = "ECA_index" ; - +.. 

-index
-------
-
-The *index* variable has the same name as index_name parameter (e.g. "FD").
-It has the following attributes:
-
- - long_name
- - units
- - _FillValue
- - missing_value
- - ( grid_mapping )
+*******
+ index
+*******
+
+The *index* variable has the same name as the ``index_name`` parameter
+(e.g. "FD"). It has the following attributes:
+
+   - long_name
+   - units
+   - _FillValue
+   - missing_value
+   - ( grid_mapping )

 Example:

-.. code-block:: rest
+.. code:: rest

-   float FD(time, lat, lon) ;
-      FD:_FillValue = 1.e+20f ;
-      FD:long_name = "Frost days (minimum temperature < 0 degrees)" ;
-      FD:units = "days" ;
-      FD:missing_value = 1.e+20f ;
-      FD:standard_name = "ECA_index" ;
+   float FD(time, lat, lon) ;
+      FD:_FillValue = 1.e+20f ;
+      FD:long_name = "Frost days (minimum temperature < 0 degrees)" ;
+      FD:units = "days" ;
+      FD:missing_value = 1.e+20f ;
+      FD:standard_name = "ECA_index" ;

-.. note:: The *_FillValue* and *missing_value* are the same as in source files.
+.. note::
+
+   The *_FillValue* and *missing_value* are the same as in the source
+   files.

-Global attributes
-------------------
+*******************
+ Global attributes
+*******************

 According to the CF convention, the output NetCDF file contains 6 main
 global attributes:

- - title
- - institution
- - source
- - history
- - references
- - comment
+   - title
+   - institution
+   - source
+   - history
+   - references
+   - comment

 Example:

-.. code-block:: rest
+.. code:: rest

-   // global attributes:
-   :title = "ECA cold index FD" ;
-   :institution = "Climate impact portal (https://climate4impact.eu)" ;
-   :source = ;
-   :references = "ATBD of the ECA indices calculation (https://knmi-ecad-assets-prd.s3.amazonaws.com/documents/atbd.pdf)" ;
-   :comment = " " ;
-   :history = "2011-04-07T06:39:36Z CMOR rewrote data to comply with CF standards and CMIP5 requirements. \n",
-      "2014-04-01 12:16:03 Calculation of FD index (monthly time series) from 1950-1-1 to 1955-12-31." ;
+   // global attributes:
+   :title = "ECA cold index FD" ;
+   :institution = "Climate impact portal (https://climate4impact.eu)" ;
+   :source = ;
+   :references = "ATBD of the ECA indices calculation (https://knmi-ecad-assets-prd.s3.amazonaws.com/documents/atbd.pdf)" ;
+   :comment = " " ;
+   :history = "2011-04-07T06:39:36Z CMOR rewrote data to comply with CF standards and CMIP5 requirements. \n",
+      "2014-04-01 12:16:03 Calculation of FD index (monthly time series) from 1950-1-1 to 1955-12-31." ;

diff --git a/doc/source/references/release_notes.rst b/doc/source/references/release_notes.rst
index 381cd0b8..ae39ed1b 100644
--- a/doc/source/references/release_notes.rst
+++ b/doc/source/references/release_notes.rst
@@ -1,284 +1,550 @@
-Release history
-===============
-
-6.6.0 (unreleased)
-------------------
-
-* [maint] Migrate from [black, blackdoc, flake8, isort, pyupgrade, pydocstyle] to ruff
-* [maint] Migrate from setup.py to pyproject.toml
-* [maint] Make readthedocs build fail when there are warnings
-* [maint] Fix warnings in doc build
-* [maint] Update architecture to have a `src/` and a `tests/` directory at root level
-* [maint] Migrate build toolchain from setuptools to flit
-* [maint] Remove version number from `constants` module as it was causing the build process to import icclim.
-  The version number is now statically set in src/icclim/__init__.py
-* [maint] Lint code using the more restrictive rules from ruff
-* [fix] Force xarray to read dataset sequentially to avoid a netcdf-c threading issue causing seg faults.
-* [enh] Add `publish-to-pypi.yml` github action to automatically build and publish icclim to pypi.
-  This action is triggered by a github release being published.
-  This action requires a manual approval on github.
-
-6.5.0
------
-
-* [maint] Adapt generic indicators "excess" and "deficit" to xclim 0.45.
-* [maint] Upgrade minimal python version to 3.9
-* [fix] Avoid resampling SPI* indices.
-
-6.4.0
------
-
-* [maint] Upgrade to xclim 0.43 (released on 09/05/2023).
-* [maint] Change how xclim is pinned to allow patch changes.
-
-
-6.3.0
------
-* [maint] Upgrade to xclim 0.42 (released on 04/04/2023).
-* [fix] **BREAKING CHANGE** The indicators based on the difference between two variables (ecad: DTR, ETR, vDTR and anomaly) gave wrong values due to a bad unit conversion of the output. - This was for example the case when the input variables are in Kelvin, the difference between the two variables is still in Kelvin but it cannot be converted to degree Celsius with the ususal `+273.15`. - To workaround this issue, we first convert inputs to the expected output unit and then we compute the index. -* [fix] **BREAKING CHANGE** Indices based on both a percentile threshold and a `threshold_min_value` (for ecad: r75p, r75pTOT, r95p, r95pTOT, r99p, r99pTOT) - are now computing the exceedance rate on values above `threshold_min_value` as well. Previously this `threshold_min_value` was used to compute the percentile and the total (for rxxpTOT indices) - but not the exceedance rate. - -6.2.0 ------ -* [maint] Upgrade and adapt to xclim 0.40. - Moved PercentileDataArray from xclim to icclim. - Adapted the unit cenversion to use the hydro context. -* [fix] Pin xclim to exact 0.40 to avoid breaking changes. - - -6.1.5 ------ -* [fix] Bug fix: not assuming longitude and latitude are lon and lat with respect to output metadata. Fix needed to work on E-OBS and other datasets. - -6.1.3 ------ -* [fix] Bug fix for TNx. - -6.1.2 ------ -* [fix] Add missing file to properly identify user_indices as a package. - -6.1.0 ------ -* [fix] Add unit getter/setter for BoundedThreshold. -* [enh] Add ECAD wind indices ``{fxx, fg6bft, fgcalm, fg, ddnorth, ddeast, ddsouth, ddwest}``. - `ddnorth` and `ddsouth` do not follow the ECAD's ATBD v11 requirements as their definition seems to be wrong in the document. -* [enh] Add generic indicators as stand-alone functions in `icclim` namespace. -* [doc] Add documentation for generic indicators stand-alone functions. -* [doc] Add a recipe "how to" documentation for generic indicators. -* [enh] Add ECAD's indices GSL, SPI3, SPI6 by binding them to xclim's indicators. -* [maint] Upgrade to xclim 0.39.0 - - -6.0.0 ------ -* [enh] Add generic indices -* [enh] Make in_files.var.threshold and threshold parameters work with string values (a value with a unit or a percentile stamp) -* [maint] **BREAKING CHANGE** ECAD indices are no longer configurable! Use generic indices instead. -* [fix] **BREAKING CHANGE** ECAD indices CW, CD, WW, WD were computing the precipitation percentiles on day of year - values where it should have been percentiles of the whole period (excluding dry days). This has been fixed. -* [maint] icclim no longer carries a version of the clix-meta yml file. - Previously it was used to generate the doc string and a few metadata of ECAD indices. - It's no longer needed as we have put these metadata within StandardIndex declaration. -* [maint] **BREAKING CHANGE** Removed the `clipped_season` option from `slice_mode`. - With generic indices, `season` can be used with every indices. - In particular, spell based indices (e.g. wsdi, cdd) are mapped to `max_consecutive_occurrence` or `sum_of_spell_lengths` - generic indicators. Both compute the spell length before doing the resampling operation. - So a spell that start and end outside the output frequency interval is properly accounted for its whole duration. - That's for example the case of `slice_mode="month"`, a spell that would start in january and end in March, - would be accounted in january results. 
- However, when `slice_mode` is set to a season, where time is clipped and thus where xclim `select_time` is called, - the behavior is similar to the former `clipped_season`, we first clip the time to the expected season, then we compute the index. - Thus, events of spells that are before the season bound will be ignored in the results. -* [maint] **BREAKING CHANGE** User index `max_nb_consecutive_events` is also mapped to `max_consecutive_occurrence`, consequently spells are also counted for their whole duration. -* [enh] Make it possible to pass a simple dictionary in `in_files`, merging together basic `in_files` and `var_name` features. - It looks like `in_files={"tasmax": "tasmax.nc", "tasmin": "tasmin.zarr"}` -* [enh] Add `min_spell_length` parameter to index API in order to control the minimum duration of spells in `sum_of_spell_lengths`. -* [enh] Add `rolling_window_width` parameter to index API in order to control the width of the rolling window in `max|min_of_rolling_sum|average`. -* [enh] Add `doy_window_width` parameter to index API in order to control the width of aggregation windows when computing doy percentiles. -* [maint] Deprecate `window_width` parameter. When filled, it is mapped to it is still mapped to `doy_window_width`. -* [maint] Upgrade to xclim 0.38 and to xarray 2022.6. -* [maint] Add BlackDoc to C.I (github actions) to keep or doc code example clean. -* [enh] Add ECAD's RR index. It computes the sum of precipitations over days. -* [enh] Add icclim logo and auto-update its inner version number. -* [maint] Enable git lfs (large file storage) for `.svg` files to minimise the impact on storage of logo updates. -* [enh] Improve icclim.indices to enable multi indices computation based on variable names `icclim.indices(index_group='tasmax',in_files=data)` -* [fix] **BREAKING CHANGE** ECAD snow indices now expect a snow (snd) variable instead of a precipitation one. -* [enh] Add `build_threshold` function that acts as a factory to create different kind of Threshold. -* [enh] Add BoundedThreshold class. It allows to compute multiple threshold for a single variable. - This feature is necessary for indices such as ECAD's "DDnorth". - Instances of BoundedThreshold are created with the `build_threshold` factory function, E.G. `build_threshold(">= -20 degree AND <= 20 degree ")` -* [enh] Make it possible to compute multiple percentiles at once. -* [maint] Update coverage computation. Now tests files are ignored when calculating the code coverage, thus it dropped a little (by 3%). -* [enh] Convert input data that are recognized as a precipitation amount into precipitation rate. - This is necessary to handle e-obs precipitation datasets. - -5.4.0 ------ -* [fix] When giving input as a list of netcdf files, the coordinate values are now merged using the `override` strategy, thus the first file with a given dimension define this dimension for all the files. -* [fix] Fix the output unit of some indices (from "°C" to "degree_Celsius") -* [fix] Fixed issued where dataset having a time_bds variable could not be processed by chunking the DataArray(s) instead of the Dataset. - -5.3.0 ------ -* [enh] Add icclim version to history in outputted metadata. -* [maint] **breaking change** Pin minimal pandas version to 1.3 to have the fix for https://github.com/pandas-dev/pandas/issues/24539 -* [enh] ``slice_mode``: seasons can now be defined to be between two exact dates. 
-* [enh] ``slice_mode`` type can now be tuple[str, list], it works similarly to the list in input of seasons but, it enforces a length of 2. -* [enh] ``slice_mode``: Added `clipped_season` keyword which ignores events starting before the season bounds (original behavior of ``season``). -* [maint] ``slice_mode``: Modified `season` keyword to take into account events (such as in CDD) starting before the season bounds. - This should improve the scientific validity of these seasonal computations. Plus it is in accordance to xclim way of doing this. -* [maint] Added dataclass ClimateIndex to ease the introduction of new indices not in the ECAD standard. -* [maint] Made use the new typing syntax thanks to ``from __future__ import annotations``. -* [maint] Add docstring validation into flake8 checks. -* [enh] Improve API for date related parameters ``{time_range, base_period_time_range, ref_time_range}`` - They can still be filled with a datetime object but additionally various string format are now available. - This comes with dateparser library. -* [doc] Update callback doc as its outputted value is very inaccurate when dask is enable. -* [enh] T(X/N/G)(10/90)p indices threshold is now configurable with `threshold` parameter. - Example of use: `icclim.tx90p(in_files=data, threshold=[42, 99])` -* [enh|maint] threshold, history and source metadata have been updated to better describe what happens during icclim process. -* [fix/doc] The documentation of the generated API for T(X/N/G)(10/90)p indices now properly use thier ECAD definitions instead of those from ETCCDI. -* [enh/doc] Add [WSDI, CSDI, rxxp, rxxpTOT, CW, CD, WW, WD] indices in yaml definition. - Note: We no longer strictly follow the yaml given by clix-meta. -* [fix] custom seasonal slice_mode was broken when it ended in december. It's now fixed and unit tested. -* [enh] Make ``in_file`` accept a dictionary merging together ``var_name`` and ``in_file`` features. -* [enh] ``in_file`` dictionary can now be used to pass percentiles thresholds. These thresholds will be used instead of computing them on relevant indices. -* [maint/internal] Refactored IndexConfig and moved all the logic to input_parsing. -* [fix] Add auto detection of variables [prAdjust, tasAdjust, tasmaxAdjust, tasminAdjust] - -5.2.2 ------ -[maint] Remove constraint on numpy version as numba is now working with np 1.22. - -5.2.1 ------ -* [maint] Made Frequency part of SliceMode union. -* [fix] slice_mode seasonal samplings was giving wrong results for quite a few indices. This has been fixed and the performances should also be improved by the fix. - However, now seasonal slice_mode does not allow to use xclim missing values mechanisms. -* [fix] user_index ExtremeMode config was not properly parsed when a string was used. -* [fix] user_index Anomaly operator was not properly using the `ref_time_range` to setup a reference period as it should. -* [fix] user_index Sum and Mean operators were broken due to a previous refactoring and a lack of unit tests, it is now fixed and tested. -* [maint] Changed how `rechunker` dependency is pinned to add flexibility. We want a version above '0.3' but not the '0.4'. -* [maint] For the newly generate API, on `custom_index` function, the parameter `user_index` is now mandatory. - - -5.2.0 ------ -* [maint] Update release process. -* [enh] Improve `create_optimized_zarr_store` to accept a chunking schema instead of a single dim. 
-* [enh] Make use of `fsspec` to generalize the storages where `create_optimized_zarr_store` can create its zarr stores. -* [enh] Make CSDI and WSDI threshold configurable using the `threshold` parameter of icclim.index. -* [enh] Add a function in `icclim` namespace for each ECA&D index for convenience. -* [doc] Improve documentation about chunking. -* [fix] `create_optimized_zarr_store` would throw an error when creating the first temp store if the chunks were not unified. - -5.1.0 ------ -* [maint] **BREAKING CHANGE** Parameter ``out_file`` of icclim.index default value is now ``None``. When None, ``icclim.index`` only returns a xarray.Dataset and does not write to a default netcdf file. -* [enh] Add code coverage in CI. This writes a comment with the full report in the PR. -* [enh] Add coverage and conda badges in Readme. -* [tst] Add unit test for modules ``main``, ``dispatcher``, ``cf_calendar``. -* [fix] Rework ``cf_calendar`` following unit test writing. -* [tst] Add simple integration test for ``icclim.index`` using index "SU". -* [maint] Remove old, unmaintained integration tests and auxiliary tools. See `9ac35c2f`_ for details. -* [maint] Upgrade to xclim 0.34. -* [fix] WSDI and CSDI percentile were computed on the studied period instead of the reference period. -* [maint] Internal refactoring ``icclim.main`` module to ease maintainability. -* [doc] Add contribution guide. -* [enh] Add API endpoint ``icclim.create_optimized_zarr_store``. It is a context manager wrapping `rechunker` in order to rechunk a dataset without any chunk a given `dim` dimension. -* [fix] Add zarr dependency, needed to update zarr store metadata after rechunking. -* [fix] Fix installation from sources. The import in setup.py to get ``__version__`` meant we needed to have the whole environment installed before the moment it is actually installed by ``setup.py``. -* [enh] Add API endpoint ``icclim.indices``. This allows to compute multiple indices at once. -* [maint] Pin `dask` to its versions before `2022.01.1`. This is necessary for rechunker 0.3.3 to work. -* [maint] Update types to use modern python typing syntax. -* [fix] CI was passing even when tests were in failure. This has been fixed. - -.. _`9ac35c2f`: https://github.com/cerfacs-globc/icclim/commit/9ac35c2f7bda76b26427fd433a79f7b4334776e7 - -5.0.2 ------ -* [fix] Update extracting script for C3S. imports were broken. -* [doc] Update release process doc. -* [fix] Bug on windows breaking unit tests. -* [fix] Bug on windows unable to get the timezone in our logger. -* [fix] Pin to numpy 1.21 for now. Numba seems to dislike version 1.22 -* [fix] LICENCE was still not exactly following Apache guidelines. NOTICE has been removed. - - -5.0.1 ------ -* [fix] Modify LICENCE and NOTICE to follow Apache guidelines. LICENCE has also been renamed to english LICENSE. - - -5.0.0 ------ -We fully rewrote icclim to benefit from Xclim, Xarray, Numpy and Dask. -A lot of effort has been to minimize the API changes. -Thus for all scripts using a former version of icclim updating to this new version should be smooth. 
+#################
+ Release history
+#################
+
+********************
+ 6.6.0 (unreleased)
+********************
+
+- [maint] Migrate from [black, blackdoc, flake8, isort, pyupgrade,
+  pydocstyle] to ruff
+
+- [maint] Migrate from setup.py to pyproject.toml
+
+- [maint] Make readthedocs build fail when there are warnings
+
+- [maint] Fix warnings in doc build
+
+- [maint] Update architecture to have a `src/` and a `tests/` directory
+  at root level
+
+- [maint] Migrate build toolchain from setuptools to flit
+
+- [maint] Remove version number from `constants` module as it was
+  causing the build process to import icclim. The version number is now
+  statically set in src/icclim/__init__.py
+
+- [maint] Lint code using the more restrictive rules from ruff
+
+- [fix] Force xarray to read datasets sequentially to avoid a netcdf-c
+  threading issue causing seg faults.
+
+- [enh] Add `publish-to-pypi.yml` github action to automatically build
+  and publish icclim to pypi. This action is triggered by a github
+  release being published. This action requires a manual approval on
+  github.
+
+*******
+ 6.5.0
+*******
+
+- [maint] Adapt generic indicators "excess" and "deficit" to xclim
+  0.45.
+- [maint] Upgrade minimal python version to 3.9
+- [fix] Avoid resampling SPI* indices.
+
+*******
+ 6.4.0
+*******
+
+- [maint] Upgrade to xclim 0.43 (released on 09/05/2023).
+- [maint] Change how xclim is pinned to allow patch changes.
+
+*******
+ 6.3.0
+*******
+
+- [maint] Upgrade to xclim 0.42 (released on 04/04/2023).
+
+- [fix] **BREAKING CHANGE** The indicators based on the difference
+  between two variables (ecad: DTR, ETR, vDTR and anomaly) gave wrong
+  values due to a bad unit conversion of the output. This was for
+  example the case when the input variables are in Kelvin: the
+  difference between the two variables is still in Kelvin, but it
+  cannot be converted to degrees Celsius with the usual `+273.15`. To
+  work around this issue, we first convert the inputs to the expected
+  output unit and then we compute the index.
+
+- [fix] **BREAKING CHANGE** Indices based on both a percentile
+  threshold and a `threshold_min_value` (for ecad: r75p, r75pTOT, r95p,
+  r95pTOT, r99p, r99pTOT) now compute the exceedance rate on values
+  above `threshold_min_value` as well. Previously this
+  `threshold_min_value` was used to compute the percentile and the
+  total (for rxxpTOT indices) but not the exceedance rate.
+
+*******
+ 6.2.0
+*******
+
+- [maint] Upgrade and adapt to xclim 0.40. Moved PercentileDataArray
+  from xclim to icclim. Adapted the unit conversion to use the hydro
+  context.
+
+- [fix] Pin xclim to exactly 0.40 to avoid breaking changes.
+
+*******
+ 6.1.5
+*******
+
+- [fix] Bug fix: no longer assume the longitude and latitude
+  coordinates are named lon and lat when writing output metadata. This
+  fix is needed to work on E-OBS and other datasets.
+
+*******
+ 6.1.3
+*******
+
+- [fix] Bug fix for TNx.
+
+*******
+ 6.1.2
+*******
+
+- [fix] Add missing file to properly identify user_indices as a
+  package.
+
+*******
+ 6.1.0
+*******
+
+- [fix] Add unit getter/setter for BoundedThreshold.
+
+- [enh] Add ECAD wind indices ``{fxx, fg6bft, fgcalm, fg, ddnorth,
+  ddeast, ddsouth, ddwest}``. `ddnorth` and `ddsouth` do not follow
+  ECAD's ATBD v11 requirements, as their definition seems to be wrong
+  in the document.
+
+- [enh] Add generic indicators as stand-alone functions in the `icclim`
+  namespace.
+
+- [doc] Add documentation for generic indicators stand-alone functions.
+
+- [doc] Add a recipe "how to" documentation for generic indicators.
+
+- [enh] Add ECAD's indices GSL, SPI3, SPI6 by binding them to xclim's
+  indicators.
+
+- [maint] Upgrade to xclim 0.39.0
+
+*******
+ 6.0.0
+*******
+
+- [enh] Add generic indices
+
+- [enh] Make in_files.var.threshold and threshold parameters work with
+  string values (a value with a unit or a percentile stamp)
+
+- [maint] **BREAKING CHANGE** ECAD indices are no longer configurable!
+  Use generic indices instead.
+
+- [fix] **BREAKING CHANGE** ECAD indices CW, CD, WW, WD were computing
+  the precipitation percentiles on day of year values where they should
+  have been percentiles of the whole period (excluding dry days). This
+  has been fixed.
+
+- [maint] icclim no longer carries a version of the clix-meta yml file.
+  Previously it was used to generate the doc string and a few metadata
+  of ECAD indices. It is no longer needed as we have put these metadata
+  within the StandardIndex declaration.
+
+- [maint] **BREAKING CHANGE** Removed the `clipped_season` option from
+  `slice_mode`. With generic indices, `season` can be used with every
+  index. In particular, spell-based indices (e.g. wsdi, cdd) are mapped
+  to the `max_consecutive_occurrence` or `sum_of_spell_lengths` generic
+  indicators. Both compute the spell length before doing the resampling
+  operation, so a spell that starts and ends outside the output
+  frequency interval is properly accounted for its whole duration. For
+  example, with `slice_mode="month"`, a spell starting in January and
+  ending in March is accounted for in the January results. However,
+  when `slice_mode` is set to a season, where time is clipped and thus
+  where xclim `select_time` is called, the behavior is similar to the
+  former `clipped_season`: we first clip the time to the expected
+  season, then we compute the index. Thus, spell events occurring
+  before the season bounds will be ignored in the results.
+
+- [maint] **BREAKING CHANGE** User index `max_nb_consecutive_events` is
+  also mapped to `max_consecutive_occurrence`; consequently, spells are
+  also counted for their whole duration.
+
+- [enh] Make it possible to pass a simple dictionary in `in_files`,
+  merging together basic `in_files` and `var_name` features. It looks
+  like `in_files={"tasmax": "tasmax.nc", "tasmin": "tasmin.zarr"}`
+
+- [enh] Add `min_spell_length` parameter to the index API in order to
+  control the minimum duration of spells in `sum_of_spell_lengths`.
+
+- [enh] Add `rolling_window_width` parameter to the index API in order
+  to control the width of the rolling window in
+  `max|min_of_rolling_sum|average`.
+
+- [enh] Add `doy_window_width` parameter to the index API in order to
+  control the width of aggregation windows when computing doy
+  percentiles.
+
+- [maint] Deprecate the `window_width` parameter. When filled, it is
+  still mapped to `doy_window_width`.
+
+- [maint] Upgrade to xclim 0.38 and to xarray 2022.6.
+
+- [maint] Add BlackDoc to CI (github actions) to keep our doc code
+  examples clean.
+
+- [enh] Add ECAD's RR index. It computes the sum of precipitation over
+  days.
+
+- [enh] Add icclim logo and auto-update its inner version number.
+
+- [maint] Enable git lfs (large file storage) for `.svg` files to
+  minimise the impact on storage of logo updates.
+
+- [enh] Improve icclim.indices to enable multi-index computation based
+  on variable names: `icclim.indices(index_group='tasmax',
+  in_files=data)`
+
+- [fix] **BREAKING CHANGE** ECAD snow indices now expect a snow (snd)
+  variable instead of a precipitation one.
+
+- [enh] Add `build_threshold` function that acts as a factory to create
+  different kinds of Threshold.
+
+- [enh] Add BoundedThreshold class. It allows computing multiple
+  thresholds for a single variable. This feature is necessary for
+  indices such as ECAD's "DDnorth". Instances of BoundedThreshold are
+  created with the `build_threshold` factory function, e.g.
+  `build_threshold(">= -20 degree AND <= 20 degree ")`
+
+- [enh] Make it possible to compute multiple percentiles at once.
+
+- [maint] Update coverage computation. Now test files are ignored when
+  calculating the code coverage, thus it dropped a little (by 3%).
+
+- [enh] Convert input data that are recognized as a precipitation
+  amount into precipitation rate. This is necessary to handle e-obs
+  precipitation datasets.
+
+*******
+ 5.4.0
+*******
+
+- [fix] When giving input as a list of netcdf files, the coordinate
+  values are now merged using the `override` strategy; thus the first
+  file with a given dimension defines this dimension for all the files.
+
+- [fix] Fix the output unit of some indices (from "°C" to
+  "degree_Celsius")
+
+- [fix] Fixed an issue where datasets having a time_bnds variable could
+  not be processed, by chunking the DataArray(s) instead of the
+  Dataset.
+
+*******
+ 5.3.0
+*******
+
+- [enh] Add icclim version to history in outputted metadata.
+
+- [maint] **BREAKING CHANGE** Pin minimal pandas version to 1.3 to have
+  the fix for https://github.com/pandas-dev/pandas/issues/24539
+
+- [enh] ``slice_mode``: seasons can now be defined to be between two
+  exact dates.
+
+- [enh] ``slice_mode`` type can now be tuple[str, list]; it works
+  similarly to the list input for seasons, but it enforces a length of
+  2.
+
+- [enh] ``slice_mode``: Added `clipped_season` keyword which ignores
+  events starting before the season bounds (original behavior of
+  ``season``).
+
+- [maint] ``slice_mode``: Modified `season` keyword to take into
+  account events (such as in CDD) starting before the season bounds.
+  This should improve the scientific validity of these seasonal
+  computations. It is also in accordance with xclim's way of doing
+  this.
+
+- [maint] Added dataclass ClimateIndex to ease the introduction of new
+  indices not in the ECAD standard.
+
+- [maint] Made use of the new typing syntax thanks to ``from
+  __future__ import annotations``.
+
+- [maint] Add docstring validation into flake8 checks.
+
+- [enh] Improve API for date related parameters ``{time_range,
+  base_period_time_range, ref_time_range}``. They can still be filled
+  with a datetime object, but various string formats are now also
+  available. This comes with the dateparser library.
+
+- [doc] Update callback doc as its outputted value is very inaccurate
+  when dask is enabled.
+
+- [enh] T(X/N/G)(10/90)p indices threshold is now configurable with the
+  `threshold` parameter. Example of use: `icclim.tx90p(in_files=data,
+  threshold=[42, 99])`
+
+- [enh|maint] threshold, history and source metadata have been updated
+  to better describe what happens during the icclim process.
+
+- [fix/doc] The documentation of the generated API for T(X/N/G)(10/90)p
+  indices now properly uses their ECAD definitions instead of those
+  from ETCCDI.
+
+- [enh/doc] Add [WSDI, CSDI, rxxp, rxxpTOT, CW, CD, WW, WD] indices in
+  yaml definition. Note: We no longer strictly follow the yaml given by
+  clix-meta.
+
+- [fix] Custom seasonal slice_mode was broken when it ended in
+  December. It's now fixed and unit tested.
+
+- [enh] Make ``in_file`` accept a dictionary merging together
+  ``var_name`` and ``in_file`` features.
+
+- [enh] ``in_file`` dictionary can now be used to pass percentiles
+  thresholds. These thresholds will be used instead of computing them
+  on relevant indices.
+
+- [maint/internal] Refactored IndexConfig and moved all the logic to
+  input_parsing.
+
+- [fix] Add auto detection of variables [prAdjust, tasAdjust,
+  tasmaxAdjust, tasminAdjust]
+
+*******
+ 5.2.2
+*******
+
+[maint] Remove constraint on numpy version as numba is now working with
+np 1.22.
+
+*******
+ 5.2.1
+*******
+
+- [maint] Made Frequency part of the SliceMode union.
+
+- [fix] slice_mode seasonal sampling was giving wrong results for quite
+  a few indices. This has been fixed, and performance should also be
+  improved by the fix. However, seasonal slice_mode now does not allow
+  using xclim's missing values mechanisms.
+
+- [fix] user_index ExtremeMode config was not properly parsed when a
+  string was used.
+
+- [fix] user_index Anomaly operator was not properly using the
+  `ref_time_range` to set up a reference period as it should.
+
+- [fix] user_index Sum and Mean operators were broken due to a previous
+  refactoring and a lack of unit tests; this is now fixed and tested.
+
+- [maint] Changed how the `rechunker` dependency is pinned to add
+  flexibility. We want a version above '0.3' but not the '0.4'.
+
+- [maint] For the newly generated API, on the `custom_index` function,
+  the parameter `user_index` is now mandatory.
+
+*******
+ 5.2.0
+*******
+
+- [maint] Update release process.
+- [enh] Improve `create_optimized_zarr_store` to accept a chunking
+  schema instead of a single dim.
+- [enh] Make use of `fsspec` to generalize the storages where
+  `create_optimized_zarr_store` can create its zarr stores.
+- [enh] Make CSDI and WSDI thresholds configurable using the
+  `threshold` parameter of icclim.index.
+- [enh] Add a function in the `icclim` namespace for each ECA&D index
+  for convenience.
+- [doc] Improve documentation about chunking.
+- [fix] `create_optimized_zarr_store` would throw an error when
+  creating the first temp store if the chunks were not unified.
+
+*******
+ 5.1.0
+*******
+
+- [maint] **BREAKING CHANGE** The default value of the ``out_file``
+  parameter of icclim.index is now ``None``. When None, ``icclim.index``
+  only returns a xarray.Dataset and does not write to a default netcdf
+  file.
+
+- [enh] Add code coverage in CI. This writes a comment with the full
+  report in the PR.
+
+- [enh] Add coverage and conda badges in Readme.
+
+- [tst] Add unit tests for modules ``main``, ``dispatcher``,
+  ``cf_calendar``.
+
+- [fix] Rework ``cf_calendar`` following unit test writing.
+
+- [tst] Add simple integration test for ``icclim.index`` using index
+  "SU".
+
+- [maint] Remove old, unmaintained integration tests and auxiliary
+  tools. See 9ac35c2f_ for details.
+
+- [maint] Upgrade to xclim 0.34.
+
+- [fix] WSDI and CSDI percentiles were computed on the studied period
+  instead of the reference period.
+
+- [maint] Internal refactoring of the ``icclim.main`` module to ease
+  maintainability.
+
+- [doc] Add contribution guide.
+
+- [enh] Add API endpoint ``icclim.create_optimized_zarr_store``.
+  It is a context manager wrapping `rechunker` in order to rechunk a
+  dataset so that a given `dim` dimension is not chunked.
+
+- [fix] Add zarr dependency, needed to update zarr store metadata after
+  rechunking.
+
+- [fix] Fix installation from sources. The import in setup.py to get
+  ``__version__`` meant the whole environment needed to be installed
+  before icclim itself was actually installed by ``setup.py``.
+
+- [enh] Add API endpoint ``icclim.indices``. This allows computing
+  multiple indices at once.
+
+- [maint] Pin `dask` to its versions before `2022.01.1`. This is
+  necessary for rechunker 0.3.3 to work.
+
+- [maint] Update types to use modern python typing syntax.
+
+- [fix] CI was passing even when tests were failing. This has been
+  fixed.
+
+.. _9ac35c2f: https://github.com/cerfacs-globc/icclim/commit/9ac35c2f7bda76b26427fd433a79f7b4334776e7
+
+*******
+ 5.0.2
+*******
+
+- [fix] Update extracting script for C3S. Imports were broken.
+- [doc] Update release process doc.
+- [fix] Bug on Windows breaking unit tests.
+- [fix] Bug on Windows: unable to get the timezone in our logger.
+- [fix] Pin to numpy 1.21 for now. Numba seems to dislike version 1.22.
+- [fix] LICENCE was still not exactly following Apache guidelines.
+  NOTICE has been removed.
+
+*******
+ 5.0.1
+*******
+
+- [fix] Modify LICENCE and NOTICE to follow Apache guidelines. LICENCE
+  has also been renamed to the English LICENSE.
+
+*******
+ 5.0.0
+*******
+
+We fully rewrote icclim to benefit from Xclim, Xarray, Numpy and Dask.
+A lot of effort has been made to minimize API changes. Thus, for
+scripts using a former version of icclim, updating to this new version
+should be smooth.

-We made a few improvements on the API
+We made a few improvements to the API:

-  - We replaced everywhere the french singular word "indice" by the proper english "index". You should get a warning if you still use "indice" such as in "indice_name".
-  - When ``save_percentile`` is used, the resulting percentiles are saved within the same netcdf file as the climate index.
-  - Most of the keywords (such as slice_mode, index_name, are now case insensitive to avoid unnecessary errors.
-  - When ``in_files`` is a list the netcdf are combined to lookup them all the necessary variables.
+   - We replaced the French singular word "indice" with the proper
+     English "index" everywhere. You should get a warning if you still
+     use "indice", such as in "indice_name".
+
+   - When ``save_percentile`` is used, the resulting percentiles are
+     saved within the same netcdf file as the climate index.
+
+   - Most of the keywords (such as slice_mode or index_name) are now
+     case-insensitive to avoid unnecessary errors.
+
+   - When ``in_files`` is a list, the netcdf files are combined to look
+     up all the necessary variables.
+
+   - When multiple variables are stored into a single ``in_files``,
+     there is no longer any need to use a list.
+
+   - ``in_files`` parameter can now be a Xarray.Dataset directly. In
+     that case, ``out_file`` is ignored.
+
+   - ``var_name`` parameter is now optional for ECA&D indices; icclim
+     will try to find a valid variable depending on the wanted index.
+
+   - ``transfer_limit_Mbytes`` parameter is now used to adjust how Dask
+     should chunk the dataset.
+
+   - The output of ``icclim.index()`` is now the resulting Xarray
+     Dataset of the index computation. ``out_file`` can still be used
+     to write output to a netcdf.
+
+   - `logs_verbosity` parameter can now control how many logs icclim
+     will produce. The possible values are ``{"HIGH", "LOW",
+     "SILENT"}``.

 Additionally

-  - icclim C code has also been removed. This makes the installation and maintenance much easier.
-  - Climate indices metadata has been enriched with Xclim metadata.
-  - With this rewrite a few indices were fixed as they were giving improper results.
-  - Performances have been significantly improved, especially thanks to Dask.
+   - icclim C code has also been removed. This makes the installation
+     and maintenance much easier.
+   - Climate indices metadata has been enriched with Xclim metadata.
+   - With this rewrite, a few indices were fixed as they were giving
+     improper results.
+   - Performance has been significantly improved, especially thanks to
+     Dask.

 Breaking changes
-~~~~~~~~~~~~~~~~
-Some utility features of icclim has been removed in 5.0.0.
-This include `util.regrid` module as well as `util.spatial_stat` module.
-For regridding, users are encouraged to try `xESMF `_ or to use xarray
-selection directly.
-For spatial stats, Xarray provides a `DataArrayWeighted `_
+================
+
+Some utility features of icclim have been removed in 5.0.0. This
+includes the `util.regrid` module as well as the `util.spatial_stat`
+module. For regridding, users are encouraged to try `xESMF
+`_ or to use xarray
+selection directly. For spatial stats, Xarray provides a
+`DataArrayWeighted
+`_.

 .. note::

-   It is highly recommended to use Dask (eventually with the distributed scheduler) to fully benefit from the performance
-   improvements of version 5.0.0.
+   It is highly recommended to use Dask (possibly with the distributed
+   scheduler) to fully benefit from the performance improvements of
+   version 5.0.0.

 Release candidates for 5.0 change logs
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-* [fix] Make HD17 expect tas instead of tas_min.
-* [fix] Fix performance issue with indices computed on consecutive days such as CDD.
-* [maint] Add Github action CI to run unit tests.
-* [maint] Add pre-commit CI to fix lint issues on PRs.
-* [maint] Update sphinx and remove old static files.
-* [doc] Restructure documentation to follow diataxis principles.
-* [doc] Add some articles to documentation.
-* [maint] Drop support for python 3.7
-* [maint] Add github templates for issues and pull requests.
-* [maint] Simplify ecad functions output to a single DataArray in most cases.
-* [fix] Fix lint for doc conf.
-* [fix] Add all requirements to requirements_dev.txt
-* [doc] Update Readme from md to rst format. Also changed content.
-* [doc] Add a dev documentation article "how to release".
-* [doc] Add a dev documentation article "continuous integration".
-* [doc] Update installation tutorial.
-* [doc] Various improvements in doc wording and display.
-* [doc] Start to documente ECA&D indices functions.
-* [doc] Add article to distinguish icclim from xclim.
-* [maint] Refactored ecad_functions (removed duplicated code, simplified function signatures...)
-* [maint] Refactored IndexConfig to hide some technical knowledge which was leaked to other modules.
-* [enh] Made a basic integration of clix-meta yaml to populate the generated docstring for c3s.
-* [maint] This makes pyyaml a required dependency of icclim.
-* [fix] Fixed an issue with aliasing of "icclim" module and "icclim" package
-* [maint] Added some metadata to qualify the ecad_indices and recognize the arguments necessary to compute them.
-* [maint] Added readthedocs CI configuration. This is necessary to use python 3.8.
-* [enh] Added `tools/extract-icclim-funs.py` script to extract from icclim stand-alone function for each indices.
-* [enh] Added `icclim.indices` function (notice plural) to list the available indices.
+======================================
+
+- [fix] Make HD17 expect tas instead of tas_min.
+- [fix] Fix performance issue with indices computed on consecutive days
+  such as CDD.
+- [maint] Add GitHub Actions CI to run unit tests.
+- [maint] Add pre-commit CI to fix lint issues on PRs.
+- [maint] Update sphinx and remove old static files.
+- [doc] Restructure documentation to follow diataxis principles.
+- [doc] Add some articles to documentation.
+- [maint] Drop support for python 3.7
+- [maint] Add github templates for issues and pull requests.
+- [maint] Simplify ecad functions output to a single DataArray in most
+  cases.
+- [fix] Fix lint for doc conf.
+- [fix] Add all requirements to requirements_dev.txt
+- [doc] Update Readme from md to rst format. Also changed content.
+- [doc] Add a dev documentation article "how to release".
+- [doc] Add a dev documentation article "continuous integration".
+- [doc] Update installation tutorial.
+- [doc] Various improvements in doc wording and display.
+- [doc] Start to document ECA&D index functions.
+- [doc] Add article to distinguish icclim from xclim.
+- [maint] Refactored ecad_functions (removed duplicated code,
+  simplified function signatures...)
+- [maint] Refactored IndexConfig to hide some technical knowledge which
+  was leaked to other modules.
+- [enh] Made a basic integration of clix-meta yaml to populate the
+  generated docstring for c3s.
+- [maint] This makes pyyaml a required dependency of icclim.
+- [fix] Fixed an issue with aliasing of the "icclim" module and the
+  "icclim" package.
+- [maint] Added some metadata to qualify the ecad_indices and recognize
+  the arguments necessary to compute them.
+- [maint] Added readthedocs CI configuration. This is necessary to use
+  python 3.8.
+- [enh] Added `tools/extract-icclim-funs.py` script to extract from
+  icclim a stand-alone function for each index.
+- [enh] Added `icclim.indices` function (notice plural) to list the
+  available indices.

diff --git a/doc/source/references/threshold.rst b/doc/source/references/threshold.rst
index c42eca71..c3059446 100644
--- a/doc/source/references/threshold.rst
+++ b/doc/source/references/threshold.rst
@@ -1,5 +1,6 @@
+#########
 Threshold
-=========
+#########

 .. automodule:: icclim.generic_indices.threshold
    :members:

diff --git a/doc/source/tutorials/index.rst b/doc/source/tutorials/index.rst
index d5e98b1f..3573b6c9 100644
--- a/doc/source/tutorials/index.rst
+++ b/doc/source/tutorials/index.rst
@@ -1,11 +1,13 @@
-Tutorials
-=========
+###########
+ Tutorials
+###########

-These documents should serve as a way to discover icclim and it's capabilities.
-To see how icclim can solve specific issues see :ref:`how_to`.
+These documents should serve as a way to discover icclim and its
+capabilities. To see how icclim can solve specific issues, see
+:ref:`how_to`.

 .. toctree::
-   :maxdepth: 2
-   :caption: Contents:
+   :maxdepth: 2
+   :caption: Contents:

-   installation
+   installation

diff --git a/doc/source/tutorials/installation.rst b/doc/source/tutorials/installation.rst
index 8cfe4281..72c1c7b0 100644
--- a/doc/source/tutorials/installation.rst
+++ b/doc/source/tutorials/installation.rst
@@ -1,63 +1,71 @@
-Installation
-============
+##############
+ Installation
+##############

+**************
+ Dependencies
+**************

-Dependencies
-------------
-The dependencies to run icclim are listed under our
-`requirements.txt `_ file.
+The dependencies to run icclim are listed under our `requirements.txt
+`_
+file.

-Installation (Linux, OS X)
---------------------------
-.. note:: Make sure you have **Python 3.9+**.
+****************************
+ Installation (Linux, OS X)
+****************************

+.. note::
+
+   Make sure you have **Python 3.9+**.

-To install from pip
-~~~~~~~~~~~~~~~~~~~
+To install from pip
+===================

-.. code-block:: sh
+.. code:: sh

-   pip install icclim
+   pip install icclim

 To install from sources
-~~~~~~~~~~~~~~~~~~~~~~~
+=======================

 With git:

-1. ``git clone git://github.com/cerfacs-globc/icclim``
-2. ``cd icclim``
+#. ``git clone git://github.com/cerfacs-globc/icclim``
+#. ``cd icclim``

 Or without git:

-1. Go to ``_.
-2. you can download the last release: click to **Source code (zip)** or **Source code (tar.gz)**.
-3. Extract the file.
-4. Go to extracted directory.
+#. Go to https://github.com/cerfacs-globc/icclim/releases/.
+#. Download the latest release: click **Source code (zip)** or
+   **Source code (tar.gz)**.
+#. Extract the file.
+#. Go to the extracted directory.

 Then run the following commands:

-.. code-block:: sh
+.. code:: sh

-   [sudo] python setup.py install
+   [sudo] python setup.py install

 or if you don't have root or sudo access, as a normal user:

-.. code-block:: sh
+.. code:: sh

-   python setup.py install --user
+   python setup.py install --user

-6. Check if the library is installed correctly:
+Finally, check that the library is installed correctly:

-.. code-block:: sh
+.. code:: python

-   >>> import icclim
+   >>> import icclim

-To get the version of installed library, do the following:
+To get the version of the installed library, do the following:

-.. code-block:: sh
+.. code:: python

-   >>> icclim.__version__
-   5.0.0
+   >>> icclim.__version__
+   5.0.0

-.. note:: icclim was not tested on Windows platform...
+.. note::
+
+   icclim has not been tested on the Windows platform.

diff --git a/pyproject.toml b/pyproject.toml
index 64903411..f2d1f9f5 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -54,7 +54,8 @@ dev = [
     "flit",
     "ruff",
     "pip",
-    "pre-commit>=2.9"
+    "pre-commit>=2.9",
+    "sphinx-autobuild"
 ]
 doc = [
     "sphinx",