How to deal with obs and simh if they have different time lengths? #65

Pan-Yuxian · 2024-03-08T03:27:11Z

First, I'd like to thank you for building this great package.
When I used it (for example, the QM method) with the provided tas DataArray, I found that obs and simh are expected to be consistent and cover the same number of time steps. However, my obs covers 30 years and my simh only 20 years, so I tried calling quantile_mapping() directly, like this:

variable                   = "tas"  
kwargs                     = {"n_quantiles":2000, "kind":"+"}
in_obs                     = obsh[variable]
in_simh                    = simh[variable][3650:,:,:]
in_simp                    = simh[variable] #!!!please note that I use the simh for simp
qm_adjusted          = xr.apply_ufunc(cmethods_dis.quantile_mapping,
                                            in_obs,
                                            in_simh.rename({"time": "t1"}),
                                            in_simp.rename({"time": "t2"}),
                                            dask              = "parallelized",
                                            vectorize         = True,
                                            input_core_dims   = [["time"], ["t1"], ["t2"]],
                                            output_core_dims  = [["t2"]],
                                            kwargs            = dict(kwargs)
                                           )
qm_adjusted          = qm_adjusted.rename({"t2": "time"}).transpose(*in_obs.dims)

then I plot the results:

plt.figure(figsize=(10,5),dpi=216)
in_obs.groupby("time.dayofyear").mean(...).plot(label="$T_{obs,h}$",color="black")
in_simh.groupby("time.dayofyear").mean(...).plot(label="$T_{sim,h}$",color="blue")
in_simp.groupby("time.dayofyear").mean(...).plot(label="$T_{sim,p}$",color="red")
(qm_adjusted.to_dataset())["tas"].groupby("time.dayofyear").mean(...).plot(label="$T^{QM}_{sim,p}$",color="green")

plt.title("Historical modeled and observed temperatures; and corrected temperatures")
plt.xlim(0,365)
plt.gca().grid(alpha=.3)
plt.legend();

I got this figure, which looks strange to me:

Could you please help me understand and solve this problem?

btschwertfeger · 2024-03-08T10:20:28Z

Maintainer

Hey @Pan-Yuxian, thank you!

I'm not sure about your data, but have you tried using fewer quantiles? If you have 30 years of (I assume) daily data, i.e. 10,950 days in total, then 2000 quantiles means 2000 bins across which these roughly 11,000 values are distributed. That can be a problem because QM uses interpolation (see https://python-cmethods.readthedocs.io/en/latest/src/methods.html#quantile-mapping).
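To put rough numbers on this, here is a small sketch of the arithmetic. It assumes daily data without leap days, and `values_per_bin` is a hypothetical helper used only for illustration, not part of python-cmethods:

```python
def values_per_bin(years: int, n_quantiles: int, days_per_year: int = 365) -> float:
    """Average number of daily values that fall into each quantile bin."""
    return years * days_per_year / n_quantiles


# 30 years of obs vs. 20 years of simh, both with n_quantiles=2000
for years in (30, 20):
    print(years, years * 365, values_per_bin(years, 2000))
```

With only a handful of values per bin, the empirical CDFs are quite coarse, so a smaller n_quantiles may be worth trying.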

I tried to reproduce it with some of my own data, but could not produce a plot that looks anything like yours. Could you send your obs and simh data sets to contact@b-schwertfeger.de?

Here is my attempt, using the data sets provided in this repository:

from cmethods import adjust
from cmethods.distribution import quantile_mapping
import xarray as xr
import matplotlib.pyplot as plt 

obs = xr.open_dataset("examples/input_data/observations.nc")["tas"]
simp = xr.open_dataset("examples/input_data/control.nc")["tas"]
simh = simp.copy(deep=True)[3650:]

#bc = adjust(
#     method="quantile_mapping",
#     obs=obs,
#     simh=simh,
#     simp=simh,
#     n_quantiles=200,
#)

kwargs = {"n_quantiles":2000, "kind":"+"}
qm_adjusted = xr.apply_ufunc(quantile_mapping,
    obs,
    simh.rename({"time": "t1"}),
    simp.rename({"time": "t2"}),
    dask              = "parallelized",
    vectorize         = True,
    input_core_dims   = [["time"], ["t1"], ["t2"]],
    output_core_dims  = [["t2"]],
    kwargs            = dict(kwargs)
)
qm_adjusted  = qm_adjusted.rename({"t2": "time"}).transpose(*obs.dims)

plt.figure(figsize=(10,5),dpi=216)
obs.groupby("time.dayofyear").mean(...).plot(label="$T_{obs,h}$",color="black")
simh.groupby("time.dayofyear").mean(...).plot(label="$T_{sim,h}$",color="blue")
simp.groupby("time.dayofyear").mean(...).plot(label="$T_{sim,p}$",color="red")
bc["tas"].groupby("time.dayofyear").mean(...).plot(label="$T^{QM}_{sim,p}$",color="green")

plt.title("Historical modeled and observed temperatures; and corrected temperatures")
plt.xlim(0,365)
plt.gca().grid(alpha=.3)
plt.legend();

Also, thanks a lot for drawing my attention to the restriction that the input data for the control period must be equally sized. I will adjust this part so that workarounds like yours are no longer needed.

2 replies

Pan-Yuxian Mar 8, 2024
Author

Hi @btschwertfeger, thanks for your reply!
Actually, for this demonstration I used the data sets provided in your repository.
I noticed that the code you provided uses bc["tas"] to plot $T^{QM}_{sim,p}$. However, if you try this:

qm_adjusted.groupby("time.dayofyear").mean(...).plot(label="$T^{QM}_{sim,p}$",color="green")

you may get an output similar to mine.

I will try using obs and simh of equal length, and I am looking forward to the updated method~ 😊

btschwertfeger Mar 8, 2024
Maintainer

Ah, yes, I see my mistake, and I have found the reason for this. It was a bit tricky. The cumulative distribution functions for obs and simh both have length n_quantiles + 1, but because the two data sets have different lengths, they do not count up to the same number. So when simh has fewer values than obs, we always miss the high values.

import xarray as xr
import matplotlib.pyplot as plt 
import numpy as np
from cmethods.utils import get_cdf, get_inverse_of_cdf

obs = xr.open_dataset("examples/input_data/observations.nc")["tas"]
simp = xr.open_dataset("examples/input_data/control.nc")["tas"]
simh = simp.copy(deep=True)[3650:]
n_quantiles = 2000

obs, simh, simp = np.array(obs), np.array(simh), np.array(simp)
global_max = max(np.nanmax(obs), np.nanmax(simh))
global_min = min(np.nanmin(obs), np.nanmin(simh))

wide = abs(global_max - global_min) / n_quantiles
xbins = np.arange(global_min, global_max + wide, wide)
cdf_obs = get_cdf(obs, xbins)
cdf_simh = get_cdf(simh, xbins)

plt.plot(cdf_obs,label="$F_{obs}$")
plt.plot(cdf_simh, label="$F_{simh}$")
# we have to scale the CDF of simh
plt.plot(np.interp(cdf_simh, (cdf_simh.min(), cdf_simh.max()), (cdf_obs.min(), cdf_obs.max())), label="$F^{*}_{simh}$")
plt.legend()

You will see that the unscaled CDF of simh doesn't rise as high as that of obs:

… so the fix is to scale simh to obs. I will create an issue for that as well and will commit my changes for QM and QDM.
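The effect can also be reproduced without the example files. The following is a minimal sketch on synthetic data, not the actual change to the package: `count_cdf` is a hypothetical stand-in that assumes get_cdf returns cumulative histogram counts with a leading zero, and the normal distributions are made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins: 30 years of daily "obs" and 20 years of daily "simh"
obs = rng.normal(loc=10.0, scale=5.0, size=30 * 365)
simh = rng.normal(loc=11.0, scale=5.0, size=20 * 365)

n_quantiles = 2000
global_min = min(obs.min(), simh.min())
global_max = max(obs.max(), simh.max())
xbins = np.linspace(global_min, global_max, n_quantiles + 1)


def count_cdf(x, xbins):
    """Cumulative histogram counts with a leading 0 (assumed behaviour of get_cdf)."""
    counts, _ = np.histogram(x, bins=xbins)
    return np.insert(np.cumsum(counts), 0, 0)


cdf_obs = count_cdf(obs, xbins)
cdf_simh = count_cdf(simh, xbins)

# Unscaled, each curve ends at its own sample size rather than at a common value
print(cdf_obs[-1], cdf_simh[-1])

# Rescale the simh CDF onto the value range of the obs CDF
cdf_simh_scaled = np.interp(
    cdf_simh,
    (cdf_simh.min(), cdf_simh.max()),
    (cdf_obs.min(), cdf_obs.max()),
)
print(cdf_simh_scaled[-1])
```

After rescaling, both curves end at the same value, so the upper quantiles of simh have a counterpart in obs. Dividing each CDF by its own maximum so that both run from 0 to 1 would presumably have the same effect.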

Thank you so much!

Answer selected by Pan-Yuxian
