What happened?
After upgrading to Xarray 2024.11.0, a colleague reported that the forward fill method did not work properly when the limit parameter was used: some values became NaN instead of taking the previous value.
I investigated and found that the problem is in the function used as binop in the cumreduction that calculates the valid positions to push. It does not restart the counter correctly, because the cumreduction calls this function twice: once to add the last value of the previous chunk to the current chunk, and once to combine the last values of two chunks. The second call does not work as expected because it cannot detect whether the accumulation was restarted, so the number of NaNs accumulates indefinitely.
I have not found a way to detect whether the counter was restarted when the cumreduction calls the binop again, mainly because the function receives no parameter indicating the shape of the chunk, so a restart is hard to infer from the last accumulated value alone. For that reason, I propose replacing the direct use of the cumreduction (only for the valid positions) with some Dask functions, as shown in the attached images. Unfortunately, this generates more tasks than before.
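To illustrate why the last accumulated value alone is ambiguous, here is a minimal pure-Python sketch. It is not Xarray's or Dask's code; the helper `nan_run_lengths` and the sample chunks are hypothetical and exist only for this illustration. Two different chunks end with the same local counter, yet the correct combined counter differs depending on whether the chunk contained a valid value (a restart).

```python
import math

nan = math.nan


def nan_run_lengths(values):
    """Count consecutive NaNs; the counter restarts at 0 on every valid value."""
    counts, run = [], 0
    for v in values:
        run = run + 1 if math.isnan(v) else 0
        counts.append(run)
    return counts


# Hypothetical chunks, chosen only to show the ambiguity.
prev_chunk = [5.0, nan, nan]        # ends inside a run of 2 NaNs
chunk_no_reset = [nan, nan]         # all NaN: the run from prev_chunk continues
chunk_with_reset = [7.0, nan, nan]  # starts with a valid value: the run restarts

# What a combine step sees: only the last counter of each chunk.
carry = nan_run_lengths(prev_chunk)[-1]
last_no_reset = nan_run_lengths(chunk_no_reset)[-1]
last_with_reset = nan_run_lengths(chunk_with_reset)[-1]

# What the correct answer is, computed on the concatenated data.
true_no_reset = nan_run_lengths(prev_chunk + chunk_no_reset)[-1]
true_with_reset = nan_run_lengths(prev_chunk + chunk_with_reset)[-1]

print("inputs seen by the combine step:", (carry, last_no_reset), (carry, last_with_reset))
print("correct combined counters:", true_no_reset, true_with_reset)
```

Both combine calls receive the same pair of counters, but one must add the carry and the other must ignore it. Distinguishing them requires extra information, such as the chunk length or a flag saying whether the chunk contained any valid value.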
Before:
After:
What did you expect to happen?
I expect that the results of the push method will match the in-memory version as shown in the minimal complete verifiable example.
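As a rough sketch of that expectation (not the original example from this report, and not Xarray's implementation), the pure-Python code below compares an in-memory forward fill with a limit against a chunk-by-chunk version that carries the last valid value and the current NaN run length across chunk boundaries. The function names, the sample data, and the chunk boundaries are all assumptions made for illustration.

```python
import math

nan = math.nan


def fill_chunk(chunk, limit, last=nan, run=0):
    """Forward fill one chunk, starting from the carried state (last, run)."""
    out = []
    for v in chunk:
        if math.isnan(v):
            run += 1
            # Fill only the first `limit` NaNs of each run.
            out.append(last if run <= limit else nan)
        else:
            last, run = v, 0  # a valid value restarts the counter
            out.append(v)
    return out, last, run


def ffill_limit(values, limit):
    """In-memory reference: the whole array is a single chunk."""
    return fill_chunk(values, limit)[0]


def ffill_limit_chunked(chunks, limit):
    """Process chunks in order, passing the state across boundaries."""
    out, last, run = [], nan, 0
    for chunk in chunks:
        filled, last, run = fill_chunk(chunk, limit, last, run)
        out.extend(filled)
    return out


def same_values(a, b):
    """Element-wise equality that treats NaN as equal to NaN."""
    return len(a) == len(b) and all(
        (math.isnan(x) and math.isnan(y)) or x == y for x, y in zip(a, b)
    )


data = [1.0, nan, nan, nan, 2.0, nan, nan, nan, nan]
chunks = [data[0:2], data[2:5], data[5:7], data[7:9]]

expected = ffill_limit(data, limit=2)
chunked = ffill_limit_chunked(chunks, limit=2)
print(expected)
print(same_values(expected, chunked))
```

This version is sequential, so it sidesteps the parallel combine step entirely; it only pins down the result that a chunked computation should reproduce.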
Minimal Complete Verifiable Example
MVCE confirmation
Relevant log output
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: 5279bd1
python: 3.11.10 | packaged by conda-forge | (main, Sep 10 2024, 10:53:25) [MSC v.1940 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 165 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('Spanish_Venezuela', '1252')
libhdf5: 1.14.3
libnetcdf: 4.9.2
xarray: 2025.1.2.dev1+g5279bd15.d20250111
pandas: 2.2.2
numpy: 2.0.2
scipy: 1.14.1
netCDF4: 1.7.1.post2
pydap: None
h5netcdf: None
h5py: None
zarr: 2.18.3
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: 1.4.2
dask: 2024.9.0
distributed: 2024.9.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: 0.8.2
fsspec: 2024.9.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 73.0.1
pip: 24.2
conda: None
pytest: 8.3.3
mypy: 1.11.2
IPython: 8.27.0
sphinx: None