Hi,
I am working with geolocation time series climate data. Data exploration has already been much easier thanks to vaex. Now I would like to interpolate NaN values for each grid cell representing a time series. The dataframe is ~210 million rows by 5 columns and the interpolation needs to happen per individual "geo_idx" which is a series of about 400 time steps. Currently, I am attempting to iterate over the dataframe to filter per "geo_idx" and do the interpolation in pandas and save the result to vaex hdf5 format, i.e.:
However, calling `to_pandas_df()` and `from_pandas()` for every `geo_idx` is prohibitively slow, so I was wondering if there are any suggestions about how to do this interpolation. Thanks in advance!
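For reference, here is a minimal sketch of the kind of per-group interpolation described above, done directly on a NumPy array so that no pandas round trip is needed for each `geo_idx`. It rests on assumptions that are not stated in the question: the rows are sorted by `geo_idx` and then by time, every `geo_idx` has the same number of time steps, and the column to fill has already been extracted as a flat NumPy array. The function names and the `steps_per_group` parameter are hypothetical, and NaNs at the start or end of a series are filled with the nearest valid value, which may not be the behavior wanted here.

```python
import numpy as np


def interpolate_nans_1d(y):
    """Linearly interpolate NaNs in one series.

    Leading/trailing NaNs take the nearest valid value (np.interp clamps
    at the edges). A series with no valid values is returned unchanged.
    """
    y = np.asarray(y, dtype=float)
    valid = ~np.isnan(y)
    if valid.all() or not valid.any():
        return y.copy()
    x = np.arange(y.size)
    out = y.copy()
    out[~valid] = np.interp(x[~valid], x[valid], y[valid])
    return out


def interpolate_by_group(values, steps_per_group):
    """Fill NaNs independently within each consecutive block of rows.

    Assumes (hypothetically) that `values` is sorted by group and time and
    that every group has exactly `steps_per_group` rows.
    """
    values = np.asarray(values, dtype=float)
    if values.size % steps_per_group != 0:
        raise ValueError("length is not a multiple of steps_per_group")
    grid = values.reshape(-1, steps_per_group)  # one row per group
    out = np.empty_like(grid)
    for i, row in enumerate(grid):
        out[i] = interpolate_nans_1d(row)
    return out.ravel()


# Two groups of four time steps each.
values = np.array([1.0, np.nan, 3.0, np.nan, 10.0, np.nan, 30.0, 40.0])
filled = interpolate_by_group(values, steps_per_group=4)
print(filled)
```

With roughly 400 time steps per `geo_idx`, the loop runs once per grid cell rather than once per row, and each iteration stays inside NumPy. If the full column does not fit in memory, the same function could be applied to slices that contain whole groups.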