How to use Vaex on-the-fly data with Researchpy or Pingouin or any statistics-related Python module #2251
-
Hi, the formal answer is "it depends", but in practice it is more like "it does not look good". Let me explain. All the operations that vaex can do on columns (mean, stdev, skew, value_counts, etc.) we have implemented ourselves in C++ using streaming algorithms, so you can apply them to data of arbitrary size. (I say "we", but it is all the work of @maartenbreddels.) So when it comes to applying functions or methods from other packages to vaex data, they fall broadly into 2 categories:
For 1): For 2):
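To make the "streaming algorithms" point concrete: statistics such as the mean, variance, skew and kurtosis can be computed chunk by chunk and then combined, so no chunk ever has to be concatenated with the others. The sketch below is a plain-Python illustration of that general idea using the well-known pairwise merge formulas for central moments (Chan et al. / Pébay). It is not vaex's C++ implementation, and `Moments` and `stream_moments` are hypothetical names chosen for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Moments:
    """Hypothetical helper: count, mean and central moment sums M2, M3, M4."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0
    m3: float = 0.0
    m4: float = 0.0

    @classmethod
    def from_chunk(cls, chunk):
        # A single chunk fits in memory, so a plain two-pass computation is fine.
        n = len(chunk)
        if n == 0:
            return cls()
        mean = sum(chunk) / n
        d = [x - mean for x in chunk]
        return cls(n, mean,
                   sum(v * v for v in d),
                   sum(v ** 3 for v in d),
                   sum(v ** 4 for v in d))

    def merge(self, other):
        # Pairwise combination formulas for central moments (Chan et al. / Pebay).
        if self.n == 0:
            return other
        if other.n == 0:
            return self
        na, nb = self.n, other.n
        n = na + nb
        delta = other.mean - self.mean
        mean = self.mean + delta * nb / n
        m2 = self.m2 + other.m2 + delta ** 2 * na * nb / n
        m3 = (self.m3 + other.m3
              + delta ** 3 * na * nb * (na - nb) / n ** 2
              + 3 * delta * (na * other.m2 - nb * self.m2) / n)
        m4 = (self.m4 + other.m4
              + delta ** 4 * na * nb * (na * na - na * nb + nb * nb) / n ** 3
              + 6 * delta ** 2 * (na * na * other.m2 + nb * nb * self.m2) / n ** 2
              + 4 * delta * (na * other.m3 - nb * self.m3) / n)
        return Moments(n, mean, m2, m3, m4)

    def variance(self):
        # Population variance (ddof=0).
        return self.m2 / self.n

    def skewness(self):
        # Biased sample skewness: sqrt(n) * M3 / M2**1.5
        return math.sqrt(self.n) * self.m3 / self.m2 ** 1.5

    def kurtosis(self):
        # Non-excess kurtosis: n * M4 / M2**2 (3.0 for a normal distribution)
        return self.n * self.m4 / self.m2 ** 2


def stream_moments(chunks):
    """Reduce an iterable of chunks to one Moments object without concatenating them."""
    total = Moments()
    for chunk in chunks:
        total = total.merge(Moments.from_chunk(chunk))
    return total
```

Because `merge` is associative (up to floating-point rounding), chunks can also be processed in parallel and merged in any grouping, which is what makes this kind of statistic usable out-of-core.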
Looking at the source of

There is also no need to export data from vaex to pandas in chunks in order to do some operations on them (on those chunks), as that will indeed add some overhead. Better is to use

Having said all of this, if you do know how to make more things out-of-core, PRs are always welcome! :)

Finally, as vaex is fully compatible with and supports Apache Arrow, if there is something in that project that would be useful but we do not support yet, let us know and we will look into it.

I hope this answers the question.
-
Hi, all of you,
So I have several questions that can be summarized in one sentence: how can I apply functions from other packages like Researchpy or Pingouin to on-the-fly data created by Vaex? Is it possible to manipulate data that is not "written" in memory? I guess it is possible, since Vaex itself does it (e.g. data.mean(name_data_column)).
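The general idea behind computing on data that is never fully "written" in memory can be sketched as follows: an expression is evaluated lazily, one chunk at a time, and each chunk is fed straight into a reducer, so the full derived column never exists at once. This is a minimal standard-library sketch under the assumption that the data can be read in chunks; `read_chunks`, `virtual_column` and `streaming_mean` are hypothetical helpers, not vaex API, and vaex's own implementation is considerably more elaborate.

```python
import math
from typing import Callable, Iterable, Iterator, List

Chunk = List[float]


def read_chunks(n_rows: int, chunk_size: int) -> Iterator[Chunk]:
    """Hypothetical data source: yields column chunks instead of one big list.

    In a real setting this would read from a file; here the rows are generated.
    """
    for start in range(0, n_rows, chunk_size):
        stop = min(start + chunk_size, n_rows)
        yield [float(i) for i in range(start, stop)]


def virtual_column(chunks: Iterable[Chunk],
                   expr: Callable[[float], float]) -> Iterator[Chunk]:
    """Hypothetical helper: apply an expression lazily, chunk by chunk (nothing is stored)."""
    for chunk in chunks:
        yield [expr(x) for x in chunk]


def streaming_mean(chunks: Iterable[Chunk]) -> float:
    """Hypothetical helper: reduce chunks to a mean while holding one chunk in memory."""
    total, count = 0.0, 0
    for chunk in chunks:
        total += math.fsum(chunk)
        count += len(chunk)
    return total / count


if __name__ == "__main__":
    # Mean of sqrt(x) over one million rows, never holding more than 100k values.
    col = virtual_column(read_chunks(1_000_000, 100_000), math.sqrt)
    print(streaming_mean(col))
```

Generators keep the pipeline lazy: nothing is computed until `streaming_mean` pulls the next chunk, and each chunk can be garbage-collected as soon as it has been summed.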
I tried to convert on-the-fly Vaex data to a pandas DataFrame, but it took a lot of time.
A practical example of what I want to do is the following:
I want to test the normality of one on-the-fly column using the scipy.stats module.
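For this normality example, functions such as `scipy.stats.normaltest` or `scipy.stats.jarque_bera` expect the whole array in memory. The Jarque-Bera test, however, only needs the sample size, skewness and kurtosis, and those can be accumulated chunk by chunk. The sketch below swaps in Jarque-Bera as the test (it is simply one normality test that happens to be streamable, not necessarily the one the question had in mind) and computes it from running power sums using only the standard library; `power_sums` and `jarque_bera_streaming` are hypothetical names.

```python
import math
from typing import Iterable, Sequence, Tuple


def power_sums(chunks: Iterable[Sequence[float]]) -> Tuple[int, float, float, float, float]:
    """Hypothetical helper: accumulate n, sum(x), sum(x^2), sum(x^3), sum(x^4) over chunks."""
    n, s1, s2, s3, s4 = 0, 0.0, 0.0, 0.0, 0.0
    for chunk in chunks:
        for x in chunk:
            x2 = x * x
            n += 1
            s1 += x
            s2 += x2
            s3 += x2 * x
            s4 += x2 * x2
    return n, s1, s2, s3, s4


def jarque_bera_streaming(chunks: Iterable[Sequence[float]]) -> Tuple[float, float]:
    """Hypothetical helper: Jarque-Bera normality test on chunked data.

    Returns (statistic, p_value).
    """
    n, s1, s2, s3, s4 = power_sums(chunks)
    mean = s1 / n
    # Central moments expressed through the raw power sums.
    mu2 = s2 / n - mean ** 2
    mu3 = s3 / n - 3 * mean * s2 / n + 2 * mean ** 3
    mu4 = s4 / n - 4 * mean * s3 / n + 6 * mean ** 2 * s2 / n - 3 * mean ** 4
    skew = mu3 / mu2 ** 1.5
    kurt = mu4 / mu2 ** 2  # non-excess kurtosis (3.0 for a normal distribution)
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Survival function of a chi-square distribution with 2 degrees of freedom.
    p_value = math.exp(-jb / 2.0)
    return jb, p_value
```

Raw power sums are simple and trivially additive across chunks, but they can lose precision when the mean is large compared with the spread of the data; merging central moments pairwise is the more robust choice in that case. The p-value uses the asymptotic chi-square approximation, so it is only reliable for large samples.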
One option would be to integrate the functions I want to use into Vaex's stats module. Do you know which files contain Vaex's statistics functions?
Thanks