feat: expression plugins #26
polars-core = { workspace = true, default-features = false }
polars-ffi = { workspace = true, optional = true }
polars-plan = { workspace = true, optional = true }
polars-lazy = { workspace = true, optional = true }
pyo3 = "0.19.0"
Should all external deps be pinned to micro version here? E.g. any chance of pyo3 being pinned to 0.19 only?
(exact pinning probably makes sense for polars libraries themselves though...)
As I understand it, in cargo 0.19.0 is equal to 0.19.0..0.19.n (that is, >=0.19.0 and <0.20.0), though this is not super clear. To pin it exactly, we should type =0.19.0.
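To make the difference between a bare requirement and an exact pin concrete, here is a minimal sketch of Cargo's default (caret) matching rule next to `=` matching. It is only an illustration: the function names are made up for this comment, versions are plain tuples, and pre-release versions are ignored.

```rust
/// A (major, minor, patch) version triple.
type Version = (u64, u64, u64);

/// Exclusive upper bound of the caret range that a bare requirement implies.
/// The left-most non-zero component is the one that must not change.
fn caret_upper_bound(req: Version) -> Version {
    match req {
        (0, 0, patch) => (0, 0, patch + 1),
        (0, minor, _) => (0, minor + 1, 0),
        (major, _, _) => (major + 1, 0, 0),
    }
}

/// True if `candidate` satisfies a bare requirement such as `"0.19.0"`.
fn caret_matches(req: Version, candidate: Version) -> bool {
    // Tuples compare lexicographically, which matches version ordering here.
    candidate >= req && candidate < caret_upper_bound(req)
}

/// True if `candidate` satisfies an exact requirement such as `"=0.19.0"`.
fn exact_matches(req: Version, candidate: Version) -> bool {
    candidate == req
}

fn main() {
    let req = (0, 19, 0);
    // A bare "0.19.0" accepts later 0.19.x releases but not 0.20.0.
    assert!(caret_matches(req, (0, 19, 2)));
    assert!(!caret_matches(req, (0, 20, 0)));
    // "=0.19.0" accepts only that one release.
    assert!(exact_matches(req, (0, 19, 0)));
    assert!(!exact_matches(req, (0, 19, 2)));
    println!("ok");
}
```

So `pyo3 = "0.19.0"` already allows any 0.19.x patch release, and only `pyo3 = "=0.19.0"` would pin to the micro version.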
Pretty exciting!
Question 1: (apologies in advance if the questions are stupid) it's just not immediately clear from the diff... how would you pass extra arguments to those functions? E.g. you have a compiled expression (You'd think many expressions, including prebuilt ones, often have various non-series parameters.)
Question 2: is it the plan to only allow this on series level, or frame level as well? (since some computations you can do much faster internally, handling parallelization yourself if it's problem-specific, as opposed to having polars runtime sort it out).
Indeed. Currently it only works for multiple expression arguments, though many types can be represented as single-element series. I'd welcome the ability to make non-series arguments easier, but I would still have to think a little bit about that.
Only on series level. You could of course always accept a series of type
Ideally, I think we would have arguments that allow you to influence the parallelism strategy.
Just a random thought, one way would be to come up with a restricted set of allowed argument types safe to send across ffi and thread boundaries, like a json::Value-style enum of all valid scalars (iirc polars already has something similar) plus lists, dicts, nullable stuff etc, and then provide conversions on both pyo3 and rust sides. Along the lines of
// args: Arc<[Arg]>
enum Arg {
    Null,
    String(String),
    List(Arc<[Arg]>),
    ...
}
It could be implemented completely differently, of course.
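To flesh the idea out a little, here is a self-contained sketch of such an argument enum with a few conversions on the Rust side. Everything in it is hypothetical: the extra variants (`Bool`, `Int`, `Float`), the `From` impls, and the `count_scalars` helper are illustration only and are not part of polars or pyo3-polars. The pyo3 side of the conversion is left out.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical argument type: scalars, nulls and nested lists only.
#[derive(Debug, Clone, PartialEq)]
enum Arg {
    Null,
    Bool(bool),
    Int(i64),
    Float(f64),
    String(String),
    List(Arc<[Arg]>),
}

impl From<bool> for Arg { fn from(v: bool) -> Self { Arg::Bool(v) } }
impl From<i64> for Arg { fn from(v: i64) -> Self { Arg::Int(v) } }
impl From<f64> for Arg { fn from(v: f64) -> Self { Arg::Float(v) } }
impl From<&str> for Arg { fn from(v: &str) -> Self { Arg::String(v.to_owned()) } }

/// Nullable values map `None` to `Arg::Null`.
impl<T: Into<Arg>> From<Option<T>> for Arg {
    fn from(v: Option<T>) -> Self { v.map_or(Arg::Null, Into::into) }
}

/// Homogeneous vectors become lists.
impl<T: Into<Arg>> From<Vec<T>> for Arg {
    fn from(v: Vec<T>) -> Self { Arg::List(v.into_iter().map(Into::into).collect()) }
}

/// Compile-time check that a type may cross thread boundaries.
fn assert_send_sync<T: Send + Sync>() {}

/// Count the non-list leaves, descending into nested lists.
fn count_scalars(args: &[Arg]) -> usize {
    args.iter()
        .map(|a| match a {
            Arg::List(items) => count_scalars(items),
            _ => 1,
        })
        .sum()
}

fn main() {
    assert_send_sync::<Arg>();

    // args: Arc<[Arg]>, as in the comment above.
    let args: Arc<[Arg]> = Arc::from(vec![
        Arg::from(2_i64),
        Arg::from("sep"),
        Arg::from(None::<i64>),
        Arg::from(vec![1.5_f64, 2.5]),
    ]);

    // The shared slice can be handed to another thread without copying.
    let shared = Arc::clone(&args);
    let n = thread::spawn(move || count_scalars(&shared)).join().unwrap();
    assert_eq!(n, 5);
}
```

Because every variant is owned data or an `Arc`, the type is `Send + Sync` automatically, which is what would make it safe to pass to expressions that the engine runs on worker threads.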
This adds support for polars plugins. These are expressions exposed in a separate shared library and dynamically linked into the polars main library.
This means we or third parties can create their own expressions and they will run on our engine without python interference. So no blockage by the GIL.
We can therefore keep polars more lean and maybe add support for a polars-distance, polars-geo, polars-ml, etc. Those can then have specialized expressions and don't have to worry as much about code bloat, as they can be optionally installed.

The idea is that you define an expression in another Rust crate with a proc_macro polars_expr.

That macro can have the following attributes:
- output_type -> to define the output type of that expression
- type_func -> to define a function that computes the output type based on input types.

Here is an example of a String conversion expression that converts any string to pig latin:

On the python side this expression can then be registered under a namespace:

Compile/ship and then it is ready to use:
See the full example here: https://github.com/pola-rs/pyo3-polars/tree/plugin/example/derive_expression
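For readers without the linked example at hand, the string kernel behind a pig latin expression can be sketched with the standard library alone. This is not the code from the PR: the function names are hypothetical, the polars_expr macro and the polars Series types are deliberately left out, and nullable strings are modelled as `Option<&str>`.

```rust
/// Append the pig latin form of `value` to `output`:
/// move the first character to the end and add "ay".
/// An empty input appends nothing.
fn pig_latin_str(value: &str, output: &mut String) {
    if let Some(first) = value.chars().next() {
        // Slice by the first character's byte length so multi-byte
        // characters are handled correctly.
        output.push_str(&value[first.len_utf8()..]);
        output.push(first);
        output.push_str("ay");
    }
}

/// Apply the kernel element-wise, passing nulls through unchanged.
fn pig_latinnify(values: &[Option<&str>]) -> Vec<Option<String>> {
    values
        .iter()
        .map(|v| {
            v.map(|s| {
                let mut out = String::with_capacity(s.len() + 2);
                pig_latin_str(s, &mut out);
                out
            })
        })
        .collect()
}

fn main() {
    let converted = pig_latinnify(&[Some("hello"), None, Some("world")]);
    assert_eq!(
        converted,
        vec![Some("ellohay".to_string()), None, Some("orldway".to_string())]
    );
}
```

In the plugin itself, a function of this shape would be wrapped by the macro with a string output type, so the engine knows the result type without calling into python.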