Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#[polars_expr(type_func=...)] always fails #30

Closed
andysham opened this issue Oct 4, 2023 · 0 comments · Fixed by #31
Closed

#[polars_expr(type_func=...)] always fails #30

andysham opened this issue Oct 4, 2023 · 0 comments · Fixed by #31

Comments

@andysham
Copy link
Contributor

andysham commented Oct 4, 2023

I haven't been able to get this feature to work locally, based on the example - note that the haversine expression isn't used in the test run.py file.

When I attempt to use the haversine expression, I get the following error

Traceback (most recent call last):
  File "/home/andy/git/pyo3-polars/example/derive_expression/run.py", line 13, in <module>
    out = df.with_columns(
  File "/home/andy/git/pyo3-polars/example/derive_expression/venv/lib/python3.10/site-packages/polars/dataframe/frame.py", line 7903, in with_columns
    return self.lazy().with_columns(*exprs, **named_exprs).collect(eager=True)
  File "/home/andy/git/pyo3-polars/example/derive_expression/venv/lib/python3.10/site-packages/polars/lazyframe/frame.py", line 3809, in with_columns
    return self._from_pyldf(self._ldf.with_columns(pyexprs))
pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: DlSym { desc: "/home/andy/git/pyo3-polars/example/derive_expression/expression_lib/expression_lib/expression_lib.cpython-310-x86_64-linux-gnu.so: undefined symbol: __polars_field_haversine" }
make: *** [Makefile:22: run] Error 1

Minimal reproduction

This modification to ./example/derive_expression/run.py returns the above error, everything else unchanged.

import polars as pl
from expression_lib import Language, Distance

df = pl.DataFrame({
    "names": ["Richard", "Alice", "Bob"],
    "moons": ["full", "half", "red"],
    "dist_a": [[12, 32, 1], [], [1, -2]],
    "dist_b": [[-12, 1], [43], [876, -45, 9]],
    "floats": [5.6, -1245.8, 242.224]
})


out = df.with_columns(
   pig_latin = pl.col("names").language.pig_latinnify(),
).with_columns(
    hamming_dist = pl.col("names").dist.hamming_distance("pig_latin"),
    jaccard_sim = pl.col("dist_a").dist.jaccard_similarity("dist_b"),
    haversine = pl.col("floats").dist.haversine("floats", "floats", "floats", "floats"),
)

print(out)

Potential fix

It seems that whenever the polars-plan crate resolves the type information of an FFI call, it looks for the symbol __polars_field_{}, where {} is the name of the Rust function under #[polars_expr(...)], see this LoC.

The function that is actually exposed for this field resolution is defined here for expressions of the pattern #[polars_expr(type_func=...)], and here for expressions of the pattern #[polars_expr(output_type=...)].

In the latter case, as fn_name is the function name itself (i.e. haversine in this case), get_field_name returns the correct symbol - in the former, fn_name is instead the name of the type resolution function haversine_output, meaning the symbol exposed in the FFI is instead __polars_field_haversine_output, which is unknown to polars-plan.

We can verify this by inspecting the .so file created by pyo3:

$ nm -D ./expression_lib/expression_lib/expression_lib.cpython-310-x86_64-linux-gnu.so | grep " T"
00000000001d0eb0 T __polars_field_hamming_distance
00000000001d14f0 T __polars_field_haversine_output
00000000001d0890 T __polars_field_jaccard_similarity
00000000001d0340 T __polars_field_pig_latinnify
00000000001d1110 T hamming_distance
00000000001d1710 T haversine
00000000001d0af0 T jaccard_similarity
00000000001d05a0 T pig_latinnify

If the fix is simply changing the symbol here, I have implemented this in this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant