Polars built in pyodide ? #3672

Closed
oscar6echo opened this issue Jun 11, 2022 · 10 comments
Labels
build Changes that affect the build system or external dependencies python Related to Python Polars

Comments

@oscar6echo

Could polars be compiled in pyodide, like all these packages, and thus be easily available in jupyterlite?

The use case:
I am a heavy pandas user and will soon try replacing it with polars for new projects. I am also increasingly using jupyterlite to let users run "custom analytics" autonomously, with zero install. Naturally, datasets in a browser are unlikely to be really big (which is where polars shines), but if the analytics are made to run both in the browser and locally (which I intend to do), having polars in both places would be convenient.

From a distance, it would seem feasible, as rust and wasm are no strangers to each other, correct?

I am floating the idea to see what other people and the polars contributors think.
To the polars creator/contributors: by the way, you could then demo polars live.

Last, but not least, congratulations on this impressive lib and the neat docs! 👏

@ghuls
Collaborator

ghuls commented Jul 7, 2022

Pyodide doesn't support threads AFAIK, which are currently needed by polars.

@oscar6echo
Author

Ok thank you for the clarification.

Closing as answered.

@kylebarron
Contributor

I did some research on this tonight and it seems plausible to get a working pyodide build, even without a separate polars wasm build.

The first thing to note is that pyodide builds rust code against the wasm32-unknown-emscripten target, not the wasm32-unknown-unknown target, which is where (I think) most of the existing wasm-related polars work has gone.

First, I was able to get a wasm build of py-geopolars working in pyodide, though this links only against the Series export. This wasm loaded and was parsed successfully in pyodide, and then eventually failed when it tried to import and subclass from polars.Series on the Python side.

Then, turning to py-polars itself, I made some progress. I had to remove the parquet, fmt, decompress-fast, and extract_jsonpath features from polars, due to various dependency errors that I didn't want to spend time debugging at the moment.

After compiling for an hour, using the steps below, I hit only two kinds of errors:

  1. A couple of now-erroneous imports from features I removed, which should be easy to handle

  2. Hundreds of "#[ctor] is not supported on the current target" errors, like this:

    error: #[ctor] is not supported on the current target
      --> src/lazy/dataframe.rs:95:1
       |
    95 | #[pymethods]
       | ^^^^^^^^^^^^
       |
       = note: this error originates in the attribute macro `$crate::ctor` (in Nightly builds, run with -Z macro-backtrace for more info)
    

    which I don't understand. Given this, it seems like if the #[ctor] issue were solved, a polars pyodide build could be achieved by just adding a new feature flag to the py-polars crate.

    It seems like this might be an issue with python classes. I'll ask on the pyo3 issue tracker.

Repro for build attempt

My build steps (derived from here). My branch is here.

Install nightly

rustup toolchain install nightly
rustup target add --toolchain nightly wasm32-unknown-emscripten

Install maturin 0.13 to take advantage of its new support for emscripten. Note that maturin even tests against emscripten on CI now.

pip install -U maturin

Set up emscripten 3.1.14. As of pyodide 0.21.0a3, pyodide is compiled against emscripten 3.1.14 and any extension module must also be compiled against the same version.

git clone https://github.com/emscripten-core/emsdk.git # tested and working at commit hash 961e66c
cd emsdk
./emsdk install 3.1.14
./emsdk activate 3.1.14
source ./emsdk_env.sh

Compile polars. Note this requires python 3.10; I used 3.10.5.

# cd into python dir
cd ../py-polars
RUSTUP_TOOLCHAIN=nightly maturin build --release -o dist --target wasm32-unknown-emscripten -i python3.10
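
Since the steps above depend on an exact Emscripten version and on the nightly wasm32-unknown-emscripten target being installed, a small preflight check could catch a mismatch before an hour-long compile. This is only a sketch: the helper names (first_version, version_matches, preflight) and the EXPECTED_EMSCRIPTEN variable are made up for illustration, and the parsing assumes that `emcc --version` prints the version as the first x.y.z number in its output.

```shell
# Preflight sketch: fail early if the toolchain differs from what the build steps assume.
# Assumption: 3.1.14 is the Emscripten version required by the targeted Pyodide release.
EXPECTED_EMSCRIPTEN="${EXPECTED_EMSCRIPTEN:-3.1.14}"

# Hypothetical helper: print the first x.y.z version number found on stdin.
first_version() {
  grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Hypothetical helper: succeed only if the first version found in $1 equals $2.
version_matches() {
  [ "$(printf '%s\n' "$1" | first_version)" = "$2" ]
}

# Check that emcc is on PATH, that its version matches, and that the Rust target is installed.
preflight() {
  if ! command -v emcc >/dev/null 2>&1; then
    echo "emcc not found on PATH; source emsdk_env.sh first" >&2
    return 1
  fi
  if ! version_matches "$(emcc --version)" "$EXPECTED_EMSCRIPTEN"; then
    echo "emcc version does not match $EXPECTED_EMSCRIPTEN" >&2
    return 1
  fi
  if ! rustup target list --toolchain nightly --installed | grep -qx 'wasm32-unknown-emscripten'; then
    echo "nightly target wasm32-unknown-emscripten is not installed" >&2
    return 1
  fi
}
```

With these functions sourced, one might run `preflight` from the py-polars directory and only start the maturin build if it succeeds.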

Threads

Pyodide doesn't support threads AFAIK, which are currently needed by polars.

In terms of threading, it looks like a previous rust package was able to fix this just by changing the number of threads. Seems like it would be straightforward to do the same by setting RAYON_NUM_THREADS=1.
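
To get a rough feel for single-threaded behaviour on a normal local install before attempting a Pyodide build, a command can be launched with the thread-count variables set. This is a sketch under assumptions: run_single_threaded is a hypothetical wrapper, POLARS_MAX_THREADS is Polars' own thread-pool setting rather than something mentioned in this thread, and nothing here shows that a one-thread pool is sufficient inside Pyodide, where no threads can be spawned at all.

```shell
# Hypothetical wrapper: run the given command with Rayon limited to one thread.
# Assumption: POLARS_MAX_THREADS is also set, since Polars sizes its own pool from it.
run_single_threaded() {
  RAYON_NUM_THREADS=1 POLARS_MAX_THREADS=1 "$@"
}
```

For example, `run_single_threaded python my_script.py` (my_script.py being a placeholder) starts the interpreter with both variables in its environment. In a browser there is no shell launch step, so the equivalent would have to happen before polars is imported.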


@kylebarron
Contributor

kylebarron commented Jul 19, 2022

From PyO3/pyo3#2517 (comment), looks like a pyodide build would be possible if the polars python binding switched to using a single #[pymethods] impl.

I need to read up on rust macros; if the macros could be used inside a single

#[pymethods]
impl PySeries {

instead of being called at the top level, seems like it might be possible to switch to this?

@jonashaag
Contributor

This is still blocking a WASM build. I'll have a look at merging those blocks.

@jonashaag
Contributor

With merged blocks, I got this: pyodide/pyodide#4016

@bitsondatadev

Didn't see this linked anywhere, and I feel it's also relevant, as new rollouts of thread support in Wasm/Pyodide could unblock something.

pyodide/pyodide#237

@lorentzenchr
Contributor

@stinodego @alexander-beedie @ghuls @MarcoGorelli Would you consider reopening this issue, or is it indeed out of scope?

@stinodego stinodego added python Related to Python Polars build Changes that affect the build system or external dependencies labels Jul 2, 2024
@astrojuanlu

xref pyodide/pyodide#4611

@georgestagg

georgestagg commented Oct 3, 2024

I'm taking a shot at this. I have an LLVM fork that can work with multiple uses of #[pymethods]. More info at dtolnay/inventory#71 (comment) and llvm/llvm-project#111008 (comment).

Using the LLVM fork with a custom rustc, disabling a bunch of features in Polars, and applying some patches for wasm32 and Emscripten (my patches are at georgestagg/polars@emscipten), I can build a wheel for Pyodide:

[screenshot attached to the original comment]

The path to making this work properly is still a long one, but looks as follows:

  1. Upstream LLVM patch to fix multiple .init_array symbols in Wasm objects.
  2. Wait for a version of rustc to be released with the LLVM fixes included.
  3. Update the inventory package to add .init_array symbols under WebAssembly.
  4. Upstream patches to Polars to support the wasm32-unknown-emscripten target.

