Replies: 2 comments 8 replies
-
Not that it's not worth trying to speed this up, but wouldn't you be covered by a …
-
I think with such fine-grained operations, you are at the limits of what can currently be achieved with PyO3, i.e. the work we do to provide type-safe access to generic Rust code does imply a certain overhead at the Python–Rust boundary, which requires a certain minimum amount of work on the Rust side to still result in speed-ups. That said, we are continuously working to decrease that overhead. For example, we will remove the …

But then again, accesses to thread-local storage figure prominently in your profiles, which I think is actually one of our coping mechanisms: we keep a private flag in thread-local storage to indicate that the GIL is held, so that we do not have to call into the Python interpreter to verify this for nested calls to …

But for example in your …

As for …

Finally, you should be able to avoid some GIL-wrangling overhead using unsafe code, e.g. provide both

```rust
#[repr(transparent)]
struct MyPyObject(PyObject);
```

and

```rust
#[repr(transparent)]
struct MyBoundAny<'py>(Bound<'py, PyAny>);
```

and in your … This should work because … However, this will not reduce the constant overhead of the outermost setup we do to make …
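The thread-local flag mentioned above can be sketched in plain Rust. This is a std-only illustration of the mechanism described, with invented names (`GIL_HELD`, `with_gil`) that are not PyO3's actual internals:

```rust
use std::cell::Cell;

// Per-thread flag standing in for "is the GIL held on this thread?".
thread_local! {
    static GIL_HELD: Cell<bool> = Cell::new(false);
}

// Hypothetical sketch: nested calls see the flag already set and skip the
// expensive acquisition path entirely; only the outermost call pays for it.
fn with_gil<R>(f: impl FnOnce() -> R) -> R {
    let already_held = GIL_HELD.with(|h| h.get());
    if already_held {
        // Nested call: the thread-local flag tells us we are covered,
        // without calling into the interpreter to check.
        f()
    } else {
        GIL_HELD.with(|h| h.set(true)); // a real binding would acquire the GIL here
        let result = f();
        GIL_HELD.with(|h| h.set(false)); // ...and release it here
        result
    }
}
```

The thread-local reads themselves are what show up in the profile: cheap, but not free at this call granularity.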
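To illustrate why the `#[repr(transparent)]` suggestion is sound, here is a std-only sketch using a stand-in `RawHandle` type in place of `PyObject` (all names here are invented for illustration): the attribute guarantees the wrapper has exactly the layout of its single field, which is what makes the reference reinterpretation defensible.

```rust
// Stand-in for an FFI handle type such as PyObject.
struct RawHandle(u64);

// `#[repr(transparent)]` guarantees MyHandle has the same size, alignment,
// and ABI as its single field, so &RawHandle and &MyHandle are layout-compatible.
#[repr(transparent)]
struct MyHandle(RawHandle);

impl MyHandle {
    // Zero-cost reinterpretation of a borrowed handle as the newtype.
    fn from_ref(raw: &RawHandle) -> &MyHandle {
        // SAFETY: MyHandle is #[repr(transparent)] over RawHandle, so the
        // two reference types have identical layout and validity invariants.
        unsafe { &*(raw as *const RawHandle as *const MyHandle) }
    }
}
```

The same pattern applied to `PyObject`/`Bound<'py, PyAny>` is what lets a crate expose its own methods on borrowed Python objects without extra conversions at the boundary.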
-
Hi all,
First off, thank you for making and maintaining PyO3! It's been a joy to use in general, despite being a total Rust newbie.
For ... reasons, I have a need for a set type that remembers its insertion order, just like Python's built-in `dict`. I came across https://github.com/indexmap-rs/indexmap and thought, well, let's just wrap that into Python, how hard could it be? The results of my efforts are here: https://github.com/inducer/indexset/.

Unfortunately, that wrapper gets demolished a bit in terms of performance by Python's `set`. This is what I get (with `maturin develop -r`) for a simple benchmark. Here's that benchmark script:
https://github.com/inducer/indexset/blob/0c765a23a2dbe679156560c63a364431ef2a3019/examples/benchmark.py
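For readers unfamiliar with `indexmap`, here is a rough std-only sketch of the data structure an insertion-order-preserving set needs: a hash map for O(1) membership plus a `Vec` that remembers insertion order. This is an illustration with invented names, not the crate's actual (more compact) implementation:

```rust
use std::collections::HashMap;
use std::hash::Hash;

// Minimal insertion-order set: membership via `index`, order via `items`.
struct OrderedSet<T: Hash + Eq + Clone> {
    index: HashMap<T, usize>, // value -> position in `items`
    items: Vec<T>,            // values in insertion order
}

impl<T: Hash + Eq + Clone> OrderedSet<T> {
    fn new() -> Self {
        Self { index: HashMap::new(), items: Vec::new() }
    }

    /// Returns true if the value was newly inserted.
    fn insert(&mut self, value: T) -> bool {
        if self.index.contains_key(&value) {
            return false; // already present; order is unchanged
        }
        self.index.insert(value.clone(), self.items.len());
        self.items.push(value);
        true
    }

    fn contains(&self, value: &T) -> bool {
        self.index.contains_key(value)
    }

    /// Iterate in insertion order.
    fn iter(&self) -> impl Iterator<Item = &T> {
        self.items.iter()
    }
}
```

Each `insert`/`contains` does one hash plus one equality check on a hit, which is why per-element overhead at the Python boundary dominates so easily in the benchmark.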
Benchmarking `create`

For the `create` operation, this is where `perf` says the time is being spent. It looks like GIL wrangling is a major culprit here. At the same time, the object-creation codepath looks pretty textbook:
https://github.com/inducer/indexset/blob/0c765a23a2dbe679156560c63a364431ef2a3019/src/lib.rs#L55-L57
I don't know what I could/should be doing differently.
Benchmarking `add`

For the `add` operation, this is where `perf` says the time is being spent. So the underlying data type appears to be responsible for some of the damage here, but the equality comparison also features prominently. There's some GIL wrangling in there, too.
I had a hunch that storing the hash along with the object might be profitable (by avoiding some of the GIL acquisitions here), but that did not pan out: it made things even slower. See this PR.
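The idea behind that experiment can be sketched in plain Rust with stand-in types (names here are hypothetical): hash once at construction, replay the cached hash in `Hash`, and let equality short-circuit when the cached hashes differ. As noted above, for wrapped Python objects this turned out slower in practice; whether it pays off depends on how expensive the real hash/equality calls are.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// A value paired with its hash, computed exactly once.
struct Prehashed<T> {
    hash: u64,
    value: T,
}

impl<T: Hash> Prehashed<T> {
    fn new(value: T) -> Self {
        let mut h = DefaultHasher::new();
        value.hash(&mut h);
        Self { hash: h.finish(), value }
    }
}

impl<T> Hash for Prehashed<T> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Replay the cached hash instead of re-hashing the value.
        state.write_u64(self.hash);
    }
}

impl<T: PartialEq> PartialEq for Prehashed<T> {
    fn eq(&self, other: &Self) -> bool {
        // Different cached hashes => definitely unequal; skip the real compare.
        self.hash == other.hash && self.value == other.value
    }
}

impl<T: Eq> Eq for Prehashed<T> {}
```

For Python objects the comparison still has to call back into the interpreter on a hash match, so the cached hash only helps when mismatches are common and the real hash call is the expensive part.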
Discussion
The wrapper code is not obnoxiously long and is pretty simple:
https://github.com/inducer/indexset/blob/0c765a23a2dbe679156560c63a364431ef2a3019/src/lib.rs
At one point, I guessed that memory allocation would feature prominently, but it doesn't even show up in the profile.
I'm a bit out of ideas what I could try to improve matters, and I'd be grateful for any advice.
Edit: I just realized I should have specified PyO3 versions and other environment specifics. Most of that is in the lock file. Otherwise, I'm using Python 3.12 on a Raptor Lake laptop running Debian testing/unstable with …
cc @matthiasdiener