Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add locked iterations APIs for dicts and lists #4571

Open
ngoldbaum opened this issue Sep 20, 2024 · 4 comments
Open

Add locked iterations APIs for dicts and lists #4571

ngoldbaum opened this issue Sep 20, 2024 · 4 comments

Comments

@ngoldbaum
Copy link
Contributor

ngoldbaum commented Sep 20, 2024

See #4439 and #4539 for context.

pyo3 follows Python's behavior for multithreaded dict and list iteration and allows race conditions (see my experiment here: #4539 (comment)).

Unfortunately as a consequence in order to preserve this behavior on the free-threaded build, we will need to use slower owned reference APIs for lists (#4539) and apply a critical sections for dicts in each loop iteration (#4439) since there are no equivalent locking owned reference iteration APIs for dicts.

It would be nice in both cases if we could simply lock the dict or list while we are iterating over it, which would allow us to use the faster APIs that access list and dict internals. However this would make the semantics for iteration via pyo3 different than via python.

Instead, I think we should add a new locked_iter function to PyDictMethods and PyListMethods. Maybe PyAnyMethods too but users can add their own locking if they want too for the generic case, we need to do it for dict and list for in PyO3 for performance reasons.

Instead of directly returning an iterator, instead users would pass in a closure that accepts an iterator and work with the iterator inside the closure. My understanding from @davidhewitt is that using a closure would ensure exactly one Py_END_CRITICAL_SECTION follows each Py_BEGIN_CRITICAL_SECTION, even if a panic happens and even if the critical sections are recursive.

I'm not terribly experienced with writing rust APIs that take closures, but I think the API I'm looking for is this?

fn locked_iter<F>(&self, closure: F) -> PyResult<()>
where
    F: Fn(PyDictLockedIterator<'py>) -> PyResult<()>

There PyDictLockedIterator would be like PyDictIterator, but its use would implicitly imply a critical section is held and the dict is locked. And of course a similar API for lists.

This is something we could add for PyO3 0.24, and for 0.23 we'd merge the two open PRs related to this and only have the slow, safe iteration for the free-threaded build.

Ping @bschoenmaeckers

@bschoenmaeckers
Copy link
Contributor

Thanks for moving this forward. I will pick this up next week so we can merge that MR.

@davidhewitt
Copy link
Member

I think the closure needs to be similar to for_each which takes each element in turn, so that that way we get to call PyDict_Next within our own code and can be sure that users can't nest critical sections.

e.g. something like this:

fn locked_for_each<F>(&self, closure: F) -> PyResult<()>
where
    F: Fn(Bound<'py, PyAny>, Bound<'py, PyAny>) -> PyResult<()>

... that said, now that I say this, I realise that what we should probably do is override Iterator::fold and Iterator::try_fold to lock the critical section inside of them, which would automatically optimize many uses of PyO3 iterators (e.g. .iter().map(...).sum()).

@ngoldbaum
Copy link
Contributor Author

Hmm, can we actually use try_fold? I tried to write an implementation for PyList and I think in order to actually implement it, you need to use the Try trait, which isn't stable yet.

@davidhewitt
Copy link
Member

Doh, of course not. We can still implement fold, and maybe on our nightly feature we could implement try_fold.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants