Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make PyMutex public in the non-limited API #22

Closed
colesbury opened this issue Apr 10, 2024 · 67 comments
Closed

Make PyMutex public in the non-limited API #22

colesbury opened this issue Apr 10, 2024 · 67 comments

Comments

@colesbury
Copy link

Links

CPython Issue: python/cpython#117511
PR draft: python/cpython#117731
Additional discussion: https://discuss.python.org/t/make-some-pymutex-apis-public/50203

Overview

The PyMutex APIs are currently internal-only in pycore_lock.h. This proposes making the type and two functions public as part of the general, non-limited API in Include/cpython/lock.h.

The APIs to be included in the public header are:

typedef struct PyMutex { ... } PyMutex;
static inline void PyMutex_Lock(PyMutex *m) { ... }
static inline void PyMutex_Unlock(PyMutex *m) { ... }

Additionally, the out-of line of "slow path" functions are exposed in the header, but not part of the public API. (For context, these are called when PyMutex_Lock needs to block and when PyMutex_Unlock needs to wake up another waiting thread).

// (private) slow path for locking the mutex
PyAPI_FUNC(void) _PyMutex_LockSlow(PyMutex *m);

// (private) slow path for unlocking the mutex
PyAPI_FUNC(void) _PyMutex_UnlockSlow(PyMutex *m);
@encukou
Copy link
Collaborator

encukou commented Apr 11, 2024

+1 from me.


This does introduce a non-opaque struct, which is a red flag worth focusing on. The practical issue with such a struct is that it's part of the ABI, so

  • we can't change it in a bugfix/security release
  • if this later makes it to the stable ABI (a possibility I don't want to exclude), then we can't ever change it: the whole API would need to be deprecated and replaced with a new one. But with such minimal API, even that would be bearable.

There's some mind-blowingly genius engineering behind the "one-byte struct and two functions" API surface. But the idea is proven; it's been used in WebKit since 2015. I don't think it'll need to change any time soon.


The struct member name should not appear in user code, so please mark it as private using a leading underscore:

typedef struct {
    uint8_t _v;
} PyMutex;

Note that this requires atomics -- the C11 optional feature, or the MSVC flavour. (The GCC flavour is an optional speedup.)
We need to (belatedly) update PEP-7 to allow ourselves to use atomics.

@gvanrossum
Copy link

Yeah, makes sense.

@vstinner
Copy link

The static inline function calls the private _Py_atomic_compare_exchange_uint8() function. I would prefer to clarify _Py_atomic_compare_exchange_uint8() visibility first: I would like it to be public.

I would prefer to make the whole big Py_atomic C API public in Python 3.14, and then add PyMutex. IMO it's really hard to write efficient C code for the Free Threading build (PEP 703) without having this C API available. And I would prefer that people don't start to consume the private API, otherwise, once again, "private" will mean "public" :-(

Before Py_atomic C API is public, maybe we can start by having an opaque PyMutex implementation in the stable ABI:

  • function to allocate/release a PyMutex on the heap memory: I don't want to expose the structure. Every time I added a structure to the C API, I regret :-(
  • opaque function working on a PyMutex*: Lock/Unlock.

Later, once the static inline will be added, if Py_LIMITED_API macro is not defined, you would use the same API, but use the faster static inline functions.

@encukou
Copy link
Collaborator

encukou commented Apr 19, 2024

Current C compilers support C11/C++ atomics; other languages have their own locking. I don't think we want to add the wrappers as any kind long-term-support API; definitely not as dozens of type-specific functions.
Third-party code should simply #include <atomic> and use atomic_compare_exchange_strong((_Atomic(uint8_t)*)obj, expected, desired).
If their compiler doesn't support that, they should update and/or use experimental:c11atomics for MSVC.
If that turns out to be problematic, let's spin the wrappers off into their own library. (Steve had that idea for PyTime_*Raw, but here it would make much more sense to me: it'd be much more that 3 functions, and it'd be expected to eventually go away as compiler C11 support improves.)

function to allocate/release a PyMutex on the heap memory

The point here is to avoid memory allocation overhead (both time and space) for the one byte.

If it helps: I see PyMutex as a name for uint8_t -- a stronger typedef, so to say. Effectively, this proposal is adding:

static inline void PyMutex_Lock(uint8_t *v) { ... }
static inline void PyMutex_Unlock(uint8_t *v) { ... }

except we want to prevent users from doing arithmetic on v.

@vstinner
Copy link

Current C compilers support C11/C++ atomics; other languages have their own locking. I don't think we want to add the wrappers as any kind long-term-support API; definitely not as dozens of type-specific functions.

How does someone use PyMutex in C++, Rust or any other language if _Py_atomic_compare_exchange_uint8() is not available for them? Knowning that it's implemented as a static inline function.

Third-party code should simply #include and use atomic_compare_exchange_strong((_Atomic(uint8_t)*)obj, expected, desired).

I don't think that atomic_compare_exchange_strong((_Atomic(uint8_t)*)obj, expected, desired) is code which is easy to use for C beginners.

I'm not sure that I understand. Do you suggest that people don't use the PyMutex API if _Py_atomic_compare_exchange_uint8() is not available for them? So PyMutex would only be usable in C with a C compiler supporting _Py_atomic_compare_exchange_uint8()?

If that turns out to be problematic, let's spin the wrappers off into their own library.

This is something new. I would prefer to not go this way. It looks very complicated.

@davidhewitt
Copy link

How does someone use PyMutex in [...] Rust

Unless there's a case where Rust code needs to lock / unlock a PyMutex defined in other objects, I don't think Rust code needs to use it. We've got mutexes and atomics in the Rust standard library, and there's also the portable_atomic crate for platforms not supported by the standard library. So I think Rust code will almost always have atomics available, and will use Rust mutex types most (all?) of the time.

@colesbury
Copy link
Author

colesbury commented Apr 19, 2024

For C++, you just call the same code as C: PyMutex_Lock(&m). It works in both C and C++.

I think the Rust situation is pretty different. In addition to what @davidhewitt wrote, mutexes in Rust protect specific data, so a mutex with no associated data is an awkward fit. If a mutex that releases the GIL when blocking makes sense, then I think PyO3 could implement it without too much difficulty.

@vstinner
Copy link

@davidhewitt:

Unless there's a case where Rust code needs to lock / unlock a PyMutex defined in other objects, I don't think Rust code needs to use it.

If Rust access an object which has a lock implemented as PyMutex, it should use the PyMutex_Lock() API while accessing the object, no?

For example, PyWeakReference has a struct _PyMutex *weakrefs_lock; member and the implementation uses PyMutex_LockFlags(wr->weakrefs_lock, _Py_LOCK_DONT_DETACH) and PyMutex_Unlock(wr->weakrefs_lock) to access the object. I'm not sure if it's the best example, since lock/unlock is currently hidden in public C API to access weak references.

@davidhewitt
Copy link

Is that weakref access implemented entirely in static inline functions? If so, then yes I think Rust code will need to reproduce the implementation (probably using Rust atomics).

@encukou
Copy link
Collaborator

encukou commented Apr 23, 2024

We can additionally provide PyMutex_Lock as an actual exported function; non-C languages get some function call overhead. (We do that a lot in the limited API, which is more friendly to non-C languages at the expense of performance, and it's working out rather well. We can do it in the non-limited API as well. See a recipe here.)

If we do expect people to re-implement these in other languages, rather than call them, then their body needs to be documented and stable. That's a can of worms, but, if people will do it anyway we might want to set reasonable rules and stability expectations.


I'm not sure that I understand. Do you suggest that people don't use the PyMutex API if _Py_atomic_compare_exchange_uint8() is not available for them? So PyMutex would only be usable in C with a C compiler supporting _Py_atomic_compare_exchange_uint8()?

No, there I wasn't talking about the PyMutex API, but about your proposal to make _Py_atomic_compare_exchange_uint8 public. Since the function is static inline, it is only usable in C, even if we make it public.
(Unless you want to export _Py_atomic_compare_exchange_uint8 as a regular function too? I'm also not sure that I understand your proposal.)

@colesbury
Copy link
Author

I would really like for a decision to be made on this proposal

@gvanrossum
Copy link

I'm okay with whatever @encoku says.

@encukou
Copy link
Collaborator

encukou commented Apr 29, 2024

And the others?

@zooba
Copy link

zooba commented Apr 29, 2024

Main change I'd like to see is replacing the explicit initialization ((PyMutex){0}) with a function call - probably a PyMutex_Init(&mutex)1 and PyMutex_Destroy(&mutex) - just to make sure we have an API that can be updated later without affecting anyone already using it. Destroying a mutex has also proven to be valuable as a debugging aid to make anything currently waiting fail.

And unless there's an important reason otherwise, specify the public PyMutex as being pointer sized rather than a single byte. (I can imagine wanting to fill a compact array with a lot of mutexes, but that doesn't actually seem likely?) That allows us some freedom for future backwards-compatible changes that may involve more state. Having an internal interface that is only one byte is fine - it's just the public definition I'm concerned about, and as this is statically allocated (potentially cross-language), knowing that we've always got a void * available without callers recompiling is a nice thing to have when change happens.

Footnotes

  1. Rather than an assignment, since we can't really be assured a copy will always be possible.

@colesbury
Copy link
Author

I am not in favor of any of these suggestions. The pointer size means a completely different type and different API. In other words, a different proposal.

Regarding PyMutex_Init, you need a way to initialize static mutexes. The equivalent of PTHREAD_MUTEX_INITIALIZER. So you're still limited to constant initialization. Why would you want to switch to any constant initializer other than zero?

Practically, most mutexes are either global (i.e., statically initialized to zero) or instances of some struct. For subtypes of PyObject, the typical initialization in extensions already zero initialize fields so any PyMutex field does not need any explicit initialization.

We should not make the API more complicated just for some unlikely hypothetical changes. Note that PyThreadType_lock already supports these hypotheticals, but it's not at all suitable for these use cases.

@zooba
Copy link

zooba commented Apr 29, 2024

Okay, I'll concede zero initialization (and no destruction), but I still worry about having a public, non-opaque API based around a single byte. The API doesn't change in that case - we only have to use a single byte in the struct for what we currently want - in fact, I'd settle for the struct remaining at one byte but the specification saying to ensure you have sizeof(void *) available at the address of your PyMutex.1 Unless you happen to be carefully packing things, there's a good chance of ending up with that much padding anyway.

Can you expand a little on why PyThreadType_lock is not at all suitable? Is that due to being a larger size than 1? Or some other reason?

Footnotes

  1. I'm thinking of non-C callers here, who aren't using our PyMutex definition directly when they allocate it.

@vstinner
Copy link

Regarding PyMutex_Init, you need a way to initialize static mutexes. The equivalent of PTHREAD_MUTEX_INITIALIZER.

So there is a API for that? Can you add an API for static init?

@colesbury
Copy link
Author

Can you expand a little on why PyThreadType_lock is not at all suitable?

To be clear, I mean that it would not be suitable for some of the current uses of PyMutex in CPython. We have a PyMutex per-PyObject. The API of PyThreadType_lock requires memory allocation (it returns a pointer). I think an extra memory allocation and deallocation for every PyObject created would be prohibitively expensive. The 1 byte vs. pointer size is important, but not as important as avoiding the allocation.

I'm thinking of non-C callers here, who aren't using our PyMutex definition directly when they allocate it

I'm not opposed to providing extra guidance for something like Rust bindings, but I don't know what concretely we can say or what actions we expect them to take. PyO3 could make their PyMutex definition pointer sized, but I don't see how that would help us.

I wrote this in the other issue, but the primary ways of extending PyMutex is in the private wait entry API. We can add (or remove) as many fields as we want to that in the future without backwards compatibility concerns. Internally, there's a hash table that maintains a mapping of PyMutex address to struct mutex_entry* for mutexes with threads waiting on them.

Unless you happen to be carefully packing things, there's a good chance of ending up with that much padding anyway.

Yes, that's true -- the bigger problem is that having more than one size or definition of PyMutex is problematic. (And in a few cases the 1 byte size matters.)

@colesbury
Copy link
Author

So there is a API for that? Can you add an API for static init?

No, I was not suggesting an API for static init. I was trying to point out that if you want an API for initialization you need at least two functions/macros.

@zooba
Copy link

zooba commented Apr 29, 2024

Internally, there's a hash table that maintains a mapping of PyMutex address to struct mutex_entry* for mutexes with threads waiting on them.

This is an important point to highlight in the API - the address has to be stable (in effect, the mutex is the address of the mutex, not merely the value).

the bigger problem is that having more than one size or definition of PyMutex is problematic. (And in a few cases the 1 byte size matters.)

What would be wrong with a definition like this:

struct PyMutex {
    union {
        uint8_t _v;
        void *_reserved;
    };
};

We still get our single byte, it's going to be as stably in the same position as anything else we could define, and the ABI is now immune to us adding up to three additional bytes for every supported platform, or changing the value of the mutex to a pointer (e.g. pointing directly into our own table in order to make the mutex value-based rather than address-based).

@colesbury
Copy link
Author

colesbury commented Apr 29, 2024

This is an important point to highlight in the API - the address has to be stable...

This is the same as other locking APIs, the type is just smaller: the functions take a PyMutex* not a PyMutex. Similarly, pthreads has pthread_mutex_t (a struct) and pthread_mutex_lock takes pthread_mutex_t*. You can't copy a pthread_mutex_t or a Windows CRITICAL_SECTION and expect things to work.

The pthread_mutex implementation in glibc has the same behavior because internally there's some int whose address is passed to the Linux futex system calls.

What would be wrong with a definition like this?

It's incompatible with the one-byte internal definition and doesn't adhere to the "one definition rule." I'd expect things to probably work out okay, but it doesn't seem to me like a good enough reason deviate from standard compliant C.

I don't think we will want a pointer/table based mutex, but if, hypothetically we wanted that, we should just add a new API instead of trying to repurpose existing APIs.

EDIT: The "one definition rule" might not be the relevant rule, but trying to use the same functions on differently defined types seems fishy even if it works in practice.

@zooba
Copy link

zooba commented Apr 29, 2024

This is the same as other locking APIs ...

The point is that it's different from most of our APIs, so it should be highlighted (or avoided, but I assume making it an explicit part of the API is preferable).

I don't think we will want a pointer/table based mutex, but if, hypothetically we wanted that, we should just add a new API instead of trying to repurpose existing APIs.

Sure, whatever.

I'm getting sick of arguing in favour of API stability on every issue. If nobody else from the WG wants to work for it at the early stages, then consider me abstained from this vote.

@vstinner
Copy link

Practically, most mutexes are either global (i.e., statically initialized to zero) or instances of some struct. For subtypes of PyObject, the typical initialization in extensions already zero initialize fields so any PyMutex field does not need any explicit initialization.

_PyObject_GC_New() does not initialize allocated memory to zeros. Many types use _PyObject_GC_New().

I would prefer to have a macro and a function to initialize a mutex. Something like:

#define PyMutex_STATIC_INIT (PyMutex){0}

void PyMutex_Init(PyMutex *mutex)
{
    *mutex = PyMutex_STATIC_INIT;
}

static PyMutex mutex = PyMutex_STATIC_INIT;

PyObject* new_object(void)
{
    MyObject *obj = PyObject_GC_New(MyObject, MyType);
    // obj->mutex = PyMutex_STATIC_INIT;
    PyMutex_Init(&obj->mutex);
    return (PyObject*)obj;
}

@vstinner
Copy link

typedef struct PyMutex { ... } PyMutex;
static inline void PyMutex_Lock(PyMutex *m) { ... }
static inline void PyMutex_Unlock(PyMutex *m) { ... }

Can you please expand the ...? I also would like to decide on the implementation. For example, if the only PyMutex member is public or not.

@colesbury
Copy link
Author

@colesbury
Copy link
Author

The below suggested pattern will not compile in MSVC. It relies on a GCC extension.

#define PyMutex_STATIC_INIT (PyMutex){0}
static PyMutex mutex = PyMutex_STATIC_INIT;

@encukou
Copy link
Collaborator

encukou commented May 1, 2024

I don't think API stability is an issue. Changing the PyMutex size will break ABI, not API (i.e. we can do it a new feature release).

AFAIK, the dynamic-allocation API for locking is PyThread_allocate_lock etc. (Sadly undocumented, but public & stable.)

Perhaps we'll get another non-C-language-friendly, ABI-stable mutex as free-threading matures. (Maybe as a VM-managed part of a PyObject to make dynamic allocation worthwhile, with lock and unlock API on the object.)

This proposal is exposing the low-level machinery, for cases where you're counting bits and instructions.
Would making it PyUnstable_ work for everyone?

@erlend-aasland
Copy link

Note also that the size of the mutex is included in the specification of the accepted PEP 703.

@colesbury
Copy link
Author

Yes, the design comes from the WebKit lock linked to by @zooba. It was added in Aug 2015. The major change since then was in July 2016 when it became "eventually fair", which entailed going from three states (two bits in one byte) to four states (still two bits in one byte); the major implementation changes happen at a lower levels of the API.

I'm still not exactly sure how the C API WG is intended to operate. I think it'd be helpful if the WG as a whole decided what it wanted from this API.

@zooba
Copy link

zooba commented May 23, 2024

I'm still not exactly sure how the C API WG is intended to operate.

We go for consensus, and when we give up on that we vote.

In general, when I'm pushing for future-proof APIs like this, I eventually get outvoted 😉 So if you insist on sticking to the current design, you probably just have to hold on until the other members give up and call for a vote. This discussion has gone on pretty long (and I've repeated my position 3-4 times already), so I expect that won't be long, if only because nobody wants to re-read the whole thing anymore.

@encukou
Copy link
Collaborator

encukou commented May 31, 2024

Let's say that the struct can grow to up to sizeof(void *) in future CPython feature releases. Then non-C languages can either reserve pointer-sized space, or commit to checking/updating the size for each Python version.

AFAICS, PyMutex_Init and PyMutex_Destroy are easy enough to add now as they'd be no-ops, but it's a bit tricky to specify future-proof semantics. I assume they'd be thread-safe and idempotent, and calling Init after Destroy would be allowed?
PyMutex_Destroy sounds doable for dynamically allocated mutexes, but we shouldn't require it for static ones (i.e. we'd promise future Pythons won't start showing warnings if you leave it out).
Asserting that a mutex is unlocked when you deallocate it might be good for debug builds?

Docs for “both the address and the value” can be added. Some non-C languages call this “pinning” (rust, golang).

@erlend-aasland
Copy link

I'm fine with the 10 year old, battle tested one-byte design of WTF locks. The size of PyMutex is already given by PEP-703, which is approved by the SC. As I see it, the size is not up for debate.

Quoting Petr, from early in the discussion:

There's some mind-blowingly genius engineering behind the "one-byte struct and two functions" API surface. But the idea is proven; it's been used in WebKit since 2015. I don't think it'll need to change any time soon.

Consider me +1 for adding the API.

Perhaps there is a naming issue? Would PyStaticMutex be better?

@gvanrossum
Copy link

I'm fine with the 10 year old, battle tested one-byte design of WTF locks. The size of PyMutex is already given by PEP-703, which is approved by the SC. As I see it, the size is not up for debate.

Agreed.

@zooba
Copy link

zooba commented May 31, 2024

The size of PyMutex is already given by PEP-703, which is approved by the SC. As I see it, the size is not up for debate.

The PyMutex shown in PEP 703 is not the one being discussed here, because nowhere does PEP 703 specify this public API.

If this API is required for users to directly use the mutex embedded in the object structure, then they need to be the same. I asked earlier whether this was the case and was told no: they are independent, but making the private API public is just convenient. So PEP 703 doesn't force us into anything here.

Let's say that the struct can grow to up to sizeof(void *) in future CPython feature releases. Then non-C languages can either reserve pointer-sized space, or commit to checking/updating the size for each Python version.

I'm okay with this.

@erlend-aasland
Copy link

erlend-aasland commented Jun 1, 2024

If this API is required for users to directly use the mutex embedded in the object structure, then they need to be the same. I asked earlier whether this was the case and was told no: they are independent, but making the private API public is just convenient. So PEP 703 doesn't force us into anything here.

This is not how I interpret any of Sam's reponses; I won't reiterate them, as they are all easily found in this discussion, and I think each of them is a good (re)read. I'm 100% aligned with Sam's argumentation, and I think it would be a really bad idea to change the API in any way, including making it dynamic, and/or changing the size of the mutex. I'm fine with:

  • making it an unstable API (suggested by Petr in an earlier post)
  • somehow add Static to the name, to make it clear what kind of API this is
  • add an init macro a la the one Sam suggested: #define PyMutex_STATIC_INIT { 0 }

@vstinner
Copy link

vstinner commented Jun 3, 2024

I'm fine with adding a PyMutex_Destroy() function as a static inline function which does nothing.

@zooba
Copy link

zooba commented Jun 3, 2024

  • making it an unstable API (suggested by Petr in an earlier post)
  • somehow add Static to the name, to make it clear what kind of API this is
  • add an init macro a la the one Sam suggested: #define PyMutex_STATIC_INIT { 0 }

I don't think any of these are necessary. I have no problem committing to support the API over multiple releases, provided it gets a bit of buffer space, adding "Static" to names doesn't make anything clearer in my opinion, and saying "initialize to zero" is sufficiently future-proof (provided we can detect an uninitialised mutex on first use, which it sounds like we can, because our internal hash set won't have an entry).

a PyMutex_Destroy() function as a static inline function which does nothing

I think there's a valuable use for a Destroy function to release and fail anyone still waiting on the mutex, which also helps ensure you are safely destroying the mutex (or alternatively, fail to destroy if there are any waiters, which is probably almost as good). But I'm more concerned that we have the API available so it can be implemented without breaking existing users than that it does anything helpful on first release.

@vstinner
Copy link

vstinner commented Jun 3, 2024

If I understood correctly, the proposed API would look like:

typedef struct PyMutex { ... } PyMutex;
#define PyMutex_STATIC_INIT ...
static inline void PyMutex_Init(PyMutex *m) { ... }
static inline void PyMutex_Lock(PyMutex *m) { ... }
static inline void PyMutex_Unlock(PyMutex *m) { ... }
static inline void PyMutex_Destroy(PyMutex *m) { ... }

Did I miss anything? I'm fine with this API. I don't worry much about the ... ("implementation details").

@colesbury
Copy link
Author

If you want to let users of the API assert that no one is waiting on the mutex, we should make int PyMutex_IsLocked(PyMutex *m) public. We use this in a number of places internally. The choice of when to check (debug vs. release) should be up to the user of the API.

I think that PyMutex_Destroy is not a good addition:

  • It doesn't do what the name suggests -- there is nothing to destroy and the primary suggested purpose is something else (asserting that there are no waiters).
  • It does not let you implement a non-trivial destroy in the future without breaking users -- you would almost certainly break existing users in weird edge cases, like after fork.

@zooba
Copy link

zooba commented Jun 3, 2024

  • It does not let you implement a non-trivial destroy in the future without breaking users

Yeah, I realised this as well.

IsLocked is generally considered a bad API for synchronisation primitives, isn't it? I'm sure I recall hearing that somewhere, though I can't be sure now. I guess what we ought to do is assert that a mutex is held when releasing it (even if that "release" is practically just deallocating the memory around it).

I'm sure we get that right for our internal uses, but it's far more likely that users will use this in different ways and likely get it wrong. A Destroy that requires the mutex be held and makes it invalid for future use is definitely going to help users figure out their bugs.

the proposed API would look like

I'd like to see error results as well, not just void. We haven't settled on our guidelines yet, but one of the ones we all liked was that we should start with error codes on APIs, even if they "can never fail". They can all be return 0, and if they're inlined then there'll be no cost at runtime, but it also reserves options.

@vstinner
Copy link

vstinner commented Jun 3, 2024

we all liked was that we should start with error codes on APIs, even if they "can never fail"

I dislike adding an error code path when it's possible to avoid it. I prefer to return void if the function "cannot fail".

I'm not sure which errors you're referring to. Which functions can/should be able to fail?

@colesbury
Copy link
Author

IsLocked is generally considered a bad API for synchronisation primitives, isn't it?

It's not useful for making control flow decisions, but it's useful for assertions. Both "assert that the mutex is not locked" and "assert that this mutex is already locked" (i.e., by a caller higher up the stack), which is particularly helpful for catching bugs when adding locking to make existing code thread-safe.

I'd like to see error results as well, not just void.

Adding error codes complicates the use of the API to the point where callers are best served by just ignoring them. How are you supposed to handle a failure of PyMutex_Unlock()? There are many contexts in which PyMutex_Lock() failures cannot be handled either, like if you need to lock a mutex in some clean-up path.

The only possible failures are due to incorrect usage, like unlocking a mutex that is not locked, but returning an error code in that case isn't a good design because they are likely to be ignored.

if they're inlined then there'll be no cost at runtime...

There's still a runtime cost if callers feel the need to write something like: https://github.com/python/cpython/blob/153b118b78588209850cc2a4cbc977f193a3ab6e/Python/ceval_gil.c#L122-L124

@zooba
Copy link

zooba commented Jun 3, 2024

I prefer to return void if the function "cannot fail".

Yes, we had that discussion. I guess in our next in-person we'll argue this one out some more, but I'm very much with @encukou on this. APIs that can have error results, should have error results.

There are many contexts in which PyMutex_Lock() failures cannot be handled either, like if you need to lock a mutex in some clean-up path.

The fact that some situations cannot be handled doesn't invalidate all the ones where they can be handled. A public API is going to be used in a lot of ways we can't predict - we should leave it up to users to know whether they can or can't handle it in whatever context they're in. All we can really predict is that people will indeed use it incorrectly, and so failures are possible, and it will help people if we report them.

Also, handling doesn't necessarily mean recovering. It might mean asserting with some application-specific assert functionality. How do you intend for users to find bugs in their PyMutex_Unlock calls when there's no feedback available?

There's still a runtime cost if callers feel the need to write something like:

Any optimiser that can't figure out that a literal return 0 means that code will never run and simply remove it isn't going to optimise well enough to worry about runtime cost. If we can't use a literal return 0, then it means an error is actually possible and the code is not wasted. We can't stop people ignoring the error code, but if they decide that it's worth it to ignore the error and assume it'll always succeed, at least we gave them the option. If we decide that for them, we're taking away options.

@colesbury
Copy link
Author

How do you intend for users to find bugs in their PyMutex_Unlock calls when there's no feedback available?

There is feedback -- it fails with a PyFatal_Error() because it's clearly an error in usage that's not recoverable: https://github.com/python/cpython/blob/4e8aa32245e2d72bf558b711ccdbcee594347615/Python/lock.c#L189

Any optimiser that can't figure out that a literal return 0

That only works if the slow path functions return void. If they return an error code, even if it's always zero, than the condition can't be optimized away.

@colesbury
Copy link
Author

we should leave it up to users to know whether they can or can't handle it in whatever context they're in...

Users cannot handle errors if you do not describe the error conditions. What are the error conditions for PyMutex_Lock() or PyMutex_Unlock()?

We should not try to make the APIs that are as general as possible. We should aim for APIs that are easy to use and hard to misuse. Vaguely described error conditions that never happen in practice make the API harder to use correctly.

@zooba
Copy link

zooba commented Jun 3, 2024

That only works if the slow path functions return void.

If there's genuinely no errors possible (as claimed for the current implementation), then these can continue to return void. My concern is about what we tell consumers to write, so that if/when the implementations change, their code still works without them having to rewrite it (but possibly they have to recompile).

What are the error conditions for PyMutex_Lock() or PyMutex_Unlock()?

"Not a mutex" or "not locked", though I appreciate that the first is not distinguishable with this implementation. But errors are not difficult to identify.

We should not try to make the APIs that are as general as possible. We should aim for APIs that are easy to use and hard to misuse.

Clearly we just disagree on how to interpret this. There's a lot of generality I'm not asking for, and what I am asking is not expanding the scope of the APIs at all. I just want to ensure that users who use our public APIs won't feel screwed over if we later change the implementation. We've jumped through so many hoops to avoid changing APIs like Py_INCREF and PyMem_Malloc while changing their implementations that I don't think it's unreasonable to predict that one day we might need to make changes to this API as well.

@erlend-aasland
Copy link

We've jumped through so many hoops to avoid changing APIs like Py_INCREF and PyMem_Malloc while changing their implementations that I don't think it's unreasonable to predict that one day we might need to make changes to this API as well.

We're talking about a dead simple static one-byte mutex API; comparing it with something as complex as Py_INCREF, which is tightly coupled to intricate parts of the VM including the GC, is comparing apples with a beef stroganoff. API design is not about applying a blanket of rigid rules; API design is weighing a lot of different factors. For this field proven API, I am totally fine with discarding the low probability that any static mutex API will ever need to change. This implies I'm fine with no error reporting; This is a good example of an infallible API. The fatal error by API misuse is fine.

All participants in this thread, including Steve, has acknowledged there is a vanishing probability that this API will ever need to change. Why weigh it heavier than the rest of the API design factors?

To quote the infamous zen, just for fun: practicality beats purity, simple is better than complex.

@vstinner
Copy link

vstinner commented Jun 3, 2024

Before the C API Working Group was created, many APIs were added without any design for many years. Some were more lucky and tried to avoid some issues and were reviewed by a few people. At the end, the C API exists, has a few flaws but it's not the end of the world.

Now I see the C API Working Group as a gatekeeper preventing other people from changing the C API. It's good to have more eyes on changes, to prevent known issues. But there it seems like the discussion goes nowhere and Sam is just blocked by the Working Group. I don't think that the discussion is still constructive but I don't know how to move on.

We are not going to solve all issues suddenly at once. We can try to avoid the most issues pitfals. But here I feel like we are exploring an hypothetical future of an API which doesn't already exist.

Let's allow us to make mistakes.

If the API looks fragile, we can use the "provisional" statuscin the doc. The API targets free threading which is already experimental, no? Early adopters would not be surprised if the API changes.

@erlend-aasland
Copy link

Thanks Victor. I suggest we go for the unstable API tier and Sam's design. That is probably the best way to move forward in order to please everyone.

@gvanrossum
Copy link

gvanrossum commented Jun 3, 2024 via email

@encukou
Copy link
Collaborator

encukou commented Jun 4, 2024

API design is not about applying a blanket of rigid rules; API design is weighing a lot of different factors.

Yup. The rules are very useful for everyday API design, but each comes with reasoning and trade-offs we need to consider if we want to break them.

  • A minimal PyMutex struct with public size means if that if we ever do need more space, we'll need a whole new API.
  • No error reporting means that we can't deprecate this API with a DeprecationWarning (which -Werror would turns into an exception)

Both are a problem for the stable ABI, which means that if we add PyMutex as proposed here, we might need to come up with something different for the stable ABI later.
Ideally, we'd like to only add API that we can promote to stable ABI as-is. But I can't find a good way do that here.

We're talking about a dead simple static one-byte mutex API; comparing it with something as complex as Py_INCREF, which is tightly coupled to intricate parts of the VM including the GC, is comparing apples with a beef stroganoff.

IMO, PyMutex, with its atomic parking lot hashtable hiding behind a two-bit struct and two infallible functions, is actually very similar to Py_INCREF in this respect. And it deserves similar treatment.

All participants in this thread, including Steve, has acknowledged there is a vanishing probability that this API will ever need to change.

I don't share that optimism; I think we will want to change or remove it in the future. When that happens, it'll be messy for us and mean extra work for users.
But, my gut feeling is that we'll run into unknown issues, ones that we can't solve now with the usual future-proofing techniques. So, I still think we should add the API mostly as proposed, with minor tweaks.
Out of the future-proofing suggestions in this thread, the ones I think would help are:

  • adding a Destroy function to be called before freeing a mutex (for mutexes that are dynamically allocated -- for static ones there'd be no clean-up)
  • allowing the struct to grow -- but IMO that's only a concern for the stable ABI so it can be done later (specifically, non-C languages that use non-limited API have this issue solved)

Sam, sorry that you got caught in the cross-fire. (There's a reason Steering Council meetings are private...)

Steve, thanks for brainstorming; your ideas are valuable even if (if!) they don't make it.

@encukou
Copy link
Collaborator

encukou commented Jun 7, 2024

The C API Working Group met to discuss this proposal and has decided to accept the API as shown in the original post with the following clarifications:

  • The size of a PyMutex struct should be documented as unstable, meaning that while it is currently 1 byte, the size may be changed in future major releases without a deprecation period.
  • The m argument should be documented as requiring a fixed address that points to writable memory. This is intentionally more direct wording than in the current proposal.

There is no need to artificially increase the size of PyMutex, to provide initialization or destruction functions or macros, or error return values. These should only be minor additions to the documentation as it stands in PR #117731.

We would like to see the inline functions also exported as regular functions, in line with our desire to see the API be usable without relying on macros. This would be required for Limited API inclusion, but is nice to have regardless. In our view, exported functions do not constrain future API changes any more than inline functions.

A fresh review by the WG will be required before these may be added to the Limited API.

— Petr, on befalf of the WG

@gvanrossum
Copy link

(Clarification: by "major releases" we mean "feature releases", IOW the size might change in 3.15.)

@colesbury
Copy link
Author

I will work on the PR for PyMutex this week (hopefully today).

@vstinner
Copy link

The C API Working Group met to discuss this proposal and has decided to accept the API as shown in the original post with the following clarifications: (...)

@colesbury is now working on updating his PR. IMO we can now close the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants