
Make some PyMutex functions public #117511

Closed
colesbury opened this issue Apr 3, 2024 · 13 comments
Labels
topic-free-threading type-feature A feature request or enhancement

Comments

@colesbury
Contributor

colesbury commented Apr 3, 2024

Feature or enhancement

Overview

The PyMutex APIs are currently internal-only in pycore_lock.h. This proposes making the type and two functions public as part of the general, non-limited API in Include/cpython/lock.h.

The APIs to be included in the public header are:

typedef struct PyMutex {
    uint8_t v;
} PyMutex;

static inline void PyMutex_Lock(PyMutex *m) { ... }
static inline void PyMutex_Unlock(PyMutex *m) { ... }

// (private) slow path for locking the mutex
PyAPI_FUNC(void) _PyMutex_LockSlow(PyMutex *m);

// (private) slow path for unlocking the mutex
PyAPI_FUNC(void) _PyMutex_UnlockSlow(PyMutex *m);

Motivation

  • With the free-threaded build, C API extensions are more likely to require locking for thread-safety
  • The PyMutex API is easier to use correctly than the existing PyThread_type_lock. PyMutex releases the GIL before blocking, whereas with the PyThread_type_lock APIs you often need a two-step approach to locking so that the GIL is released before blocking (see the sketch after this list).
  • The PyMutex API enables a more efficient implementation: it's faster to acquire when the lock is not contended and does not require allocation.
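
As a rough sketch of that difference (the function and variable names below are illustrative, not part of the proposal): with PyThread_type_lock you typically attempt a non-blocking acquire first and release the GIL only if you have to block, whereas PyMutex_Lock() handles that on its slow path.

#include "Python.h"

/* Existing idiom: two-step acquire so the GIL is released while blocked. */
static PyThread_type_lock old_lock;   /* created elsewhere with PyThread_allocate_lock() */

static void
with_thread_lock(void)
{
    if (!PyThread_acquire_lock(old_lock, NOWAIT_LOCK)) {
        /* Contended: release the GIL before blocking. */
        Py_BEGIN_ALLOW_THREADS
        PyThread_acquire_lock(old_lock, WAIT_LOCK);
        Py_END_ALLOW_THREADS
    }
    /* ... critical section ... */
    PyThread_release_lock(old_lock);
}

/* Proposed API: the fast path is a single compare-exchange; the slow path
   releases the GIL for us if it needs to block. */
static PyMutex new_lock;              /* zero-initialized, no allocation needed */

static void
with_pymutex(void)
{
    PyMutex_Lock(&new_lock);
    /* ... critical section ... */
    PyMutex_Unlock(&new_lock);
}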

C-API WG issue

cc @ngoldbaum

Linked PRs

@colesbury added the type-feature and topic-free-threading labels on Apr 3, 2024
@da-woods
Contributor

da-woods commented Apr 3, 2024

What's the private slow path for? I'm a bit puzzled why you'd ever choose that?

@colesbury
Contributor Author

PyMutex_Lock and PyMutex_Unlock are static inline functions, but we really only want to inline the simple fast-path case where the mutex is not contended. The slow path functions exist for PyMutex_Lock and PyMutex_Unlock to call when the fast path fails.

static inline void
PyMutex_Lock(PyMutex *m)
{
    uint8_t expected = _Py_UNLOCKED;
    if (!_Py_atomic_compare_exchange_uint8(&m->v, &expected, _Py_LOCKED)) {
        _PyMutex_LockSlow(m);
    }
}

// Unlocks the mutex.
static inline void
PyMutex_Unlock(PyMutex *m)
{
    uint8_t expected = _Py_LOCKED;
    if (!_Py_atomic_compare_exchange_uint8(&m->v, &expected, _Py_UNLOCKED)) {
        _PyMutex_UnlockSlow(m);
    }
}

@ngoldbaum
Contributor

ngoldbaum commented Apr 3, 2024

There was a discussion about this late last year on discourse: https://discuss.python.org/t/exposing-atomic-operations-tooling-in-the-public-c-api/39973

I'm now looking at adding support for free-threaded Python in NumPy. My current plan is to use PyThread_type_lock or, when that is not performant enough, look at using platform-specific synchronization primitives or e.g. C11 threads.h.

I think just using PyMutex instead would be better since it’s a simple API and is fast enough for many use cases. I understand there’s a desire to avoid maintaining a suboptimal API forever, but I think CPython providing a migration story for C extension authors would save a lot of time distributed across the community.

@vstinner
Member

vstinner commented Apr 3, 2024

typedef struct _PyMutex { uint8_t v; } PyMutex;

Would it be possible to allocate it on the heap memory, and make the structure members opaque?

@colesbury
Contributor Author

colesbury commented Apr 3, 2024

A heap allocated mutex is less useful and harder to use. For example:

  • Sometimes you need a global lock. You can't do a heap allocation when you declare a global variable in C. If you do the allocation lazily before the PyMutex_Lock() it's too late -- now you need a lock to protect your lock initialization.
  • Embedding a PyMutex is cheap, but embedding a heap allocated mutex in a small object can be prohibitive.

I don't think making it opaque and heap allocated provides any concrete advantages in this case. Oftentimes, opaqueness gives us more flexibility in adjusting the implementation, but in this case it restricts the implementation in ways that matter.
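
To illustrate the global-lock case (a minimal sketch with hypothetical names, not code from this proposal): a PyMutex can simply be declared as a static variable and is immediately usable, with no allocation or initialization call.

#include "Python.h"

static PyMutex registry_mutex;   /* static storage: zero-initialized, i.e. unlocked */
static PyObject *registry;       /* shared state guarded by the mutex */

static int
register_item(PyObject *item)
{
    int rc;
    PyMutex_Lock(&registry_mutex);
    rc = PyList_Append(registry, item);
    PyMutex_Unlock(&registry_mutex);
    return rc;
}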

@vstinner
Member

vstinner commented Apr 4, 2024

My concern is that tomorrow we might find a more efficient mutex implementation or need new, more specific kinds of mutex (recursive, "adaptive", "error check", etc.), and we might be blocked by a structure that can no longer be modified.

There are two aspects: API compatibility (we can change the structure) and ABI compatibility (stable ABI, we cannot change the structure if it's exposed).

Maybe we can support the stable ABI by not exposing the structure in the limited C API, and only support heap allocation for the limited C API.


The Linux pthread mutex implementation supports heap allocation and static initialization, and the structure is opaque. Extract:

#ifdef __x86_64__
# if __WORDSIZE == 64
#  define __SIZEOF_PTHREAD_MUTEX_T 40
# else
#  define __SIZEOF_PTHREAD_MUTEX_T 32
# endif
#else
# define __SIZEOF_PTHREAD_MUTEX_T 24
#endif

typedef union
{
  struct __pthread_mutex_s __data;
  char __size[__SIZEOF_PTHREAD_MUTEX_T];
  long int __align;
} pthread_mutex_t;

#define PTHREAD_MUTEX_INITIALIZER \
 { {  __PTHREAD_MUTEX_INITIALIZER (PTHREAD_MUTEX_TIMED_NP) } }
#ifdef __USE_GNU
# define PTHREAD_RECURSIVE_MUTEX_INITIALIZER_NP \
 { {  __PTHREAD_MUTEX_INITIALIZER (PTHREAD_MUTEX_RECURSIVE_NP) } }
# define PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP \
 { {  __PTHREAD_MUTEX_INITIALIZER (PTHREAD_MUTEX_ERRORCHECK_NP) } }
# define PTHREAD_ADAPTIVE_MUTEX_INITIALIZER_NP \
 { {  __PTHREAD_MUTEX_INITIALIZER (PTHREAD_MUTEX_ADAPTIVE_NP) } }
#endif

// ---- Internal ---------------------

struct __pthread_mutex_s
{
  int __lock;
  unsigned int __count;
  int __owner;
#ifdef __x86_64__
  unsigned int __nusers;
#endif
  /* KIND must stay at this position in the structure to maintain
     binary compatibility with static initializers.  */
  int __kind;
#ifdef __x86_64__
  short __spins;
  short __elision;
  __pthread_list_t __list;
# define __PTHREAD_MUTEX_HAVE_PREV      1
#else
  unsigned int __nusers;
  __extension__ union
  {
    struct
    {
      short __espins;
      short __eelision;
# define __spins __elision_data.__espins
# define __elision __elision_data.__eelision
    } __elision_data;
    __pthread_slist_t __list;
  };
# define __PTHREAD_MUTEX_HAVE_PREV      0
#endif
};

#ifdef __x86_64__
# define __PTHREAD_MUTEX_INITIALIZER(__kind) \
  0, 0, 0, 0, __kind, 0, 0, { 0, 0 }
#else
# define __PTHREAD_MUTEX_INITIALIZER(__kind) \
  0, 0, 0, __kind, 0, { { 0, 0 } }
#endif

I see that pthread_mutex contains far more information: lock, owner, nusers, spins, elision, and a singly linked list.

Maybe the list is used to solve the hard problem of keeping locks consistent at fork: see issue gh-50970. Do you plan to solve this problem or leave it aside?

@colesbury
Contributor Author

colesbury commented Apr 4, 2024

The mutex is already adaptive, both in the sense that it dynamically switches from spinning to blocking and from unfair (barging) to fair (handoff). We should not shoehorn a recursive mutex into PyMutex -- a recursive mutex should have its own data type -- but that would be feasible too. The primary means of extensibility is the wait entry, which is private, so we can add or remove fields as desired.

Linux's pthread mutex isn't really any more opaque than this [1]: note that struct __pthread_mutex_s is embedded in pthread_mutex_t. It's not an opaque pointer. We could do the same union, but it just adds unnecessary complexity without any flexibility. Linux (glibc) is stuck with 40-byte mutexes on x86-64. If we made this part of the stable ABI, which is not part of this proposal, then we'd be stuck with a 1-byte PyMutex. If you can do the same things with 1 byte as with 40 bytes, then 1 byte is better.

We already clear the wait entries at fork, which is better than the pthreads implementations. With pthreads, if you unlock a mutex you own after a fork, you might still deadlock due to trying to wake up dead threads. We don't have that problem with the PyMutex implementation because the wait entries are kept separately. But there's still no magic way to solve locks + forks because it's not enough to keep the locks consistent -- you really want to keep the data protected by the locks consistent too.

Footnotes

  1. If you #include <pthread.h> you get the full definition of struct __pthread_mutex_s. It's private (not documented), but not "internal" in the way that our pycore_xxx.h headers are internal.

@erlend-aasland
Contributor

Would it be possible to allocate it on the heap memory [...]

This issue is about exposing the existing mutex API, not about designing a new kind of mutex API.

Sam, you have a big +1 from me for exposing the existing PyMutex API.

@vstinner
Member

vstinner commented Apr 5, 2024

I'm fine with adding the proposed API, but not with adding it to the limited C API.

I suggest renaming struct _PyMutex to struct PyMutex.


The limited C API (and stable ABI) would need an alloc/free API similar to PyThread_allocate_lock() / PyThread_free_lock(), hide the structure, and expose PyMutex_Lock() and PyMutex_Unlock() in the limited C API (but still implement them as static inline functions in the public C API). Pseudo-API for a hypothetical limited C API:

typedef struct PyMutex PyMutex;
PyMutex *PyMutex_Alloc(void);
void PyMutex_Lock(PyMutex *mutex);
void PyMutex_Unlock(PyMutex *mutex);
void PyMutex_Free(PyMutex *mutex);

  • No public structure members.
  • No static inline functions or macros, only opaque function calls.
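
A sketch of how an extension built against this hypothetical limited C API might use it (PyMutex_Alloc and PyMutex_Free do not exist today; they are only part of the pseudo-API above):

#include "Python.h"

static PyMutex *module_lock;     /* opaque, heap-allocated */

static int
module_exec(PyObject *module)
{
    module_lock = PyMutex_Alloc();
    if (module_lock == NULL) {
        PyErr_NoMemory();
        return -1;
    }
    return 0;
}

static void
module_free(void *module)
{
    PyMutex_Free(module_lock);
    module_lock = NULL;
}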

An alternative is to reuse the PyThread API to expose PyMutex there in a stable ABI.

@dpdani
Contributor

dpdani commented Apr 19, 2024

What about adding an initialization API (albeit trivial)?

Right now, this is how I'm initializing a PyMutex, and here is how it's currently done in main for dict.

Of course it's trivial to set one field to 0, but it sounds like an implementation detail a programmer shouldn't need to worry about.
Would it make sense to also add a public init routine?

@colesbury
Contributor Author

colesbury commented Apr 19, 2024

No, I think an init routine just adds complexity. There are standard ways to zero (or default-value) initialize a structure in C and C++ that do not require knowledge of the fields:

PyMutex m = {};  // C23 and C++11
static PyMutex m;  // static storage duration is zero-initialized
PyMutex m = {0};  // C99 and C11 idiom. Note: zero-initializes all fields
self->mutex = (PyMutex){0};

@dpdani
Contributor

dpdani commented Apr 19, 2024

I can't really see how it adds complexity, but that might be my own short-sightedness.
Do you mean that making the call would be computationally more costly for a routine that will be called once per object allocation?

If not for that, it seems to me more future-proof to have an init routine to call, in case at some point whatever different value is needed at initialization time for the mutex to function.
(e.g. some flag bits or whatever may be found to be useful)

@colesbury
Contributor Author

colesbury commented Apr 19, 2024

It adds API complexity (not computational complexity). It is redundant with standard idioms for zero initialization. It doesn't add real flexibility: if you allow zero-initialization, people will rely on it, so you can't change the representation without breaking backwards compatibility even if you also provide an initialization API that does the same thing.

You would also need at least two APIs: one for static initialization and one for dynamically allocated objects.

colesbury added a commit to colesbury/cpython that referenced this issue Jun 17, 2024
colesbury added a commit to colesbury/cpython that referenced this issue Jun 20, 2024
colesbury added a commit to colesbury/cpython that referenced this issue Jun 20, 2024
…ythonGH-117731)

(cherry picked from commit 3af7263)

Co-authored-by: Sam Gross <colesbury@gmail.com>
colesbury added a commit to colesbury/cpython that referenced this issue Jun 20, 2024
…ythonGH-117731)

(cherry picked from commit 3af7263)

Co-authored-by: Sam Gross <colesbury@gmail.com>
colesbury added a commit that referenced this issue Jun 20, 2024