gh-125997: ensure that `time.sleep(0)` is not delayed on non-Windows platforms #128274

picnixz · 2024-12-26T12:32:00Z

This is a suggestion for fixing the issue when the timeout is 0. For other timeouts, this does not change the current behaviour (I'm tempted to remove the do-while loop as we should never have an issue, but I'm not familiar with the monotonic C implementation so I left it as is).

Issue: time.sleep(0) is slower on Python 3.11 than on Python 3.10 #125997

On non-Windows platforms, this reverts the usage of `clock_nanosleep` and `nanosleep` introduced by 85a4748 and 7834ff2 respectively, falling back to a `select(2)` alternative instead.

Modules/timemodule.c

rruuaanng · 2024-12-27T11:09:55Z

@picnixz Unfortunately, we should seriously review :(. It seems that an assertion failed somewhere.

Assertion failed: (errno == 0), function pysleep, file timemodule.c, line 2220.
Fatal Python error: Aborted

Thread 0x0000000173bbb000 (most recent call first):
  File "/Users/admin/actions-runner/_work/cpython/cpython/Lib/subprocess.py", line 1918 in _execute_child
...

picnixz · 2024-12-27T11:13:39Z

Ah I think I know what happens. When we raise an OSError from errno we don't clear the errno. So if you chain calls, you need to just ignore them. I'm removing the assertion cc @ZeroIntensity

Due to how `OSError` are raised from `errno`, we do not clear `errno` afterwards. If we catch `OSError`, then we still have an errno set, and if we call `time.sleep()` just after, we may have `errno != 0` (but we know we handled it so it's fine).
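A minimal Python-level sketch of the scenario described above, assuming a placeholder path that does not exist (`/nonexistent/path/for/demo` and the helper name are made up for illustration). An `OSError` built from `errno` is caught and handled, and `time.sleep(0)` is called right afterwards. Whether the C-level `errno` is still non-zero at that point cannot be observed from this snippet; it only shows that the call sequence is legitimate, which is why an `assert(errno == 0)` on entry is too strict.

```python
import errno
import os
import time


def handled_oserror_then_sleep(path="/nonexistent/path/for/demo"):
    """Catch an OSError raised from errno, then call time.sleep(0)."""
    try:
        os.stat(path)  # fails: the placeholder path does not exist
    except OSError as exc:
        caught = exc.errno  # errno value copied into the exception
    else:
        caught = None
    # The error was handled above, so sleeping here must simply work.
    time.sleep(0)
    return caught


if __name__ == "__main__":
    print(handled_oserror_then_sleep() == errno.ENOENT)
```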

rruuaanng · 2024-12-27T11:19:30Z

If pytime is running on multiple cores, I recommend using _LIBC_REENTRANT, which will make your errno independent in each thread.

raw defined

#  if !defined _LIBC || defined _LIBC_REENTRANT
/* When using threads, errno is a per-thread value.  */
#   define errno (*__errno_location ())
#  endif
# endif /* !__ASSEMBLER__ */
#endif /* _ERRNO_H */

picnixz · 2024-12-27T11:22:16Z

This wouldn't solve the issue (also because _LIBC_REENTRANT is a private macro and something you supply when configuring libc). By the way, errno is thread-safe (see https://linux.die.net/man/3/errno). The issue is simply that we can raise an OSError from errno, which doesn't clear errno, and do something else afterwards. I will keep the current behaviour as it was done beforehand.

ZeroIntensity · 2024-12-27T11:55:23Z

TIL! Should we assert !PyErr_Occurred() instead?

picnixz · 2024-12-27T12:06:24Z

> TIL! Should we assert !PyErr_Occurred() instead?

Not sure. It's not really an issue and the function is internal and only for time.sleep(). It's only used once (it's just that time.sleep() calls this function). Though it wouldn't hurt (if this assertion fails, then there is a separate issue)

ZeroIntensity · 2024-12-27T12:34:08Z

I think we should. There's no real cost to adding it, and it would be helpful for debugging in the rare case that something goes wrong.

charles-cooper · 2025-01-02T15:33:58Z

> I've locally selected the select() alternative, but considering your comment on the GIL, I may either release the GIL or drop this idea. What I can however keep from my PR (that I've still not updated) is the fact that emulating time.sleep(0) can be achieved using select.poll().poll(0) if one does not want to relinquish the CPU or via os.sched_yield() if one wants to relinquish the CPU.

i have not looked into this too deeply but my understanding is the poll() will also relinquish the CPU since it is a syscall but not necessarily give up the time slice (e.g. if there are no threads with higher priority)

> In addition, I didn't check yet but I'm wondering whether calling clock_nanosleep() without calling PyTime_Monotonic would actually solve the issue of sleeping using a TIMER_ABSTIME flag. The reason is that clock_nanosleep() should not suspend the calling thread in this case:
>
> > If flags is TIMER_ABSTIME, then t is interpreted as an absolute
> > time as measured by the clock, clockid. If t is less than or
> > equal to the current value of the clock, then clock_nanosleep()
> > returns immediately without suspending the calling thread.

this looks like a good idea, might be the cleanest - just choose TIMER_ABSTIME in the case that the argument to time.sleep() is <=0.

as a user, i would just generally expect that sleep() issues a syscall (i.e. relinquish enough control to the kernel and cpython runtime that they can unblock any I/O or other operations that need to be scheduled) but don't really care exactly which one.
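The two alternatives mentioned in the quoted text can be sketched as follows. This is only an illustration: the helper names are hypothetical, and the `hasattr` fallbacks to `time.sleep(0)` are an assumption for platforms where `select.poll` or `os.sched_yield` is not available.

```python
import os
import select
import time


def poll_without_waiting():
    """Issue a zero-timeout poll(); returns the (empty) list of events."""
    if hasattr(select, "poll"):
        # No file descriptor is registered, so nothing can be reported.
        return select.poll().poll(0)
    # Assumed fallback where poll() does not exist (e.g. Windows).
    time.sleep(0)
    return []


def yield_cpu():
    """Explicitly relinquish the CPU when the platform supports it."""
    if hasattr(os, "sched_yield"):
        os.sched_yield()
    else:
        # Assumed fallback where sched_yield() is not exposed.
        time.sleep(0)


if __name__ == "__main__":
    print(poll_without_waiting())
    yield_cpu()
```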

picnixz · 2025-01-02T15:39:48Z

I'll commit something tomorrow or the day after using that approach. It would also be nice to avoid the (small) overhead of using PyTime_Monotonic().

picnixz · 2025-01-03T08:49:58Z

While we're still above the select() alternative, we're now a bit faster and the delays are less pronounced: Mean +- std dev: 2.09 us +- 0.24 us.

For comparison, the current implementation performs 25x worse: Mean +- std dev: 54.8 us +- 0.8 us, so we're definitely seeing an improvement using the optimized version.

I'll update the comments, docs and tests and push those changes.

FTR, the numbers above are in a DEBUG build, but this doesn't change much as on an optimized build, current implementation takes roughly the same amount of time.
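For anyone who wants to reproduce this kind of measurement, here is a rough sketch using the standard `timeit` module. It is not the benchmark used for the numbers above; the helper name `mean_sleep0_us` and the iteration count are arbitrary choices, and results will vary with the platform, the build and the system load.

```python
import time
import timeit


def mean_sleep0_us(number=10_000):
    """Return the mean wall-clock cost of one time.sleep(0) call, in us."""
    total = timeit.timeit(
        "sleep(0)",
        globals={"sleep": time.sleep},
        number=number,
    )
    return total / number * 1e6


if __name__ == "__main__":
    print(f"time.sleep(0): {mean_sleep0_us():.2f} us per call")
```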

Lib/test/test_time.py

Modules/timemodule.c

vstinner · 2025-01-06T10:24:17Z

Doc/whatsnew/3.14.rst

+time
+----
+
+* Ensure that :func:`time.sleep(0) <time.sleep>` does not degrade over time


I suppose that you're referring to performance:

Suggested change

-* Ensure that :func:`time.sleep(0) <time.sleep>` does not degrade over time
+* Ensure that :func:`time.sleep(0) <time.sleep>` performance does not degrade over time

Depending on the resolution of this PR (namely, whether we eventually use select()), I'll amend the NEWS so I won't commit that suggestion for now.

vstinner · 2025-01-06T10:25:18Z

Modules/timemodule.c

+    err = ret;
+#elif defined(HAVE_NANOSLEEP)
+    struct timespec zero = {0, 0};
+    ret = nanosleep(&zero, NULL);


Is it worth it to have 3 code paths for sleep(0)? Can't we always use select()?

We had a long discussion on this matter but long story short:

select() is implementation sensitive but there were some semantic and implementation concerns (see #128274 (comment)).

using select() would be much faster and if you still want to go that road (which I did originally), then I personally don't mind.

The implementations of `time.sleep(0)` on Windows and POSIX are now handled by the same function. Previously, `time.sleep(0)` on Windows was handled by `pysleep()` while on POSIX platforms, it was handled by `pysleep_zero_posix()`.

Modules/timemodule.c

vstinner · 2025-01-07T10:34:30Z

Modules/timemodule.c

+    Py_BEGIN_ALLOW_THREADS
+#ifdef HAVE_CLOCK_NANOSLEEP
+    struct timespec zero = {0, 0};
+    ret = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &zero, NULL);


I don't think that TIMER_ABSTIME is appropriate here:

Suggested change

-    ret = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &zero, NULL);
+    ret = clock_nanosleep(CLOCK_MONOTONIC, 0, &zero, NULL);

I first thought about:

If flags is TIMER_ABSTIME, then t is interpreted as an absolute
time as measured by the clock, clockid. If t is less than or
equal to the current value of the clock, then clock_nanosleep()
returns immediately without suspending the calling thread.

Without this flag, time.sleep(0) takes 50us. Otherwise it takes 2us. So, now I'm more and more inclined to actually revert it back to a select. Because otherwise, it's as if I'm not doing anything (and just skip the call to clock_nanosleep; I mean I already know that the condition on t is verified so it's as if I have no call)

I'm leaving for a few days but I'm still struggling to convince myself between the use of select or not. The reason why clock_nanosleep() is slowed down is because we don't pass a zero time struct.

Ok, I was wrong. See #128274 (comment).

Modules/timemodule.c

picnixz · 2025-01-07T15:00:01Z

Ok, so I'm actually back at the starting point. First of all, let's take a step back and understand the issue at hand: time.sleep(0) takes too long when using clock_nanosleep or nanosleep. The reason? clock_nanosleep(0) takes roughly 1-2us; nanosleep(0) takes roughly 50us but select(0, NULL, NULL, NULL, &zero) takes 150ns.

Second, I was wrong with TIMER_ABSTIME combined with a 0 struct. The correct call would indeed be with flags=0 and {0, 0} OR with TIMER_ABSTIME and a timestamp value. Using TIMER_ABSTIME + 0 is the wrong call. Now, whether we use 0 or TIMER_ABSTIME with the correct timestamp, a single call to clock_nanosleep takes 50us. So, whichever choice we make, we'll always be slower.

Now, let's move back to what time.sleep(0) semantically does. What we want is to suspend the calling thread and release the GIL as well (otherwise it doesn't make sense). In C++, std::this_thread::sleep_for() called with a zero duration is implemented as follows (cleaning up the relevant bits):

template<typename _Rep, typename _Period> 
inline void
sleep_for(const chrono::duration<_Rep, _Period>& __rtime)
{
	if (__rtime <= __rtime.zero())
		return;
	...
}

So, as we can see, std::this_thread::sleep_for(std::chrono::seconds(0)) would essentially be a no-op. So in C++, time.sleep(0) would actually be a simple return. Do we want to emulate that behaviour?
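To make the question concrete, emulating that behaviour on the Python side could look like the sketch below. The wrapper `sleep_like_cpp` is hypothetical and is not part of this PR. Note that it also returns early for negative values, which differs from `time.sleep()`, which raises `ValueError` in that case.

```python
import time


def sleep_like_cpp(seconds):
    """Hypothetical wrapper mirroring the early return shown above.

    Returns False when nothing was done and True when a real sleep
    took place.
    """
    if seconds <= 0:
        # Same shortcut as `if (__rtime <= __rtime.zero()) return;`
        return False
    time.sleep(seconds)
    return True


if __name__ == "__main__":
    print(sleep_like_cpp(0), sleep_like_cpp(0.001))
```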

If yes, what about Windows' dedicated sleep(0) which relinquishes the CPU? Should we also change it? I don't think so. Previously, non-Windows platforms implemented time.sleep using select. So why not switch back to that? Or even better, align with C++ so that time.sleep(0) does nothing on Linux (maybe POSIX as well?)

Or, if this is really too much of a hassle, we can just... drop that PR (I won't mind) and simply improve the docs saying that time.sleep(0) may not be useful for polling and select.poll.poll(0) should be preferred.

Using `clock_nanosleep()` would always take more than 50 us.

picnixz · 2025-01-07T15:06:28Z

For now, I've decided to revert to the old pre-3.11 (select()-based) behaviour. If you prefer me to change the docs, I can also do it. I'm leaving for a few days so I won't be able to see the comments.

vstinner · 2025-01-07T16:18:43Z

If I understood the issue correctly, the purpose of this change is to make time.sleep(0) faster on Linux.

The problem is that always using select() to implement time.sleep(0) would make this call less efficient on FreeBSD: #125997 (comment)

picnixz · 2025-01-12T11:31:49Z

Considering that we would be penalizing FreeBSD (and that using time.sleep(0) is usually not the correct semantic choice), I will close this PR and make two fresh PRs. One for adding tests and one for improving the documentation (namely, recommend using select.poll.poll(0) or os.sched_yield() depending on the caller's needs; a zero-length sleep is (in C++ at least) roughly equivalent to a pass (return immediately)).

I learned a lot during this but I'm still not comfortable with reverting it back to the previous behaviour. ISTM that the previous behaviour is probably "wrong" (namely, I'm not really sure we should even use select() as an alternative; we should probably use sleep() instead)

picnixz added 3 commits December 26, 2024 13:30

Ensure that time.sleep(0) does not accumulate delays.

3754e9e

On non-Windows platforms, this reverts the usage of `clock_nanosleep` and `nanosleep` introduced by 85a4748 and 7834ff2 respectively, falling back to a `select(2)` alternative instead.

update globals

3c0c4e6

blurb

c18f15f

picnixz requested review from pganssle, abalkin and ericsnowcurrently as code owners December 26, 2024 12:32

bedevere-app bot mentioned this pull request Dec 26, 2024

time.sleep(0) is slower on Python 3.11 than on Python 3.10 #125997

Open

bedevere-app bot added the awaiting review label Dec 26, 2024

picnixz removed the request for review from ericsnowcurrently December 26, 2024 12:32

fix Windows compilation

8d270f2

picnixz requested review from hauntsaninja and vstinner December 26, 2024 12:47

rruuaanng reviewed Dec 26, 2024

picnixz commented Dec 26, 2024

ZeroIntensity reviewed Dec 26, 2024

picnixz added 2 commits December 27, 2024 12:04

use a fresh timeval variable for select(2) portability

2996937

ensure that errno=0 when calling pysleep[zero_posix]

80de853

picnixz force-pushed the perf/time/zero-125997 branch from ff959e1 to 80de853 Compare December 27, 2024 11:05

picnixz requested a review from ZeroIntensity December 27, 2024 11:05

removing assert(errno == 0)

c7aa428

Due to how `OSError` are raised from `errno`, we do not clear `errno` afterwards. If we catch `OSError`, then we still have an errno set, and if we call `time.sleep()` just after, we may have `errno != 0` (but we know we handled it so it's fine).

add errors assertions

f15436e

picnixz added the "needs backport to 3.12" (bug and security fixes) and "needs backport to 3.13" (bugs and security fixes) labels Dec 28, 2024

fix typo

23b4740

picnixz added 4 commits January 3, 2025 09:54

optimize time.sleep(0) path

40155ff

update docs

153b326

update tests

7e351c9

revert un-necessary docs updates

285c54a

picnixz marked this pull request as ready for review January 3, 2025 09:02

bedevere-app bot added the awaiting review label Jan 3, 2025

vstinner reviewed Jan 6, 2025

picnixz added 2 commits January 6, 2025 11:33

remove performance tests

434da31

vstinner reviewed Jan 6, 2025

address Victor's review

40d56f1

picnixz requested a review from vstinner January 7, 2025 09:47

picnixz added 2 commits January 7, 2025 10:48

update blurb

c4ce9cc

simplify pysleep

54b6dde

vstinner reviewed Jan 7, 2025

add newline for readability

405e473

revert to a select() alternative

b803045

Using `clock_nanosleep()` would always take more than 50 us.

picnixz closed this Jan 12, 2025

This was referenced Jan 12, 2025

gh-125997: improve tests for time.sleep() #128751

Open

gh-125997: suggest efficient alternatives for time.sleep(0) #128752

Open

picnixz deleted the perf/time/zero-125997 branch January 12, 2025 12:08
