Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-125997: ensure that time.sleep(0) is not delayed on non-Windows platforms #128274

Closed
wants to merge 26 commits into from

Conversation

picnixz
Copy link
Member

@picnixz picnixz commented Dec 26, 2024

This is a suggestion for fixing the issue when the timeout is 0. For other timeouts, this does not change the current behaviour (I'm tempted removing the do-while loop as we should never have an issue, but I'm not familar with the monotonic C implementation so I left it as is).

On non-Windows platforms, this reverts the usage of `clock_nanosleep`
and `nanosleep` introduced by 85a4748 and 7834ff2 respectively,
falling back to a `select(2)` alternative instead.
Modules/timemodule.c Outdated Show resolved Hide resolved
Modules/timemodule.c Outdated Show resolved Hide resolved
Modules/timemodule.c Outdated Show resolved Hide resolved
@picnixz picnixz force-pushed the perf/time/zero-125997 branch from ff959e1 to 80de853 Compare December 27, 2024 11:05
@rruuaanng
Copy link
Contributor

@picnixz Unfortunately, we should seriously review :(. It seems that an assertion failed somewhere.

Assertion failed: (errno == 0), function pysleep, file timemodule.c, line 2220.
Fatal Python error: Aborted

Thread 0x0000000173bbb000 (most recent call first):
  File "/Users/admin/actions-runner/_work/cpython/cpython/Lib/subprocess.py", line 1918 in _execute_child
...

@picnixz
Copy link
Member Author

picnixz commented Dec 27, 2024

Ah I think I know what happens. When we raise an OSError from errno we don't clear the errno. So if you chain calls, you need to just ignore them. I'm removing the assertion cc @ZeroIntensity

Due to how `OSError` are raised from `errno`, we do not clear `errno`
afterwards. If we catch `OSError`, then we still have an errno set,
and if we call `time.sleep()` just after, we may have `errno != 0`
(but we know we handled it so it's fine).
@rruuaanng
Copy link
Contributor

If pytime is running on multiple cores, I recommend using _LIBC_REENTRANT, which will make your errno independent in each thread.

raw defined

#  if !defined _LIBC || defined _LIBC_REENTRANT
/* When using threads, errno is a per-thread value.  */
#   define errno (*__errno_location ())
#  endif
# endif /* !__ASSEMBLER__ */
#endif /* _ERRNO_H */

@picnixz
Copy link
Member Author

picnixz commented Dec 27, 2024

This wouldn't solve the issue (also because _LIBC_REENTRANT is a private macro and something you supply when configuring libc). By the way, errno is thread-safe (see https://linux.die.net/man/3/errno). The issue is simply that we can raise an OSError from errno, which doesn't clear errno, and do something else afterwards. I will keep the current behaviour as it was done beforehand.

@ZeroIntensity
Copy link
Member

TIL! Should we assert !PyErr_Occurred() instead?

@picnixz
Copy link
Member Author

picnixz commented Dec 27, 2024

TIL! Should we assert !PyErr_Occurred() instead?

Not sure. It's not really an issue and the function is internal and only for time.sleep(). It's only used once (it's just that time.sleep() calls this function). Though it wouldn't hurt (if this assertion fails, then there is a separate issue)

@ZeroIntensity
Copy link
Member

I think we should. There's no real cost to adding it, and it would be helpful for debugging in the rare case that something goes wrong.

@picnixz picnixz added needs backport to 3.12 bug and security fixes needs backport to 3.13 bugs and security fixes labels Dec 28, 2024
@charles-cooper
Copy link

charles-cooper commented Jan 2, 2025

I've locally selected the select() alternative, but considering your comment on the GIL, I may either release the GIL or drop this idea. What I can however keep from my PR (that I've still not updated) is the fact that emulating time.sleep(0) can be achieved using select.poll().poll(0) if one does not want to relinquish the CPU or via os.sched_yield() if one wants to relinquish the CPU.

i have not looked into this too deeply but my understanding is the poll() will also relinquish the CPU since it is a syscall but not necessarily give up the time slice (e.g. if there are no threads with higher priority)

In addition, I didn't check yet but I'm wondering whether calling clock_nanosleep() without calling PyTime_Monotonic would actually solve the issue of sleeping using a TIMER_ABSTIME flag. The reason is that clock_nanosleep() should not suspend the calling thread in this case:

If flags is TIMER_ABSTIME, then t is interpreted as an absolute
time as measured by the clock, clockid. If t is less than or
equal to the current value of the clock, then clock_nanosleep()
returns immediately without suspending the calling thread.

this looks like a good idea, might be the cleanest - just choose TIMER_ABSTIME in the case that the argument to time.sleep() is <=0.

as a user, i would just generally expect that sleep() issues a syscall (i.e. relinquish enough control to the kernel and cpython runtime that they can unblock any I/O or other operations that need to be scheduled) but don't really care exactly which one.

@picnixz
Copy link
Member Author

picnixz commented Jan 2, 2025

I'll commit something tomorrow or the day after using that approach. It would also be nice to avoid the (small) overhead of using PyTime_Monotonic().

@picnixz
Copy link
Member Author

picnixz commented Jan 3, 2025

While we're still above the select() alternative, we're now a bit faster and the delays are less pronounced: Mean +- std dev: 2.09 us +- 0.24 us.

For comparison, the current implementation performances 25x worse: Mean +- std dev: 54.8 us +- 0.8 us, so we're definitely seeing an improvement using the optimized version.

I'll update the comments, docs and tests and push those changes.

FTR, the numbers above are in a DEBUG build, but this doesn't change much as on an optimized build, current implementation takes roughly the same amount of time.

@picnixz picnixz marked this pull request as ready for review January 3, 2025 09:02
Lib/test/test_time.py Outdated Show resolved Hide resolved
Modules/timemodule.c Outdated Show resolved Hide resolved
time
----

* Ensure that :func:`time.sleep(0) <time.sleep>` does not degrade over time
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose that you're referring to performance:

Suggested change
* Ensure that :func:`time.sleep(0) <time.sleep>` does not degrade over time
* Ensure that :func:`time.sleep(0) <time.sleep>` performance does not degrade over time

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on the resolution of this PR (namely, whether we eventually use select()), I'll amend the NEWS so I won't commit that suggestion for now.

err = ret;
#elif defined(HAVE_NANOSLEEP)
struct timespec zero = {0, 0};
ret = nanosleep(&zero, NULL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth it to have 3 code paths for sleep(0)? Can't we always use select()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had a long discussion on this matter but long story short:

The implementations of `time.sleep(0)` on Windows and POSIX are
now handled by the same function. Previously, `time.sleep(0)` on
Windows was handled by `pysleep()` while on POSIX platforms, it
was handled by `pysleep_zero_posix()`.
Modules/timemodule.c Show resolved Hide resolved
Modules/timemodule.c Outdated Show resolved Hide resolved
Modules/timemodule.c Outdated Show resolved Hide resolved
@picnixz picnixz requested a review from vstinner January 7, 2025 09:47
Py_BEGIN_ALLOW_THREADS
#ifdef HAVE_CLOCK_NANOSLEEP
struct timespec zero = {0, 0};
ret = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &zero, NULL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that TIMER_ABSTIME is appropriate here:

Suggested change
ret = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &zero, NULL);
ret = clock_nanosleep(CLOCK_MONOTONIC, 0, &zero, NULL);

Copy link
Member Author

@picnixz picnixz Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I first thought about:

If flags is TIMER_ABSTIME, then t is interpreted as an absolute
time as measured by the clock, clockid. If t is less than or
equal to the current value of the clock, then clock_nanosleep()
returns immediately without suspending the calling thread.

Without this flag, time.sleep(0) takes 50us. Otherwise it takes 2us. So, now I'm more and more inclined to actually revert it back to a select. Because otherwise, it's as if I'm not doing anything (and just skip the call to clock_nanosleep; I mean I already know that the condition on t is verified so it's as if I have no call)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaving for a few days but I'm still struggling to convince myself between the use of select or not. The reason why clock_nanosleep() is slowed down is because we don't pass a zero time struct.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I was wrong. See #128274 (comment).

Modules/timemodule.c Show resolved Hide resolved
@picnixz
Copy link
Member Author

picnixz commented Jan 7, 2025

Ok, so I'm actually back at the starting point. First of all, let's take a step back and understand the issue at hand: time.sleep(0) takes too long when using clock_nanosleep or nanosleep. The reason? clock_nanosleep(0) takes roughly 1-2us; nanosleep(0) takes roughly 50us but select(0, NULL, NULL, NULL, &zero) takes 150ns.

Second, I was wrong with TIME_ABSTIMER combined with a 0 struct. The correct call would indeed be with flags=0 and {0, 0} OR with TIME_ABSTIMER and a timestamp value. Using TIME_ABSTIMER + 0 is the wrong call. Now, whether we use 0 or TIME_ABSTIMER with the correct timestamp, a single call to clock_nanosleep takes 50us. So, whatever choice we do, we'll always be slower.

Now, let's move back to what time.sleep(0) semantically does. What we want is to suspend the calling thread and release the GIL as well (otherwise it doesn't make sense). In C++, std::this_thread::sleep_for(0, 0) is implemented as follows (cleaning up the relevant bits):

template<typename _Rep, typename _Period> 
inline void
sleep_for(const chrono::duration<_Rep, _Period>& __rtime)
{
	if (__rtime <= __rtime.zero())
		return;
	...
}

So, as we can see, std::this_thread::sleep_for(std::chrono::seconds(0)) would essentially be a no-op. So in C++, time.sleep(0) would actually be a simple return. Do we want to emulate that behaviour?

If yes, what about Windows' dedicate sleep(0) which relinquishes the CPU? Should we also change it? I don't think so. Previously, non-Windows platforms implemented time.sleep using select. So why not switch back to that? or even better, align with C++ so that time.sleep(0) does nothing on Linux (maybe POSIX as well?)

Or, if this is really too much of a hassle, we can just... drop that PR (I won't mind) and simply improve the docs saying that time.sleep(0) may not be useful for polling and select.poll.poll(0) should be preferred.

Using `clock_nanosleep()` would always take more than 50 us.
@picnixz
Copy link
Member Author

picnixz commented Jan 7, 2025

For now, I've decided to revert to the old 3.11 behaviour. If you prefer me to change the docs, I can also do it. I'm leaving for a few days so I won't be able to see the comments.

@vstinner
Copy link
Member

vstinner commented Jan 7, 2025

If I understood correctly the issue, the purpose of this change is to make time.sleep(0) faster on Linux.

The problem is that always using select() to implement time.sleep(0) would make this call less efficient on FreeBSD: #125997 (comment)

@picnixz
Copy link
Member Author

picnixz commented Jan 12, 2025

Considering that we would be penalizing FreeBSD (and that using time.sleep(0) is usually not the correct semantic choice), I will close this PR and make two fresh PRs. One for adding tests and one for improving the documentation (namely, recommend using select.poll.poll(0) or os.sched_yield() depending on the caller needs; using time.sleep(0) is (in C++ at least) roughly equivalent to a pass (return immediately)).

I learned a lot during this but I'm still not comfortable with reverting it back to the previous behaviour. ISTM that the previous behaviour is probably "wrong" (namely, I'm not really sure we should even use select() as an alternative; we should probably use sleep() instead)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants