
use_file: switch to futex on Linux and to nanosleep on other targets #490

Merged
merged 1 commit into master from futex on Sep 27, 2024

Conversation

newpavlov
Member

@newpavlov newpavlov commented Jun 21, 2024

All non-Linux targets only open a file in the critical section, so we can probably remove all synchronization and replace it with a sleep-based wait loop.

On Linux, the initializing thread may spend a lot of time in the critical section because of wait_until_rng_ready. On entropy-starved (and especially virtualized) systems we may spend tens of seconds in it, so we need proper synchronization between the initializing and waiting threads. Luckily, the futex API is a great fit for this case and allows us to handle the synchronization efficiently.
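
For readers unfamiliar with the futex API, here is a minimal sketch of the wait/wake pattern described above (Linux only, `libc` crate assumed). The state encoding, constants, and function names are illustrative assumptions, not the PR's actual code:

```rust
use core::sync::atomic::{AtomicU32, Ordering};

const PENDING: u32 = 0; // initialization in progress (assumed encoding)
const READY: u32 = 1;   // the file descriptor has been published

static STATE: AtomicU32 = AtomicU32::new(PENDING);

/// Sleep until woken, but only if STATE still equals `expected`
/// (the kernel checks this atomically, so no wakeups are lost).
fn futex_wait(expected: u32) {
    unsafe {
        libc::syscall(
            libc::SYS_futex,
            STATE.as_ptr(),
            libc::FUTEX_WAIT | libc::FUTEX_PRIVATE_FLAG,
            expected,
            core::ptr::null::<libc::timespec>(), // no timeout
        );
    }
}

/// Wake every thread currently blocked in `futex_wait` on STATE.
fn futex_wake() {
    unsafe {
        libc::syscall(
            libc::SYS_futex,
            STATE.as_ptr(),
            libc::FUTEX_WAKE | libc::FUTEX_PRIVATE_FLAG,
            i32::MAX, // wake all waiters
        );
    }
}

/// Run by the single initializing thread once wait_until_rng_ready returns.
fn publish() {
    STATE.store(READY, Ordering::Release);
    futex_wake();
}

/// Run by every other thread instead of blocking on a mutex.
fn wait_until_ready() {
    while STATE.load(Ordering::Acquire) == PENDING {
        futex_wait(PENDING);
    }
}
```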

@josephlr
Member

josephlr commented Jun 21, 2024

I don't think we should split how we do synchronization between the different targets. It adds a bunch of complexity for little gain. No contention occurs in the common case.

Also, see #453 (comment). If we want something better than pthread_mutex_{lock/unlock}, we should either use std::sync::Mutex or just not use a lock at all (and use libc::poll to handle sleeping/waking the threads).
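
For reference, waiting for the kernel RNG to become ready by polling /dev/random looks roughly like this; the function name and error handling below are a hedged sketch rather than getrandom's actual code:

```rust
/// Block until /dev/random reports POLLIN, i.e. until the kernel
/// entropy pool has been initialized.
fn wait_until_rng_ready(random_fd: libc::c_int) -> std::io::Result<()> {
    let mut pfd = libc::pollfd {
        fd: random_fd,
        events: libc::POLLIN,
        revents: 0,
    };
    loop {
        // Infinite timeout: poll returns once the fd becomes readable.
        match unsafe { libc::poll(&mut pfd, 1, -1) } {
            1 => return Ok(()),
            // Interrupted by a signal: retry.
            -1 if std::io::Error::last_os_error().kind()
                == std::io::ErrorKind::Interrupted => continue,
            _ => return Err(std::io::Error::last_os_error()),
        }
    }
}
```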

@newpavlov
Member Author

All other use_file users are relatively minor targets, so I think it makes sense to have a more specialized version of this code for Linux, especially considering the differences between Linux and the rest (futex availability, /dev/random polling, #[inline(never)] gating). Moving the Linux-specific code out makes use_file much cleaner, without any target-specific cfgs.

Not only does the std mutex add unnecessary global state, its implementation also has a bunch of machinery that is not needed in our case (e.g. poisoning). Splitting out the Linux code could also help with #401, if we decide to implement it.

IIUC, the implementation for the remaining targets uses a pthread mutex under the hood, so with this PR we can continue to use a pthread mutex there, since it was primarily problematic on Linux and not on other targets. Considering that non-Linux targets only open a file in the critical section, I think we can even remove locking completely and replace it with a simple sleep-based loop similar to what I drafted in #478, without any libc::poll complications.
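
A rough sketch of that sleep-based idea (names, constants, and error handling are assumptions, not the code from #478): the fd lives in an atomic, and threads that lose the race sleep for about a millisecond and re-check instead of blocking on a mutex.

```rust
use core::sync::atomic::{AtomicI32, Ordering};

const FD_UNINIT: i32 = -1;  // no fd yet (assumed sentinel)
const FD_PENDING: i32 = -2; // another thread is opening the file

static FD: AtomicI32 = AtomicI32::new(FD_UNINIT);

fn get_fd(open_file: fn() -> i32) -> i32 {
    loop {
        match FD.compare_exchange(FD_UNINIT, FD_PENDING, Ordering::AcqRel, Ordering::Acquire) {
            // We won the race: open the file and publish the fd.
            Ok(_) => {
                let fd = open_file();
                FD.store(fd, Ordering::Release);
                return fd;
            }
            // Another thread is opening the file: sleep ~1 ms and re-check.
            Err(FD_PENDING) => {
                let mut ts: libc::timespec = unsafe { core::mem::zeroed() };
                ts.tv_nsec = 1_000_000;
                unsafe { libc::nanosleep(&ts, core::ptr::null_mut()) };
            }
            // The fd has already been published.
            Err(fd) => return fd,
        }
    }
}
```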

@newpavlov
Member Author

Hm, if we are fine with a sleep-based loop for non-Linux targets, I guess we could eliminate the split by using a sleep-based implementation of the wait function.

@josephlr
Member

josephlr commented Jun 21, 2024

Not only does the std mutex add unnecessary global state, its implementation also has a bunch of machinery that is not needed in our case (e.g. poisoning).

The main benefit of std's Mutex is that it's well tested/reviewed and not a custom thing we need to maintain. It doesn't really matter that Mutex supports more functionality than we need; it's already implemented as part of Rust.

@newpavlov
Member Author

newpavlov commented Jun 21, 2024

The use of futex wait/wake operations in this PR is very straightforward; it's essentially a textbook-level example, so I don't think the new code requires any particular maintenance.

I removed the code split and the dependency on the pthread mutex. What do you think about the new code?

@newpavlov changed the title from "Use futex for Linux file fallback" to "use_file: switch to futex on Linux and to nanosleep on other targets" on Jun 21, 2024
@josephlr
Member

The futex wait/wake operations are very straightforward; they are essentially textbook-level examples, so I don't think they require any particular maintenance.

I removed the code split and removed the dependency on the pthread mutex. What do you think about the new code?

I think this increases the complexity of the implementation without a concrete benefit. This implementation is longer and harder for me to understand than the pthread-based code. An implementation using std's Mutex would have nearly identical performance characteristics and would be much shorter/simpler.

If you don't want to use std's Mutex, what would be your thoughts on the second proposal in #453 (comment) to just not have any locking at all?

@josephlr
Member

Hm, if we are fine with a sleep-based loop for non-Linux targets, I guess we could eliminate the split by using a sleep-based implementation of the wait function.

I think it's reasonable to get rid of locking for initializing the file descriptor. It's not clear to me if a sleep-based approach is better or worse than just opening the file (potentially multiple times) and then closing the file if the thread "loses" the race.

@newpavlov
Member Author

newpavlov commented Jun 21, 2024

If we turn a blind eye to the sync module for now, are you fine with the rest of the code?

If you don't want to use std's Mutex, what would be your thoughts on the second proposal in #453 (comment) to just not have any locking at all?

It's essentially what I did in this PR. On non-Linux targets there is no locking, but on Linux we need some kind of locking because wait_until_rng_ready may block for a significant amount of time (it's one of the reasons why I thought the code split could be a good idea).

libc::poll-based synchronization IMO is more convoluted and hacky than the futex-based code. LazyFd will result in unnecessary opening operations, which may hit the ulimit. In the worst-case scenario, every thread will open /dev/random to poll it, and after it's ready, they will all rush to open /dev/urandom.

@josephlr
Member

libc::poll-based synchronization IMO is more convoluted and hacky than the futex-based code. LazyFd will result in unnecessary opening operations, which may hit the ulimit. In the worst-case scenario, every thread will open /dev/random to poll it, and after it's ready, they will all rush to open /dev/urandom.

I mean we would be using libc::poll regardless, that part would be unchanged. As for potentially opening multiple files, I don't think it would happen in practice. Even if a bunch of threads are poll-ing /dev/random, they will all be polling the same fd (in my proposal). The only way multiple fds get opened is if two threads race on libc::open itself, and even in that scenario, the extra FD gets closed immediately.
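
A small sketch of that "open, then close if you lost the race" resolution, assuming the shared fd is stored in an AtomicI32 (the names and sentinel value are illustrative, not a concrete proposal from the linked comment):

```rust
use core::sync::atomic::{AtomicI32, Ordering};

const FD_UNINIT: i32 = -1; // assumed sentinel for "not opened yet"

static FD: AtomicI32 = AtomicI32::new(FD_UNINIT);

fn get_fd(open_file: fn() -> i32) -> i32 {
    let fd = FD.load(Ordering::Acquire);
    if fd != FD_UNINIT {
        return fd;
    }
    // No lock: several late-arriving threads may each open their own fd...
    let new_fd = open_file();
    match FD.compare_exchange(FD_UNINIT, new_fd, Ordering::AcqRel, Ordering::Acquire) {
        Ok(_) => new_fd,
        // ...but only one wins; losers close the extra fd immediately.
        Err(winner_fd) => {
            unsafe { libc::close(new_fd) };
            winner_fd
        }
    }
}
```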

@josephlr
Member

it's one of the reasons why I thought the code split could be a good idea

Looking at this, I actually think that splitting the code is fine, as we can share common abstractions (if any) between use_file.rs and linux_with_fallback.rs. My main concern is with how the locking/sleeping/waiting is implemented.

@josephlr
Member

If we turn a blind eye to the sync module for now, are you fine with the rest of the code?

I would say the contents/complexity of the sync module are the main reason I don't prefer this approach. I agree that if we were just opening shared FDs, we could solve this in multiple nice ways without using std.

@newpavlov
Member Author

newpavlov commented Jun 21, 2024

I mean we would be using libc::poll regardless, that part would be unchanged.

Ah, I somewhat misunderstood your idea. I guess it could work, but I think it would be more code and (to my taste) less straightforward. Plus, as you note yourself, this approach will always result in 2 file descriptors being kept open instead of 1.

we can share common abstractions (if any) between use_file.rs and linux_with_fallback.rs

The amount of common abstraction depends on whether we use a pthread (or std) mutex or the sleep-based loop for non-Linux targets. In the former case the overlap is small enough to warrant complete independence, but in the latter case I think it makes more sense to keep the code in one module, as is done in the current version.

I would say the contents/complexity of the sync module are the main reason I don't prefer this approach.

I guess the perceived complexity depends on how familiar the reader is with futexes. Maybe I can amend this by adding comments explaining what exactly wake and wait do?

@josephlr
Member

I guess the perceived complexity depends on how familiar the reader is with futexes. Maybe I can amend this by adding comments explaining what exactly wake and wait do?

Looking at the man page for futex I found this:

To reiterate, bare futexes are not intended as an easy-to-use abstraction for end-users. (There is no wrapper function for this system call in glibc.)

This makes me think that futexes (while very cool) are not really designed for general use, but rather as a way to allow things like std to have fast mutexes.

@newpavlov
Member Author

newpavlov commented Jun 21, 2024

This makes me think that futexes (while very cool) are not really designed for general use, but rather as a way to allow things like std to have fast mutexes.

Futex is certainly a very low-level API (with some esoteric corners) which is usually used as a fundamental building block for higher-level synchronization structures. But it's not like it never gets used directly to implement custom synchronization primitives in production code (e.g. some people use it to implement an io-uring-like IPC API on top of shared memory). IMO it's an extremely good fit for our problem.

@newpavlov
Member Author

@josephlr
I would like to move forward with this PR. Could you take a look at it again?

@newpavlov
Member Author

I will merge this for now to make it easier to work on other stuff. We can revert or change it before the v0.3 release, since I will be waiting for other people's review before cutting it.

@newpavlov newpavlov merged commit fd2de2f into master Sep 27, 2024
52 checks passed
@newpavlov newpavlov deleted the futex branch September 27, 2024 16:53
@briansmith
Contributor

I will merge this for now to make it easier to work on other stuff. We can revert or change it before the v0.3 release, since I will be waiting for other people's review before cutting it.

Why was it urgent to merge this? This is quite a big change to merge without review, and I think the lack of review is mostly because nobody agrees we should do this. I think we should figure out a way to make progress on the other things that this was blocking without having this change merged.

@newpavlov
Member Author

Neither you nor Joe has been active on GitHub for quite some time. I wanted to work on the PR backlog and had some time for that. I don't want to deal with potential merge conflicts across multiple PRs while I work on other features. As Joe and I discussed, the merges are not final and will be reviewed and possibly changed before the v0.3 release.
