Deadlocks caused by the root task calling other PDs #13
-
QNX

While searching for solutions to deadlocks in microkernels, I came across the QNX manual, which had a few interesting things to say.

IPC

The manual has a section on IPC, which is relevant to us since QNX's IPC primitives are similar to seL4's. In short, it recommends arranging threads in a hierarchy where blocking sends only travel up the tree, so no send cycle can form. This is a simple and effective way to avoid deadlocks, but what if we need to send messages down the tree? The manual suggests two solutions. The first is to use non-blocking event messages, so the higher-level thread never blocks on a send to a lower-level one.
The second solution is for the lower-level thread to initiate a send to the higher-level thread ("reporting for work"), so that the higher-level thread can reply with work at its own convenience.
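On seL4 (non-MCS), this second pattern could look roughly like the sketch below, stashing the reply capability with seL4_CNode_SaveCaller; all names are illustrative, and the next paragraph explains why it does not fit our setup:

```c
#include <sel4/sel4.h>

/* Worker (lower level): "reports for work" with a blocking Call and
 * stays blocked until the higher-level thread replies with a job. */
seL4_MessageInfo_t report_for_work(seL4_CPtr boss_ep)
{
    seL4_MessageInfo_t info = seL4_MessageInfo_new(0, 0, 0, 0);
    return seL4_Call(boss_ep, info); /* the reply carries the work item */
}

/* Higher level: receive the report and stash the one-shot reply
 * capability in our own CSpace, so we are free to do other things. */
void accept_worker(seL4_CPtr boss_ep, seL4_CNode cspace, seL4_Word reply_slot)
{
    seL4_Word badge;
    seL4_Recv(boss_ep, &badge);
    seL4_CNode_SaveCaller(cspace, reply_slot, seL4_WordBits);
}

/* Later, at the higher level's convenience: reply to unblock the worker. */
void hand_out_work(seL4_CPtr saved_reply, seL4_Word work_item)
{
    seL4_MessageInfo_t info = seL4_MessageInfo_new(0, 0, 0, 1);
    seL4_SetMR(0, work_item);
    seL4_Send(saved_reply, info); /* consumes the one-shot reply cap */
}
```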
The second solution is interesting, but ultimately not possible for us, because our server PDs cannot "report for work" while they are serving client PDs. Even if a server PD had a dedicated thread that waits for the root task to give it work, the root task would still have to be prepared to handle the result asynchronously.

Signals

QNX implements signals using its asynchronous event-delivery mechanisms. Signals are targeted at a particular thread or at an entire process, and threads can block certain signals. If a signal is directed at an entire process, it is delivered to the first thread that does not have the signal blocked; if all threads have the signal blocked, the signal is queued. I think our server PDs are analogous to a process with a single thread, where the "extract model" and "free object" signals should be blocked while the PD is serving a request. If the root task sends these signals while they are blocked, the signal should be queued for the server to handle ASAP.

Takeaways

We need some principled approach to preventing deadlocks, like the one the QNX manual describes. We need to ensure that the root task never sends a blocking request to a PD, and for this we need a form of message queue.
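As a rough sketch of how the queued-signal analogy could look for one of our server PDs, assuming the root task only ever does a non-blocking seL4_Signal on a dedicated notification, with badge bits standing in for the "signals"; all names here are illustrative:

```c
#include <sel4/sel4.h>

/* Illustrative stand-ins for the real handlers. */
extern seL4_MessageInfo_t handle_client(seL4_MessageInfo_t info, seL4_Word badge);
extern void handle_rt_work(seL4_Word bits); /* e.g. extract-model / free-object */

void server_loop(seL4_CPtr client_ep, seL4_CPtr rt_ntfn)
{
    while (1) {
        /* Serve one client request; the "signals" stay blocked meanwhile. */
        seL4_Word badge;
        seL4_MessageInfo_t info = seL4_Recv(client_ep, &badge);
        seL4_MessageInfo_t reply = handle_client(info, badge);
        seL4_Reply(reply);

        /* "Unblock the signals": drain any root-task work that was queued
         * (ORed into the notification word) while we were busy. */
        seL4_Word bits;
        seL4_Poll(rt_ntfn, &bits);
        if (bits != 0) {
            handle_rt_work(bits);
        }
    }
}
```

If the server can sit idle in seL4_Recv for long stretches, the notification could also be bound to its TCB (seL4_TCB_BindNotification) so queued work is noticed without waiting for the next client request.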
-
FUSE (File System in Userspace)

FUSE is a kernel module and userspace library that together allow a file system to run in userspace on top of the Linux kernel. Applications request regular file system operations on the mounted FUSE file system; these first go to the kernel, and the kernel forwards the request to the userspace file system. This was of interest since, like our setup, it involves a privileged component making requests of a less-privileged one.

Kernel -> FS communication

The kernel module and the FUSE file system maintain request queues by reading and writing the '/dev/fuse' file. There are several queues with different priorities, and the file system handles them accordingly. When requests are complete, the file system writes replies to another queue; the kernel module reads the replies and unblocks any waiting application. The format of the requests is straightforward: every request has a header structure including an opcode, followed by an opcode-specific data struct, plus up to two filenames. If the FUSE file system needs to make a request of the kernel, e.g. to notify a user about events on a polled file, it writes a "notification" to '/dev/fuse'. I'm not certain of this, but I think the kernel replies synchronously with an error code.
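For reference, the request header is roughly the following (paraphrased from include/uapi/linux/fuse.h; the exact layout varies across kernel versions):

```c
#include <stdint.h>

/* Every request the kernel writes to /dev/fuse starts with this header,
 * followed by an opcode-specific struct and up to two filenames. */
struct fuse_in_header {
    uint32_t len;     /* total length of this request, header included */
    uint32_t opcode;  /* e.g. FUSE_LOOKUP, FUSE_READ, FUSE_WRITE */
    uint64_t unique;  /* request ID, echoed back in the reply header */
    uint64_t nodeid;  /* inode the operation targets */
    uint32_t uid;     /* credentials of the requesting process */
    uint32_t gid;
    uint32_t pid;
    uint32_t padding;
};
```

The 'unique' field is what lets replies come back on a separate queue and still be matched to the blocked application.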
-
Barrelfish

Upcalls

Takeaways
-
I think it's important to clearly articulate what the purpose of the root task is in the grand scheme of things -- depending on that, certain aspects are possible and others are not. It seems that the root task is acting a bit as a monitor / trusted entity in the system. Given that, having it block on other non-trusted entities is probably not great, unless those entities become part of the trusted domain. One thing to avoid here is entities being able to create dependencies from the root task onto themselves. Even if we were to make the root task event-based or multi-threaded, another thing to avoid is resources being clobbered, e.g. through the allocation of continuations or thread contexts that are never cleaned up because a domain never replies to the request. Another aspect here is whether all IPC has RPC semantics, or whether it is event-driven.
-
Forming a Game Plan

I'm going to lay out a plan for the two routes I am considering.

Option 1: Async Message Queue

Design: (see the queue sketch at the end of this comment)

Steps:

Option 2: Uphill-only

Design:

Steps:
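Since the queue structure matters for the mutex discussion below, here is a minimal sketch of the kind of structure Option 1 implies, assuming a single producer (the root task) and a single consumer (a server PD) sharing a frame; the names and layout are illustrative, not the actual design:

```c
#include <stdatomic.h>
#include <stdint.h>

#define QUEUE_SLOTS 16

typedef struct {
    uint64_t opcode; /* e.g. EXTRACT_MODEL, FREE_OBJECT */
    uint64_t arg;
} rt_msg_t;

typedef struct {
    _Atomic uint32_t head; /* next slot to read  (consumer-owned) */
    _Atomic uint32_t tail; /* next slot to write (producer-owned) */
    rt_msg_t slots[QUEUE_SLOTS];
} rt_queue_t;

/* Root task side: never blocks; returns 0 if the queue is full. */
int rt_queue_push(rt_queue_t *q, rt_msg_t msg)
{
    uint32_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail - head == QUEUE_SLOTS) {
        return 0; /* full: retry later, never block */
    }
    q->slots[tail % QUEUE_SLOTS] = msg;
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return 1;
}

/* Server PD side: drains pending work at its convenience. */
int rt_queue_pop(rt_queue_t *q, rt_msg_t *out)
{
    uint32_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head == tail) {
        return 0; /* empty */
    }
    *out = q->slots[head % QUEUE_SLOTS];
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return 1;
}
```

With exactly one writer and one reader, atomic head/tail indices are enough on their own; the mutex question in the comments below arises as soon as either end can have more than one writer.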
-
More on mutexes

In both of the scenarios from the last comment, I mentioned that we would need a shared mutex to prevent a particular deadlock. I was considering this alternative: we could have a "polling PD", trusted but separate from the root task. It needs to know which PDs have a pending message / work from the RT, which could be tracked via some structure shared with the RT. It would do NBSends to those PDs until they have received the message / work. This might be a bad idea, though, because it would flood the seL4 kernel with NBSend calls, and the message would be dropped most of the time. That actually sounds worse; I feel better about a mutex in comparison. I think a mutex would also be relatively easy to implement using seL4 notifications, but I will confirm it.
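As a sanity check on that last point, here is a minimal sketch of a mutex built from a single seL4 notification (assuming the notification capability is already set up; names are illustrative):

```c
#include <sel4/sel4.h>

/* A binary mutex built on one seL4 notification. The notification starts
 * out signaled once, so it behaves like a semaphore with count 1. */
typedef struct {
    seL4_CPtr ntfn; /* capability to a notification object */
} notif_mutex_t;

void notif_mutex_init(notif_mutex_t *m, seL4_CPtr ntfn)
{
    m->ntfn = ntfn;
    seL4_Signal(ntfn); /* prime it: the mutex begins unlocked */
}

void notif_mutex_lock(notif_mutex_t *m)
{
    seL4_Word badge;
    /* Blocks until a signal is pending, then consumes it: exactly one
     * caller at a time can get past this line. */
    seL4_Wait(m->ntfn, &badge);
}

void notif_mutex_unlock(notif_mutex_t *m)
{
    seL4_Signal(m->ntfn); /* admit exactly one waiter (or future locker) */
}
```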
-
I really like option 2 (that doesn't mean it is better) because it is easier to follow, IMO. Again, that is just my opinion. However, I am not sure why a deadlock is inevitable in scenario 2; we can talk about it when you are in. I don't think we need to resolve this right away. Re option 1, I think we need a mutex any time we have tail and head pointers, right? So a mutex is necessary for basic operations, leaving deadlock aside.
-
Update on mutex: what I don't understand is why seL4_libs has a mutex that uses both a notification and a variable (code here). Why did they choose to use both, and am I missing something?
-
That is a classic optimization, so that makes sense. Let's look at the code to make sure that is indeed what's happening.
Best,
Sid Agrawal
sid-agrawal.ca
…On Tue, Jun 25, 2024 at 9:28 AM Arya Stevinson ***@***.***> wrote:
One thought about this: potentially the reason for seL4_libs using both a notification and a variable is that, when you only need the mutex within one address space, a test-and-set is a cheaper operation than a notification Signal/Wait when nobody is holding or waiting on the mutex. Then we can fall back to the notification if necessary.
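A minimal sketch of that optimization as described above: an atomic counter provides the uncontended fast path, and the notification is only touched under contention. This illustrates the idea (a benaphore), not the exact seL4_libs code:

```c
#include <sel4/sel4.h>
#include <stdatomic.h>

/* value starts at 1 (free); the notification starts unsignaled. */
typedef struct {
    atomic_int value; /* 1 = free, 0 = held, negative = held + waiters */
    seL4_CPtr ntfn;   /* only touched on the contended slow path */
} hybrid_mutex_t;

void hybrid_mutex_lock(hybrid_mutex_t *m)
{
    /* Fast path: one atomic op and no kernel entry when uncontended. */
    if (atomic_fetch_sub(&m->value, 1) <= 0) {
        /* Contended: sleep until the holder signals. Notifications are
         * "sticky", so a signal sent before we block here is not lost. */
        seL4_Word badge;
        seL4_Wait(m->ntfn, &badge);
    }
}

void hybrid_mutex_unlock(hybrid_mutex_t *m)
{
    /* A negative previous value means somebody is (about to be) waiting. */
    if (atomic_fetch_add(&m->value, 1) < 0) {
        seL4_Signal(m->ntfn);
    }
}
```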
-
The Issue
The current state of the implementation allows the root task to make requests of other PDs (specifically, PDs that manage a resource space) for two reasons: to extract the model state from a server, and to tell a server to free an object.
If the root task messages a resource server while the resource server is busy (like handling some client request), then we will get a deadlock if the resource server also needs to request something from the root task.
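To make the cycle concrete, here is a hedged sketch using seL4's call semantics (endpoint names are illustrative):

```c
#include <sel4/sel4.h>

/* Root task thread: */
void root_task_side(seL4_CPtr server_ep)
{
    seL4_MessageInfo_t info = seL4_MessageInfo_new(0, 0, 0, 0);
    /* Blocks: the server is busy with a client, not sitting in Recv. */
    seL4_Call(server_ep, info);
}

/* Resource server thread, in the middle of serving a client: */
void resource_server_side(seL4_CPtr root_task_ep)
{
    seL4_MessageInfo_t info = seL4_MessageInfo_new(0, 0, 0, 0);
    /* Also blocks: the root task is stuck in the Call above, so it never
     * reaches its Recv. Neither side can make progress -- deadlock. */
    seL4_Call(root_task_ep, info);
}
```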
Potential Solutions
This thread is to track investigation into possible solutions and the potential avenues we could consider.
Next Steps
I will be looking into other microkernels and similar architectures (where a "more trusted" part of the system makes calls to a "less trusted" part) to evaluate the potential solutions.