-
Notifications
You must be signed in to change notification settings - Fork 1
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Brian Shih
authored and
Brian Shih
committed
Oct 9, 2023
1 parent
07808fb
commit a77dce9
Showing
9 changed files
with
133 additions
and
7 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
# API |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
# Prerequisites - Building Blocks |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
# Prerequisites - Building Blocks |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
# Implementation Details |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
# What is Asynchronous I/O? | ||
|
||
In this phase, we will add I/O to our runtime. | ||
|
||
A simple approach to I/O would be to just wait for the I/O operation to complete. But such an approach, called **synchronous I/O** or **blocking I/O** would block the single-threaded executor from performing any other tasks concurrently. | ||
|
||
What we want instead is **asynchronous I/O**. In this approach, performing I/O won’t block the calling thread. This allows the executor to run other tasks and return to the original task once the I/O operation completes. | ||
|
||
Before we implement asynchronous I/O, we need to first look at two things: how to turn an I/O operation to non-blocking and `io_uring`. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,43 @@ | ||
# Io_uring | ||
|
||
On this page, I’ll provide a surface-level explanation of how `io_uring` works. If you want a more in-depth explanation, check out [this tutorial](https://unixism.net/loti/what_is_io_uring.html) or [this article](https://developers.redhat.com/articles/2023/04/12/why-you-should-use-iouring-network-io#:~:text=io_uring is an asynchronous I,O requests to the kernel). | ||
|
||
As mentioned, `io_uring` manages file descriptors for the users and lets them know when one or more of them are ready. | ||
|
||
Each `io_uring` instance is composed of two ring buffers - the submission queue and the completion queue. | ||
|
||
To register interest in a file descriptor, you add an SQE to the tail of the submission queue. Adding to the submission queue doesn’t automatically send the requests to the kernel, you need to submit it via the `io_uring_enter` system call. `Io_uring` supports batching by allowing you to add multiple SQEs to the ring before submitting. | ||
|
||
The kernel processes the submitted entries and adds completion queue events (CQEs) to the completion queue when it is ready. While the order of the CQEs might not match the order of the SQEs, there will be one CQE for each SQE, which you can identify by providing user data. | ||
|
||
The user can then check the CQE to see if there are any completed I/O operations. | ||
|
||
### Using io_uring for TcpListener | ||
|
||
Let’s look at how we can use `IoUring` to manage the `accept` operation for a `TcpListener`. We will be using the `iou` crate, a library built on top of `liburing`, to create and interact with `io_uring` instances. | ||
|
||
```rust | ||
let l = std::net::TcpListener::bind("127.0.0.1:8080").unwrap(); | ||
l.set_nonblocking(true).unwrap(); | ||
let mut ring = iou::IoUring::new(2).unwrap(); | ||
|
||
unsafe { | ||
let mut sqe = ring.prepare_sqe().expect("failed to get sqe"); | ||
sqe.prep_poll_add(l.as_raw_fd(), iou::sqe::PollFlags::POLLIN); | ||
sqe.set_user_data(0xDEADBEEF); | ||
ring.submit_sqes().unwrap(); | ||
} | ||
l.accept(); | ||
let cqe = ring.wait_for_cqe().unwrap(); | ||
assert_eq!(cqe.user_data(), 0xDEADBEEF); | ||
``` | ||
|
||
In this example, we first create a `[TcpListener](<https://doc.rust-lang.org/stable/std/net/struct.TcpListener.html>)` and set it to non-blocking. Next, we create an `io_uring` instance. We then register interest in the socket’s file descriptor by making a call to `prep_poll_add` (a wrapper around Linux’s [io_uring_prep_poll_add](https://man7.org/linux/man-pages/man3/io_uring_prep_poll_add.3.html) call). This adds a `SQE` entry to the submission queue which will trigger a CQE to be posted [when there is data to be read](https://github.com/nix-rust/nix/blob/e7c877abf73f7f74e358f260683b70ce46db13b0/src/poll.rs#L127). | ||
|
||
We then call `accept` to accept any incoming TCP connections. Finally, we call `wait_for_cqe`, which returns the next CQE, blocking the thread until one is ready if necessary. If we wanted to avoid blocking the thread in this example, we could’ve called `peek_for_cqe` which peeks for any completed CQE without blocking. | ||
|
||
### Efficiently Checking the CQE | ||
|
||
You might be wondering - if we potentially need to call `peek_for_cqe()` repeatedly until it is ready, how is this different from calling `listener.accept()` repeatedly? | ||
|
||
The difference is that `accept` is a system call while `peek_for_cqe`, which calls `io_uring_peek_batch_cqe` under the hood, is not a system call. This is due to the unique property of `io_uring` such that the completion ring buffer is shared between the kernel and the user space. This allows you to efficiently check the status of completed I/O operations. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,60 @@ | ||
# Nonblocking Mode | ||
|
||
In Rust, by default, many I/O operations, such as reading a file, are blocking. For example, in the code snippet below, the `TcpListener::accept` call will block the calling thread until a new TCP connection is established. | ||
|
||
```rust | ||
let listener = TcpListener::bind("127.0.0.1:8080").unwrap(); | ||
listener.accept(); | ||
``` | ||
|
||
### Nonblocking I/O | ||
|
||
The first step towards asynchronous I/O is turning a blocking I/O operation into a non-blocking one. | ||
|
||
In Linux, it is possible to do nonblocking I/O on sockets and files by setting the `O_NONBLOCK` flag on the file descriptors. | ||
|
||
Here’s how you can set the file descriptor for a socket to be non-blocking: | ||
|
||
```rust | ||
let listener = std::net::TcpListener::bind("127.0.0.1:8080").unwrap(); | ||
let raw_fd = listener.as_raw_fd(); | ||
fcntl(raw_fd, FcntlArg::F_SETFL(OFlag::O_NONBLOCK)) | ||
``` | ||
|
||
Setting the file descriptor for the `TcpListener` to nonblocking means that the next I/O operation would immediately return. To check if the operation is complete, you have to manually `poll` the file descriptor. | ||
|
||
Rust’s std library has helper methods such as `Socket::set_blocking` to set a file descriptor to be nonblocking: | ||
|
||
```rust | ||
let l = std::net::TcpListener::bind("127.0.0.1:8080").unwrap(); | ||
l.set_nonblocking(true).unwrap(); | ||
``` | ||
|
||
### Polling | ||
|
||
As mentioned above, after setting a socket’s file descriptor to be non-blocking, you have to manually poll the file descriptor to check if the I/O operation is completed. Under non-blocking mode, the `TcpListener::Accept` method returns `Ok` if the I/O operation is successful or an error with kind `io::ErrorKind::WouldBlock` is returned. | ||
|
||
In the following example, we `loop` until the I/O operation is ready by repeatedly calling `accept`: | ||
|
||
```rust | ||
let l = std::net::TcpListener::bind("127.0.0.1:8080").unwrap(); | ||
l.set_nonblocking(true).unwrap(); | ||
|
||
loop { | ||
// the accept call | ||
let res = l.accept(); | ||
match res { | ||
Ok((stream, _)) => { | ||
handle_connection(stream); | ||
break; | ||
} | ||
Err(err) => if err.kind() == io::ErrorKind::WouldBlock {}, | ||
} | ||
} | ||
``` | ||
|
||
While this works, repeatedly calling `accept` in a loop is not ideal. Each call to `TcpListener::accept` is an expensive call to the kernel. | ||
|
||
This is where system calls like [select](http://man7.org/linux/man-pages/man2/select.2.html), [poll,](http://man7.org/linux/man-pages/man2/poll.2.html) [epoll](http://man7.org/linux/man-pages/man7/epoll.7.html), [aio](https://man7.org/linux/man-pages/man7/aio.7.html), [io_uring](https://man.archlinux.org/man/io_uring.7.en) come in. These calls let you register interest for file descriptors and let you know when one or more of them are ready. This reduces the need for constant polling and makes better use of system resources. | ||
|
||
Glommio uses `io_uring`. One of the things that make `io_uring` stand out compared to other system calls is that it presents a uniform interface for both sockets and files. This is a huge improvement from system calls like `epoll` that doesn’t support files while `aio` only works with a subset of files (linus-aio only supports `O_DIRECT` files). In the next page, we take a quick glance at how `io_uring` works. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters