Implement io_uring support for FileStream #51985

Open
adamsitnik opened this issue Apr 28, 2021 · 21 comments
Labels
area-System.IO help wanted [up-for-grabs] Good issue for external contributors tenet-performance Performance related issue

@adamsitnik
Member

We have recently invested a lot of time in rewriting FileStream on Windows. We kept io_uring in mind, and after the recent refactoring it should now be much easier to implement support for it:

  • We have introduced a new internal abstraction called FileStreamStrategy. Its surface is more or less the FileStream API.
  • FileStream can choose the strategy at runtime. In the case of Linux, it could detect the kernel version and just use the new strategy for newer kernels (5.5+). It means that the day our customers update their kernel version, .NET could start using io_uring without a .NET update.
  • The entire buffering logic has been moved to a new strategy called BufferedFileStreamStrategy, which can be used as a wrapper over another strategy. It means that new strategies (like IoUringStrategy) don't need to worry about buffering at all:
    internal static FileStreamStrategy EnableBufferingIfNeeded(WindowsFileStreamStrategy strategy, int bufferSize)
        => bufferSize == 1 ? strategy : new BufferedFileStreamStrategy(strategy, bufferSize);
  • We can use the existing Unix strategy for sync file IO, so the new IoUringStrategy would only need to implement ReadAsync and WriteAsync support.
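
To make the strategy selection described above concrete, here is a minimal sketch. All of the types are stand-ins invented for illustration: the real FileStreamStrategy is internal and much larger, and no IoUringStrategy exists. The 5.5 threshold is simply the number mentioned above, not a settled decision.

```csharp
using System;

// Stand-in types for illustration only; the real FileStreamStrategy is internal
// and has a much larger surface.
abstract class StrategySketch
{
    public abstract string Name { get; }
}

sealed class UnixStrategySketch : StrategySketch
{
    public override string Name => "Unix";
}

// Hypothetical: no io_uring strategy exists in .NET.
sealed class IoUringStrategySketch : StrategySketch
{
    public override string Name => "IoUring";
}

// Plays the role of BufferedFileStreamStrategy: it wraps any other strategy.
sealed class BufferedStrategySketch : StrategySketch
{
    private readonly StrategySketch _inner;

    public BufferedStrategySketch(StrategySketch inner, int bufferSize)
    {
        _inner = inner;
        BufferSize = bufferSize;
    }

    public int BufferSize { get; }
    public override string Name => $"Buffered({_inner.Name})";
}

static class StrategyChooser
{
    // Assumed threshold, taken from the "5.5+" mentioned above.
    private static readonly Version MinIoUringKernel = new Version(5, 5);

    public static StrategySketch Choose(bool isLinux, Version kernel, int bufferSize)
    {
        StrategySketch strategy = isLinux && kernel >= MinIoUringKernel
            ? new IoUringStrategySketch()
            : new UnixStrategySketch();

        // Same shape as EnableBufferingIfNeeded: a buffer size of 1 disables buffering.
        return bufferSize == 1 ? strategy : new BufferedStrategySketch(strategy, bufferSize);
    }
}
```

A caller could pass `OperatingSystem.IsLinux()` and `Environment.OSVersion.Version` as the first two arguments. A real implementation would probably also need to probe whether io_uring is actually usable, since a version check alone says nothing about kernels or sandboxes where it is disabled.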

We (owners of System.IO) have a lot of other high-priority things on our schedule for .NET 6 (like full symbolic link support), and since most of our customers are not using the latest Linux kernels, we most probably won't be able to implement it on our own for .NET 6. But we would love to provide any help necessary (code reviews, testing) to a contributor willing to implement it. Having said that, I am marking this issue as "up-for-grabs".

If we don't find a contributor for .NET 6, we are going to include this in .NET 7 planning and deliver it in .NET 7.

@adamsitnik adamsitnik added area-System.IO tenet-performance Performance related issue help wanted [up-for-grabs] Good issue for external contributors labels Apr 28, 2021
@adamsitnik adamsitnik added this to the Future milestone Apr 28, 2021
@ghost

ghost commented Apr 28, 2021

Tagging subscribers to this area: @carlossanlop
See info in area-owners.md if you want to be subscribed.

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Apr 28, 2021
@adamsitnik adamsitnik removed the untriaged New issue has not been triaged by the area owner label Apr 28, 2021
@adamsitnik
Member Author

@tmds @damageboy @benaadams would any of you be interested?

@tmds
Member

tmds commented Apr 29, 2021

Many of the io_uring benchmarks are performed on a single thread that needs no synchronization. We won't be able to achieve the gains measured there because we need to synchronize and hop between threads. That will definitely cost us something.

Functionally, using io_uring makes it possible to cancel ongoing operations. This is not supported by the current sync-on-ThreadPool implementation, so it would be a functional gain.

I'll let you know if I find time to work on this. I'd need help from you and others to optimize the threading and synchronization.

@richlander
Member

richlander commented May 17, 2021

it could detect the kernel version and just use the new strategy for newer kernels (5.5+)

I thought that in our last conversation we decided to gate this feature on 5.10, since support between 5.5 and 5.7 is patchy. It seems like 5.10 would be great. As context, .NET 6 container images use Debian 11 by default, and the second most popular is Alpine, which for .NET 6 will be 3.13+.

Here's what I found on kernel versions.

Interesting context: https://news.ycombinator.com/item?id=27382299

@omariom
Contributor

omariom commented May 17, 2021

WSL is already 5.4.72-microsoft-standard-WSL2

@ayousuf23
Contributor

@adamsitnik What is io_uring? Is it a new algorithm for IO?

@stephentoub
Member

stephentoub commented Jun 22, 2021

What is io_uring? Is it a new algorithm for IO?

https://en.wikipedia.org/wiki/Io_uring

@davidvmckay

davidvmckay commented Sep 2, 2021

@adamsitnik What is io_uring? Is it a new algorithm for IO?

io_uring is a pretty sweet, modern I/O API in Linux kernel 5.1+
https://kernel.dk/io_uring.pdf

Uses producer-consumer ring buffers to achieve lock-free asynchrony with low-latency, high throughput, and minimal memory copies, like other notable recent architectures:
https://lmax-exchange.github.io/disruptor/files/Disruptor-1.0.pdf
https://youtu.be/Qho1QNbXBso?t=1267

@dmitriyse

Windows should also receive an I/O Rings API soon:
https://windows-internals.com/i-o-rings-when-one-i-o-operation-is-not-enough

@elachlan
Contributor

https://www.phoronix.com/scan.php?page=news_item&px=8M-IOPS-Per-Core-Linux

An engineer from Facebook is pushing io_uring performance in Linux quite aggressively, so I imagine there would be significant performance gains to be had if it were used in dotnet on Linux.

@GSPP

GSPP commented Nov 15, 2021

I wonder what it would take to achieve performance gains with io_uring on non-benchmark workloads. If IOs are issued just like before except using a new call mechanism, I do not see why this would be much faster.

Achieving batching benefits would take new APIs that are not currently available with FileStream. Registering buffers might be difficult to achieve without application cooperation. I understand that io_uring supports polling which helps with super-low latency devices. That can't be done by default so it must be opt-in.

On the web, there are various reports by people who couldn't reproduce performance gains. This is further evidence that the gains might accrue only when the application is structured suitably.

So maybe it takes new, specialized APIs for applications to harness this fully. Since Windows appears to have similar mechanisms now, there could be a common abstraction for both.

Low latency IO has been a trend for the last couple of years. We have SSDs now that are insanely fast. Networks have become much lower latency as well (e.g. RDMA). So maybe there's value in addressing such devices with a new API.

@tmds
Member

tmds commented Nov 15, 2021

io_uring gains come from being able to batch operations and to batch the retrieval of results. The benchmarks/apps that benefit from it most will be written so that they inherently batch.

None of the existing .NET APIs batch. For example, they deal with each Socket separately. Making them use io_uring means adding an additional layer that causes the operations to be batched. The cost of that layer will be significant compared to the benchmarks/apps that inherently batch.

@elachlan
Contributor

elachlan commented Nov 18, 2021

Another improvement:
~500K IOPS/core improvement or around a 5~6% efficiency upgrade
https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.17-Will-Continue-IO

I imagine a whole new set of APIs might be needed, or maybe FileStream uses it under the hood in high load scenarios. I think the idea is that if you implement the base somewhere, then it will slowly be added to the rest of .NET and optimized.

@pr8x

pr8x commented Mar 16, 2022

Is there any news on this topic? I think FileStream (at least without substantial refactoring) isn't the right API for this, as it does not support batching.

@ayende
Contributor

ayende commented Mar 17, 2022

Isn't that what RandomAccess is supposed to give?

@adamsitnik
Member Author

Is there any news on this topic?

We are not planning to add io_uring support for .NET 7. The main reason is that in the most common scenarios we would currently observe a perf regression. At present, the io_uring producer and consumer (the thread that adds work items to the ring and the thread that removes them) need to be the same thread, which does not work well with our current Thread Pool model.

Isn't that what RandomAccess is supposed to give?

@ayende is right, RandomAccess supports passing multiple buffers:

https://devblogs.microsoft.com/dotnet/file-io-improvements-in-dotnet-6/#scatter-gather-io
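
For readers who have not used it, the multi-buffer shape of RandomAccess looks roughly like the sketch below. It only illustrates batching at the API level with the .NET 6 `File.OpenHandle` and `RandomAccess` methods; it does not use io_uring, and the `ScatterGatherDemo` and `RoundTrip` names are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.Win32.SafeHandles;

static class ScatterGatherDemo
{
    // Writes two buffers with one call, then reads them back with one call.
    public static (long BytesRead, byte[] Header, byte[] Body) RoundTrip(string path)
    {
        byte[] header = { 1, 2, 3, 4 };
        byte[] body = { 5, 6, 7, 8, 9, 10 };

        using (SafeFileHandle write = File.OpenHandle(path, FileMode.Create, FileAccess.Write))
        {
            // Gather write: both buffers are written back to back, starting at offset 0.
            RandomAccess.Write(write, new ReadOnlyMemory<byte>[] { header, body }, fileOffset: 0);
        }

        byte[] headerIn = new byte[4];
        byte[] bodyIn = new byte[6];
        using (SafeFileHandle read = File.OpenHandle(path, FileMode.Open, FileAccess.Read))
        {
            // Scatter read: the file content is split across the two buffers in order.
            long bytesRead = RandomAccess.Read(read, new Memory<byte>[] { headerIn, bodyIn }, fileOffset: 0);
            return (bytesRead, headerIn, bodyIn);
        }
    }
}
```

Each call still covers a single file and a single contiguous range, so this is a narrower kind of batching than submitting many unrelated operations to a ring at once.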

@AlexeiScherbakov

Maybe FileStream is a bad place for io_uring? Queue rings can be implemented at the software level with System.Threading.Tasks.Dataflow primitives, and I think that io_uring's place in .NET should be in a separate async-only primitive.

@mgamache

mgamache commented Jan 5, 2025

Any movement on adding io_uring?

@Scooletz

https://github.com/davidtos/JUring 👀

@adamsitnik
Member Author

@Scooletz thanks for sharing!

Based on https://github.com/davidtos/JUring?tab=readme-ov-file#thread-safety:

JURing is not thread safe, from what I read about io_uring there should only be one instance per thread. I want to copy this behaviour to not deviate too much from how io_works. The completion and submission queue used by io_uring don't support multiple threads writing to them at the same time. Preparing operations or waiting for completions should be done by a single thread. Processing the results/buffers is thread safe.

It seems that what I wrote in #51985 (comment) is still true.

Other things have changed, as Windows has introduced a similar API (IoRing). But there is still no such API for macOS, and we very rarely introduce non-cross-platform APIs. Moreover, when looking at the JUring API, it's clear that the target audience would be "superusers" and that it would be easy to run into issues by using it incorrectly.

@benaadams
Member

benaadams commented Jan 15, 2025

It seems that what I wrote in #51985 (comment) is still true.

It doesn't need to be the same thread (the ring isn't bound to one), just a single thread at a time (it's not thread-safe).

So we could do something based on ConcurrentQueue, etc.
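
One possible shape for that idea, as a rough sketch only: any thread may enqueue work, and whichever thread wins an interlocked flag drains the queue into the ring, so the ring is touched by one thread at a time but not always the same one. `SingleAccessRingSketch` and `RingSubmitter` are hypothetical stand-ins; nothing here talks to a real io_uring instance, and completion handling is left out entirely.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical stand-in for a ring that tolerates only one thread at a time.
sealed class SingleAccessRingSketch
{
    private int _activeThreads;

    public int Submitted { get; private set; }
    public bool SawConcurrentAccess { get; private set; }

    public void Submit(int workItem)
    {
        // Record a violation if two threads are ever inside Submit together.
        if (Interlocked.Increment(ref _activeThreads) != 1) SawConcurrentAccess = true;
        Submitted++;
        Interlocked.Decrement(ref _activeThreads);
    }
}

sealed class RingSubmitter
{
    private readonly ConcurrentQueue<int> _pending = new ConcurrentQueue<int>();
    private readonly SingleAccessRingSketch _ring;
    private int _draining; // 0 = nobody owns the ring, 1 = some thread is draining

    public RingSubmitter(SingleAccessRingSketch ring) => _ring = ring;

    // Safe to call from any thread.
    public void Enqueue(int workItem)
    {
        _pending.Enqueue(workItem);
        TryDrain();
    }

    private void TryDrain()
    {
        while (!_pending.IsEmpty)
        {
            // Only the thread that flips 0 -> 1 may touch the ring.
            if (Interlocked.CompareExchange(ref _draining, 1, 0) != 0)
                return; // the current owner will pick up our item

            try
            {
                while (_pending.TryDequeue(out int item))
                    _ring.Submit(item);
            }
            finally
            {
                // Full fence, so the IsEmpty re-check below cannot be reordered before the release.
                Interlocked.Exchange(ref _draining, 0);
            }
            // Loop again: an item may have arrived after the last TryDequeue
            // but before the flag was released.
        }
    }
}
```

Whether the extra queue and interlocked operations leave any of the io_uring gains intact is exactly the cost concern raised earlier in this thread, so this would need measuring rather than assuming.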
