Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(de)compression: reduce memory allocation to improve performance #521

Merged
merged 1 commit into from
Sep 23, 2024

Conversation

magurotuna
Copy link
Contributor

@magurotuna magurotuna commented Sep 22, 2024

Motivation

Currently, every time WrapBody::poll_frame is called, new instance of BytesMut is created with the default capacity, which is effectively 64 bytes. This ends up with a lot of memory allocation in certain situations, making the throughput significantly worse.

#520 describes that a manual implementation of decompression logic using async-compression gets the performance issue with tower_http::decompression to be resolved. To identify what is making this difference, I captured a flamegraph. Here's the result:

flamegraph

As annotated, tower_http::decompression has more memory allocation which is the reason why the performance is degraded.

Solution

To optimize memory allocation, WrapBody now gets BytesMut as its field, with initial capacity of 4096 bytes. This buffer will be reused as much as possible across multiple poll_frame calls, and only when its capacity becomes 0, new allocation of another 4096 bytes is performed.

The capacity of 4096 bytes is taken from the default capacity of the internal buffer used in tokio_util::io::ReaderStream: https://docs.rs/tokio-util/0.7.12/src/tokio_util/io/reader_stream.rs.html#8

Performance Improvement

With this optimization, the flamegraph is changed as follows, where memory allocation is significantly reduced and optimized.

optimized flamegraph

Also, the throughput gets 10x better when we measure it using Deno's fetch implementation that internally uses hyper and tower_http::decompresson.
This graph shows how long each Deno version takes to handle 2k requests. The rightmost one named v2.0.0-rc.4-tower-http-patched is the one with this patch applied. The leftmost one, v1.45.2, does not use tower_http::decompression.

performance

For more information on how this result was measured, please refer to https://github.com/magurotuna/deno_fetch_decompression_throughput

Fixes: #520

Currently, every time `WrapBody::poll_frame` is called, new instance of
`BytesMut` is created with the default capacity, which is effectively
64 bytes. This ends up with a lot of memory allocation in certain
situations, making the throughput significantly worse.

To optimize memory allocation, `WrapBody` now gets `BytesMut` as its
field, with initial capacity of 4096 bytes. This buffer will be reused
as much as possible across multiple `poll_frame` calls, and only when
its capacity becomes 0, new allocation of another 4096 bytes is
performed.

Fixes: tower-rs#520
Copy link
Collaborator

@seanmonstar seanmonstar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Phenomenal write-up, thanks!

@seanmonstar seanmonstar merged commit 9fdf0eb into tower-rs:main Sep 23, 2024
11 checks passed
@magurotuna magurotuna deleted the perf-wrap-body-allocation branch September 23, 2024 14:28
github-merge-queue bot referenced this pull request in apollographql/subgraph-template-rust-async-graphql Sep 23, 2024
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [tower-http](https://redirect.github.com/tower-rs/tower-http) |
dependencies | patch | `0.6.0` -> `0.6.1` |

---

### Release Notes

<details>
<summary>tower-rs/tower-http (tower-http)</summary>

###
[`v0.6.1`](https://redirect.github.com/tower-rs/tower-http/releases/tag/tower-http-0.6.1):
v0.6.1

[Compare
Source](https://redirect.github.com/tower-rs/tower-http/compare/tower-http-0.6.0...tower-http-0.6.1)

#### Fixed

- **decompression:** reuse scratch buffer to significantly reduce
allocations and improve performance ([#&#8203;521])

[#&#8203;521]: https://redirect.github.com/tower-rs/tower-http/pull/521

#### New Contributors

- [@&#8203;magurotuna](https://redirect.github.com/magurotuna) made
their first contribution in
[https://github.com/tower-rs/tower-http/pull/521](https://redirect.github.com/tower-rs/tower-http/pull/521)

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined),
Automerge - At any time (no schedule defined).

🚦 **Automerge**: Enabled.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR was generated by [Mend Renovate](https://mend.io/renovate/).
View the [repository job
log](https://developer.mend.io/github/apollographql/subgraph-template-rust-async-graphql).

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOC44MC4wIiwidXBkYXRlZEluVmVyIjoiMzguODAuMCIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOltdfQ==-->

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
szinn referenced this pull request in szinn/s3-cdn Sep 24, 2024
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [tower-http](https://redirect.github.com/tower-rs/tower-http) |
dependencies | patch | `0.6.0` -> `0.6.1` |

---

### Release Notes

<details>
<summary>tower-rs/tower-http (tower-http)</summary>

###
[`v0.6.1`](https://redirect.github.com/tower-rs/tower-http/releases/tag/tower-http-0.6.1):
v0.6.1

[Compare
Source](https://redirect.github.com/tower-rs/tower-http/compare/tower-http-0.6.0...tower-http-0.6.1)

#### Fixed

- **decompression:** reuse scratch buffer to significantly reduce
allocations and improve performance ([#&#8203;521])

[#&#8203;521]: https://redirect.github.com/tower-rs/tower-http/pull/521

#### New Contributors

- [@&#8203;magurotuna](https://redirect.github.com/magurotuna) made
their first contribution in
[https://github.com/tower-rs/tower-http/pull/521](https://redirect.github.com/tower-rs/tower-http/pull/521)

</details>

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOC45NC4yIiwidXBkYXRlZEluVmVyIjoiMzguOTQuMiIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOlsidHlwZS9wYXRjaCJdfQ==-->

Co-authored-by: repo-jeeves[bot] <106431701+repo-jeeves[bot]@users.noreply.github.com>
szinn referenced this pull request in szinn/rust-arch Sep 24, 2024
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [tower-http](https://redirect.github.com/tower-rs/tower-http) |
dependencies | patch | `0.6.0` -> `0.6.1` |

---

### Release Notes

<details>
<summary>tower-rs/tower-http (tower-http)</summary>

###
[`v0.6.1`](https://redirect.github.com/tower-rs/tower-http/releases/tag/tower-http-0.6.1):
v0.6.1

[Compare
Source](https://redirect.github.com/tower-rs/tower-http/compare/tower-http-0.6.0...tower-http-0.6.1)

#### Fixed

- **decompression:** reuse scratch buffer to significantly reduce
allocations and improve performance ([#&#8203;521])

[#&#8203;521]: https://redirect.github.com/tower-rs/tower-http/pull/521

#### New Contributors

- [@&#8203;magurotuna](https://redirect.github.com/magurotuna) made
their first contribution in
[https://github.com/tower-rs/tower-http/pull/521](https://redirect.github.com/tower-rs/tower-http/pull/521)

</details>

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOC45NC4yIiwidXBkYXRlZEluVmVyIjoiMzguOTQuMiIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOlsidHlwZS9wYXRjaCJdfQ==-->

Co-authored-by: repo-jeeves[bot] <106431701+repo-jeeves[bot]@users.noreply.github.com>
its-danny referenced this pull request in mist-id/mist Sep 24, 2024
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [tower-http](https://redirect.github.com/tower-rs/tower-http) |
dependencies | patch | `0.6.0` -> `0.6.1` |

---

### Release Notes

<details>
<summary>tower-rs/tower-http (tower-http)</summary>

###
[`v0.6.1`](https://redirect.github.com/tower-rs/tower-http/releases/tag/tower-http-0.6.1):
v0.6.1

[Compare
Source](https://redirect.github.com/tower-rs/tower-http/compare/tower-http-0.6.0...tower-http-0.6.1)

#### Fixed

- **decompression:** reuse scratch buffer to significantly reduce
allocations and improve performance ([#&#8203;521])

[#&#8203;521]: https://redirect.github.com/tower-rs/tower-http/pull/521

#### New Contributors

- [@&#8203;magurotuna](https://redirect.github.com/magurotuna) made
their first contribution in
[https://github.com/tower-rs/tower-http/pull/521](https://redirect.github.com/tower-rs/tower-http/pull/521)

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined),
Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR was generated by [Mend Renovate](https://mend.io/renovate/).
View the [repository job
log](https://developer.mend.io/github/mist-id/mist).

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOC44MC4wIiwidXBkYXRlZEluVmVyIjoiMzguODAuMCIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOltdfQ==-->

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

decompression: worse throughput when using tower_http::decompression than manual impl with async-compression
2 participants