From 224715f95b51fd14a538640f93d3eb6afed6e1a8 Mon Sep 17 00:00:00 2001 From: Jacob Kiesel Date: Mon, 23 Jan 2023 11:35:20 -0700 Subject: [PATCH] Move description of message binary format into new document --- MESSAGE_BINARY_FORMAT.md | 92 +++++++++++++++++++++++++++++++++++++++ README.md | 93 +--------------------------------------- 2 files changed, 93 insertions(+), 92 deletions(-) create mode 100644 MESSAGE_BINARY_FORMAT.md diff --git a/MESSAGE_BINARY_FORMAT.md b/MESSAGE_BINARY_FORMAT.md new file mode 100644 index 0000000..f6c9541 --- /dev/null +++ b/MESSAGE_BINARY_FORMAT.md @@ -0,0 +1,92 @@ +### Introduction + +The main offering of this crate is a consistent and known representation of Rust types. As such, the format is +considered to be part of our stable API, and changing the format requires a major version number bump. To aid you +in debugging, the current version of that format is documented here. + +### High-level overview + +***Connection Initial Description*** + +Version 1 of the protocol did not send this description. Version 2 is the first version that sends a startup description. +The first 8 bytes sent on the channel are the little endian protocol version number. If the reader is not compatible with the message version provided, it must +terminate immediately. Immediately following the version, the sender will describe the features of the message stream it is about to send. Those features are as follows. + +**Version 1** + +Version 1 did not send an initial description. All optional features are disabled in version 1. + +**Version 2** +- `name`: checksum_enabled, `size`: 1 byte, `possible values`: 2 or 3, `notes`: 2 indicates checksums will be sent, 3 indicates checksums will not be sent. + +***Message stream*** + +After the initial description, the byte stream is split up into messages. Every message begins with a `length` value. After `length` bytes have +been read, a new message can begin immediately afterward. 
This `length` value is the entirety of the header of a +message. If checksums are enabled, the 8 bytes immediately following the message are the checksum of the message. This checksum is determined by hashing +the bytes of the message using SipHash 2-4. If checksums are disabled, the next message begins immediately instead. The bytes from the message are then +deserialized into a Rust type via [`bincode`](https://github.com/bincode-org/bincode), using the following configuration. + +```rust,ignore +bincode::DefaultOptions::new() + .with_limit(size_limit) + .with_little_endian() + .with_varint_encoding() + .reject_trailing_bytes() +``` + +### Length encoding + +The length is encoded using a variably sized integer encoding scheme. To understand this scheme, first we need a few constant values. + +```ignore +u16_marker; decimal: 252, hex: FC +u32_marker; decimal: 253, hex: FD +u64_marker; decimal: 254, hex: FE +zst_marker; decimal: 255, hex: FF +stream_end; decimal: 0, hex: 00 +``` + +Any length less than `u16_marker` and greater than 0 is encoded as a single byte whose value is the length. +A length of zero is encoded with the `zst_marker`. The stream is ended with the `stream_end` value. When this is +read, the peer is expected to close the connection. + +`async-io-typed` always uses little-endian. The user data being sent may contain values that are not +little-endian, but `async-io-typed` itself always uses little-endian. + +If the first byte is `u16_marker`, then the length is 16 bits wide, and encoded in the following 2 bytes. Once +those 2 bytes are read, the message begins. `u32_marker` and `u64_marker` are used in a similar way, with the length +encoded in the following 4 bytes and 8 bytes respectively.
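The length-encoding rules above can be sketched in Rust as follows. This is a minimal illustration and not the crate's actual implementation: `encode_length`, `decode_header`, and the `Header` enum are hypothetical names. The sketch also assumes the sender always picks the narrowest encoding that fits the length, which matches the documented examples but is not stated explicitly in the text.

```rust
use std::convert::{TryFrom, TryInto};

// Marker bytes, taken from the constants table in the format description.
const U16_MARKER: u8 = 252; // 0xFC
const U32_MARKER: u8 = 253; // 0xFD
const U64_MARKER: u8 = 254; // 0xFE
const ZST_MARKER: u8 = 255; // 0xFF
const STREAM_END: u8 = 0; // 0x00

/// Hypothetical helper: encodes a message length as its header bytes.
/// Assumes the narrowest encoding that fits is always chosen.
fn encode_length(len: u64) -> Vec<u8> {
    let mut out = Vec::with_capacity(9);
    if len == 0 {
        // A zero length cannot be written as 0x00, because that byte ends the stream.
        out.push(ZST_MARKER);
    } else if len < u64::from(U16_MARKER) {
        out.push(len as u8);
    } else if let Ok(n) = u16::try_from(len) {
        out.push(U16_MARKER);
        out.extend_from_slice(&n.to_le_bytes());
    } else if let Ok(n) = u32::try_from(len) {
        out.push(U32_MARKER);
        out.extend_from_slice(&n.to_le_bytes());
    } else {
        out.push(U64_MARKER);
        out.extend_from_slice(&len.to_le_bytes());
    }
    out
}

/// Hypothetical type: what a single header can describe.
#[derive(Debug, PartialEq)]
enum Header {
    Length(u64),
    StreamEnd,
}

/// Hypothetical helper: decodes one header from the front of `bytes`.
/// Returns the header and the number of bytes consumed,
/// or `None` if `bytes` does not yet hold a complete header.
fn decode_header(bytes: &[u8]) -> Option<(Header, usize)> {
    let (&first, rest) = bytes.split_first()?;
    let decoded = match first {
        STREAM_END => (Header::StreamEnd, 1),
        ZST_MARKER => (Header::Length(0), 1),
        U16_MARKER => {
            let raw: [u8; 2] = rest.get(..2)?.try_into().ok()?;
            (Header::Length(u64::from(u16::from_le_bytes(raw))), 3)
        }
        U32_MARKER => {
            let raw: [u8; 4] = rest.get(..4)?.try_into().ok()?;
            (Header::Length(u64::from(u32::from_le_bytes(raw))), 5)
        }
        U64_MARKER => {
            let raw: [u8; 8] = rest.get(..8)?.try_into().ok()?;
            (Header::Length(u64::from_le_bytes(raw)), 9)
        }
        // Any other first byte (1 through 251) is the length itself.
        small => (Header::Length(u64::from(small)), 1),
    };
    Some(decoded)
}
```

A reader would call `decode_header` on its buffered bytes, skip the consumed header bytes, and then read that many message bytes (plus the 8 checksum bytes when checksums are enabled). Returning `None` on a short buffer lets the caller wait for more data instead of treating a partial header as an error.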
+ +### Examples + + +Length 12 +```ignore +0C +``` + +Length 0 +```ignore +FF +``` + +Length 252 (First byte is u16_marker) +```ignore +FC, FC, 00 +``` + +Length 253 (First byte is u16_marker) +```ignore +FC, FD, 00 +``` + +Length 65,536 (aka 2^16) (First byte is u32_marker) +```ignore +FD, 00, 00, 01, 00 +``` + +Length 4,294,967,296 (aka 2^32) (First byte is u64_marker) +```ignore +FE, 00, 00, 00, 00, 01, 00, 00, 00 +``` diff --git a/README.md b/README.md index 8126d1d..7e7bf68 100644 --- a/README.md +++ b/README.md @@ -31,95 +31,4 @@ it will help. Consider using protobufs or JSON if Rust adoption is a blocker. ## Binary format -### Introduction - -The main offering of this crate is a consistent and known representation of Rust types. As such, the format is -considered to be part of our stable API, and changing the format requires a major version number bump. To aid you -in debugging, that format is documented here. - -### High-level overview - -***Connection Initial Description*** - -Version 1 of the protocol did not send this description. Version 2 is the first version that sends a startup description. -The first 8 bytes sent on the channel are the little endian protocol version number. If the reader is not compatible with the message version provided, it must -terminate immediately. Immediately following the version, the sender will describe the features of the message stream it is about to send. Those features are as follows. - -**Version 1** - -Version 1 did not send an initial description. All optional features are disabled in version 1. - -**Version 2** -- `name`: checksum_enabled, `size`: 1 byte, `possible values`: 2 or 3, `notes`: 2 indicates checksums will be sent, 3 indicates checksums will not be sent. - -***Message stream*** - -After the initial description, the byte stream is split up into messages. Every message begins with a `length` value. After `length` bytes have -been read, a new message can begin immediately afterward. 
This `length` value is the entirety of the header of a -message. If checksums are enabled, the 8 bytes immediately following the message are the checksum of the message. This checksum is determined by hashing -the bytes of the message using SipHash 2-4. If checksums are disabled, instead the next message begins immediately. The bytes from the message are then -deserialized into a Rust type via [`bincode`](https://github.com/bincode-org/bincode), using the following configuration. - -```rust,ignore -bincode::DefaultOptions::new() - .with_limit(size_limit) - .with_little_endian() - .with_varint_encoding() - .reject_trailing_bytes() -``` - -### Length encoding - -The length is encoded using a variably sized integer encoding scheme. To understand this scheme, first we need a few constant values. - -```ignore -u16_marker; decimal: 252, hex: FC -u32_marker; decimal: 253, hex: FD -u64_marker; decimal: 254, hex: FE -zst_marker; decimal: 255, hex: FF -stream_end; decimal: 0, hex: 00 -``` - -Any length less than `u16_marker` and greater than 0 is encoded as a single byte whose value is the length. -A length of zero is encoded with the `zst_marker`. The stream is ended with the `stream_end` value. When this is -read the peer is expected to close the connection. - -`async-io-typed` always uses little-endian. The user data being sent may contain values that are not -little-endian, but `async-io-typed` itself always uses little-endian. - -If the first byte is `u16_marker`, then the length is 16 bits wide, and encoded in the following 2 bytes. Once -those 2 bytes are read, the message begins. `u32_marker` and `u64_marker` are used in a similar way, each of -those being 4 bytes, and 8 bytes respectively. 
- -### Examples - - -Length 12 -```ignore -0C -``` - -Length 0 -```ignore -FF -``` - -Length 252 (First byte is u16_marker) -```ignore -FC, FC, 00 -``` - -Length 253 (First byte is u16_marker) -```ignore -FC, FD, 00 -``` - -Length 65,536 (aka 2^16) (First byte is u32_marker) -```ignore -FD, 00, 00, 01, 00 -``` - -Length 4,294,967,296 (aka 2^32) (First byte is u64_marker) -```ignore -FE, 00, 00, 00, 00, 01, 00, 00, 00 -``` +Details on the binary format used by this crate can be found in [the binary format specification](https://github.com/Xaeroxe/async-io-typed/blob/main/MESSAGE_BINARY_FORMAT.md). \ No newline at end of file