Use single serialization format for all block state data #1434

Fraser999 · 2024-08-30T11:06:16Z

Format to use will be decided once #1433 is completed and discussed.

Every type of data to be encoded should have a separate type in a new storage module, and potentially we should have an enum of all these types, with an instance of the enum being what is actually encoded and stored to disk (although this is not agreed upon at the moment).

┆Issue Number: ENG-768

The text was updated successfully, but these errors were encountered:

Fraser999 · 2024-09-04T09:55:20Z

Decision made in favour of Borsh after discussion of this benchmarking repo.

Fraser999 · 2024-09-24T10:17:05Z

After discussion with @SuperFluffy we've agreed to an approach whereby individual components should define their own storage enum - similar to the top level StoredValue enum, but defining only types required by that component. The top-level StoredValue enum can then be unflattened to hold only these component-specific enums.

This seems like a good compromise between the desire to have components be in control of their own stored data, while also ensuring every type put to DB is uniquely identifiable by a discriminant (provided via the Borsh-encoding of the enums).

We also noted that having a tight coupling between DB keys and the stored types they represent would be desirable, but out of scope for this batch of work.

…G-768) (#1492) ## Summary This PR primarily changes the encoding format of all data being written to storage to Borsh. ## Background We currently have a variety of different encoding formats, and this can be confusing, sub-optimal (in terms of storage space consumed and serialization/deserialization performance) and potentially problematic as e.g. JSON-encoding leaves a lot of flexibility in how the actual serialized data can look. This PR is part one of three which aim to improve the performance and quality of the storage component. As such, the APIs of the various `StateReadExt` and `StateWriteExt` extension traits were updated slightly in preparation for the upcoming changes. In broad terms, for getters this meant having ref parameters rather than value ones (even for copyable types like `[u8; 32]` this is significantly more performant), and for "putters", parameters which are used for DB keys are refs, while the DB value parameters become values, since in the next PR these values will be added to a cache. ## Changes - Added a new `storage` module. This will ultimately contain our own equivalents of the `cnidarium` types, but for now consists only of a collection of submodules for types currently being written to storage. There is a top-level enum `StoredValue` which becomes the only type being written to storage by our own code. - To accommodate for converting between the storage types and the corresponding domain types defined in `astria-core` and `astria-merkle`, some of these have been provided with constructors named `unchecked_from_parts`. This allows the type to be constructed from a trusted source like our own DB, skipping any validation steps which might otherwise be done. - Updated all `StateReadExt` and `StateWriteExt` traits to use the new `StoredValue`, which internally uses Borsh encoding. - Updated the APIs of all these extension traits as described above. This change resulted in a slew of similar updates to avoid needless copying `[u8; 32]` and `[u8; 20]` arrays, and that has unfortunately made the PR larger than expected. - A few core types which currently derive or implement `serde` traits have had that removed, since it only existed to support JSON-encoding the type for storing. In one case it was for JSON-encoding to a debug log line, but I replaced that with a `Display` impl. ## Testing - Existing unit tests of the `StateReadExt` and `StateWriteExt` traits already had almost full coverage. Where coverage was missing, I added round trip tests. ## Breaking Changelist - The on-disk format of stored data has changed. ## Related Issues Closes #1434.

Fraser999 self-assigned this Aug 30, 2024

Fraser999 mentioned this issue Sep 11, 2024

chore(sequencer)!: exclusively use Borsh encoding for stored data (ENG-768) #1492

Merged

Fraser999 closed this as completed in #1492 Oct 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use single serialization format for all block state data #1434

Use single serialization format for all block state data #1434

Fraser999 commented Aug 30, 2024 •

edited

Loading

Fraser999 commented Sep 4, 2024

Fraser999 commented Sep 24, 2024

Use single serialization format for all block state data #1434

Use single serialization format for all block state data #1434

Comments

Fraser999 commented Aug 30, 2024 • edited Loading

Fraser999 commented Sep 4, 2024

Fraser999 commented Sep 24, 2024

Fraser999 commented Aug 30, 2024 •

edited

Loading