Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use single serialization format for all block state data #1434

Closed
Fraser999 opened this issue Aug 30, 2024 · 2 comments · Fixed by #1492
Closed

Use single serialization format for all block state data #1434

Fraser999 opened this issue Aug 30, 2024 · 2 comments · Fixed by #1492
Assignees

Comments

@Fraser999
Copy link
Contributor

Fraser999 commented Aug 30, 2024

Format to use will be decided once #1433 is completed and discussed.

Every type of data to be encoded should have a separate type in a new storage module, and potentially we should have an enum of all these types, with an instance of the enum being what is actually encoded and stored to disk (although this is not agreed upon at the moment).

┆Issue Number: ENG-768

@Fraser999 Fraser999 self-assigned this Aug 30, 2024
@Fraser999
Copy link
Contributor Author

Decision made in favour of Borsh after discussion of this benchmarking repo.

@Fraser999
Copy link
Contributor Author

After discussion with @SuperFluffy we've agreed to an approach whereby individual components should define their own storage enum - similar to the top level StoredValue enum, but defining only types required by that component. The top-level StoredValue enum can then be unflattened to hold only these component-specific enums.

This seems like a good compromise between the desire to have components be in control of their own stored data, while also ensuring every type put to DB is uniquely identifiable by a discriminant (provided via the Borsh-encoding of the enums).

We also noted that having a tight coupling between DB keys and the stored types they represent would be desirable, but out of scope for this batch of work.

github-merge-queue bot pushed a commit that referenced this issue Oct 1, 2024
…G-768) (#1492)

## Summary
This PR primarily changes the encoding format of all data being written
to storage to Borsh.

## Background
We currently have a variety of different encoding formats, and this can
be confusing, sub-optimal (in terms of storage space consumed and
serialization/deserialization performance) and potentially problematic
as e.g. JSON-encoding leaves a lot of flexibility in how the actual
serialized data can look.

This PR is part one of three which aim to improve the performance and
quality of the storage component. As such, the APIs of the various
`StateReadExt` and `StateWriteExt` extension traits were updated
slightly in preparation for the upcoming changes. In broad terms, for
getters this meant having ref parameters rather than value ones (even
for copyable types like `[u8; 32]` this is significantly more
performant), and for "putters", parameters which are used for DB keys
are refs, while the DB value parameters become values, since in the next
PR these values will be added to a cache.

## Changes
- Added a new `storage` module. This will ultimately contain our own
equivalents of the `cnidarium` types, but for now consists only of a
collection of submodules for types currently being written to storage.
There is a top-level enum `StoredValue` which becomes the only type
being written to storage by our own code.
- To accommodate for converting between the storage types and the
corresponding domain types defined in `astria-core` and `astria-merkle`,
some of these have been provided with constructors named
`unchecked_from_parts`. This allows the type to be constructed from a
trusted source like our own DB, skipping any validation steps which
might otherwise be done.
- Updated all `StateReadExt` and `StateWriteExt` traits to use the new
`StoredValue`, which internally uses Borsh encoding.
- Updated the APIs of all these extension traits as described above.
This change resulted in a slew of similar updates to avoid needless
copying `[u8; 32]` and `[u8; 20]` arrays, and that has unfortunately
made the PR larger than expected.
- A few core types which currently derive or implement `serde` traits
have had that removed, since it only existed to support JSON-encoding
the type for storing. In one case it was for JSON-encoding to a debug
log line, but I replaced that with a `Display` impl.

## Testing
- Existing unit tests of the `StateReadExt` and `StateWriteExt` traits
already had almost full coverage. Where coverage was missing, I added
round trip tests.

## Breaking Changelist
- The on-disk format of stored data has changed.

## Related Issues
Closes #1434.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant