Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CBOR Encoding is compatible with stricter network decode mode #6879

Merged

Conversation

jordanschalm
Copy link
Member

This PR modifies CBOR encoding of flow.Chunk to make it compatible with the stricter decoding rules used for the networking layer codec. It also moves the definition of the networking layer's decode mode into the same file where other encoding modes are defined.

@codecov-commenter
Copy link

codecov-commenter commented Jan 13, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 41.13%. Comparing base (f847f03) to head (31082a0).

Additional details and impacted files
@@                   Coverage Diff                    @@
##           feature/efm-recovery    #6879      +/-   ##
========================================================
+ Coverage                 41.12%   41.13%   +0.01%     
========================================================
  Files                      2123     2123              
  Lines                    187163   187163              
========================================================
+ Hits                      76968    76993      +25     
+ Misses                   103745   103723      -22     
+ Partials                   6450     6447       -3     
Flag Coverage Δ
unittests 41.13% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@AlexHentschel
Copy link
Member

AlexHentschel commented Jan 15, 2025

If it's not too much trouble, I would be great if you take a brief look at the DefaultSerializer in module/executiondatasync/execution_data/serializer.go. I would appreciate your insights on the matter.

Based on my understanding of the code, the DefaultSerializer defines its own encoding ... which seems to be originally intended for the Execution Node to internally store its execution data. My understanding is that this encoding is not strict (?) If that is the case, I think this needs to be highlighted as a warning in the documentation, which I would very much appreciate if we could do that in this PR:

  • In codec.go line 35 I would suggest to rename DecMode to UnsafeDecMode and add the following warning:
    // UnsafeDecMode is is a permissive mode for creating a new cbor Decoder.
    //
    // CAUTION: this encoding should only be used for encoding/decoding data within a node.
    // If used for decoding data that is shared between nodes, it makes the recipient VULNERABLE
    // to RESOURCE EXHAUSTION ATTACKS, where a byzantine sender could include garbage data in the
    // encoding, which would not be noticed by the recipient because the garbage data is dropped 
    // at the decoding step - yet, it consumes the recipient's networking bandwidth. 
    var UnsafeDecMode, _ = cbor.DecOptions{}.DecMode()
    I hope the change surface is small enough for it to be doable as part of this PR. If renaming it is too much work, lets at least include the disclaimer.
  • If my assessment is correct, I think the same warning should be included in the NewCodec constructor (same file):
    // NewCodec returns a new cbor Codec with the provided EncMode and DecMode.
    // If either is nil, the default cbor EncMode/DecMode will be used.
    // 
    // CAUTION: this encoding should only be used for encoding/decoding data within a node.
    // If used for decoding data that is shared between nodes, it makes the recipient VULNERABLE
    // to RESOURCE EXHAUSTION ATTACKS, where a byzantine sender could include garbage data in the
    // encoding, which would not be noticed by the recipient because the garbage data is dropped 
    // at the decoding step - yet, it consumes the recipient's networking bandwidth. 
    func NewCodec(opts ...Option) *Codec {
        ⋮
  • and also the DefaultSerializer in file serializer.go would deserve the same warning:
    // DefaultSerializer is the default implementation for an Execution Data serializer.
    // It is configured to use cbor encoding with LZ4 compression.
    // 
    // CAUTION: this encoding should only be used for encoding/decoding data within a node.
    // If used for decoding data that is shared between nodes, it makes the recipient VULNERABLE
    // to RESOURCE EXHAUSTION ATTACKS, where a byzantine sender could include garbage data in the
    // encoding, which would not be noticed by the recipient because the garbage data is dropped 
    // at the decoding step - yet, it consumes the recipient's networking bandwidth. 
    var DefaultSerializer Serializer
    While I don't think we should refactor the naming of DefaultSerializer in this PR (change surface seems to big to me), I very much dislike that the word "default" is part of the struct's name. This is because components used by "default" should be BFT, which is not the case for the DefaultSerializer (violating the principle of "Safety By Default"). I am mentioning it here, because I would appreciate it if we avoided the word "default" in the code touched by this PR in cases where we are not using the strict encoding. As part of my review, I have added a variety of suggestions to use the wording "non-BFT" instead. If you come across other similarly "wrong" usages of the word "default" in the code adjacent to your updates, it would be super great if you could fix them if easily possible. 🙇

I haven't scanned through the code entirely and properly analyzed every usage, but based on a preliminary investigation, I think we may have major vulnerabilities for a resource exhaustion attacks in our code base 😱. I created this issue for further investigation by the execution team 👉 #6899 .

Thanks

Copy link
Member

@AlexHentschel AlexHentschel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the great work. While reviewing it, I got very worried about possibly non-BFT usages of the encoder-decoder definitions. Based on my experience, this is not very surprising to me, because too many engineers on the team are content with their code working on the happy path and don't inspect lower layers of the code to confirm that their usage of the code is in fact BFT.
It sucks, but I think the only way to address this problem is to make it unambiguously clear on all levels of the code when some components or structs are not BFT. While my suggestions are not a complete solution, it would be great to call out all non-BFT aspects in the portions of the code that we are touching ... and thereby increase the chances that engineers will randomly come across those callouts.

Sorry that those change requests are now popping up as part of your PR. 😅 I tried to be pragmatic and limit my requests to lightweight changes. Please don't hesitate to call out any instances, where you think my change requests are beyond the scope of this PR or you disagree with my suggestion. Appreciate your help in this regard, thanks 🙇

model/encoding/cbor/codec.go Outdated Show resolved Hide resolved
model/encoding/cbor/codec.go Outdated Show resolved Hide resolved
model/flow/chunk.go Outdated Show resolved Hide resolved
model/flow/chunk_test.go Outdated Show resolved Hide resolved
model/flow/chunk_test.go Outdated Show resolved Hide resolved
model/flow/chunk_test.go Outdated Show resolved Hide resolved
model/flow/chunk_test.go Outdated Show resolved Hide resolved
model/flow/chunk_test.go Outdated Show resolved Hide resolved
model/flow/chunk_test.go Outdated Show resolved Hide resolved
model/flow/chunk_test.go Show resolved Hide resolved
@jordanschalm jordanschalm requested a review from a team as a code owner January 16, 2025 21:39
Copy link
Member

@durkmurder durkmurder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I would take this a step further and leave a global variable/constructor only for SAFE decoder/encoder. Everything unsafe should be a specific case where caller understands what he is doing and creates an instance on its own. Not sure that is practical though.

@jordanschalm jordanschalm merged commit b87ed60 into feature/efm-recovery Jan 20, 2025
56 checks passed
@jordanschalm jordanschalm deleted the jord/chunk-srv-evt-count-cbor-encode branch January 20, 2025 21:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants