CBOR Encoding is compatible with stricter network decode mode #6879

jordanschalm · 2025-01-13T22:51:59Z

This PR modifies CBOR encoding of flow.Chunk to make it compatible with the stricter decoding rules used for the networking layer codec. It also moves the definition of the networking layer's decode mode into the same file where other encoding modes are defined.


codecov-commenter · 2025-01-13T22:56:32Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 41.13%. Comparing base (f847f03) to head (31082a0).


```diff
@@                   Coverage Diff                    @@
##           feature/efm-recovery    #6879      +/-   ##
========================================================
+ Coverage                 41.12%   41.13%   +0.01%
========================================================
  Files                      2123     2123
  Lines                    187163   187163
========================================================
+ Hits                      76968    76993      +25
+ Misses                   103745   103723      -22
+ Partials                   6450     6447       -3
```

| Flag | Coverage Δ |
|---|---|
| unittests | `41.13% <100.00%> (+0.01%)` ⬆️ |

Flags with carried forward coverage won't be shown.

AlexHentschel · 2025-01-15T23:52:22Z

If it's not too much trouble, it would be great if you could take a brief look at the DefaultSerializer in module/executiondatasync/execution_data/serializer.go. I would appreciate your insights on the matter.

Based on my understanding of the code, the DefaultSerializer defines its own encoding, which seems to have been originally intended for the Execution Node to store its execution data internally. My understanding is that this encoding is not strict (?). If that is the case, I think this needs to be highlighted as a warning in the documentation, and I would very much appreciate it if we could do that in this PR:

In codec.go (line 35), I suggest renaming DecMode to UnsafeDecMode and adding the following warning:

```go
// UnsafeDecMode is a permissive mode for creating a new cbor Decoder.
//
// CAUTION: this encoding should only be used for encoding/decoding data within a node.
// If used for decoding data that is shared between nodes, it makes the recipient VULNERABLE
// to RESOURCE EXHAUSTION ATTACKS, where a byzantine sender could include garbage data in the
// encoding, which would not be noticed by the recipient because the garbage data is dropped
// at the decoding step - yet, it consumes the recipient's networking bandwidth.
var UnsafeDecMode, _ = cbor.DecOptions{}.DecMode()
```
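
The following sketch contrasts the attack described in the warning with strict decoding. It rests on several assumptions:

- It uses Go's standard `encoding/json` as an analogy, not CBOR.
- The `Ping` type and the `padded` helper are hypothetical.
- In fxamacker/cbor, the corresponding knob is, to our understanding, `DecOptions.ExtraReturnErrors` set to `cbor.ExtraDecErrorUnknownField`. This may differ from the exact options flow-go configures.

The permissive decoder accepts a message carrying about 1 MiB of junk and silently discards it. The strict decoder cannot stop the bytes from arriving, but it makes the junk detectable, so the recipient can reject the message.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strings"
)

// Ping is a hypothetical message type used only for illustration.
type Ping struct {
	Nonce uint64 `json:"nonce"`
}

// decodePermissive silently discards fields that Ping does not declare.
func decodePermissive(data []byte) (Ping, error) {
	var p Ping
	err := json.Unmarshal(data, &p)
	return p, err
}

// decodeStrict returns an error for any field that Ping does not declare.
func decodeStrict(data []byte) (Ping, error) {
	var p Ping
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	err := dec.Decode(&p)
	return p, err
}

// padded builds a valid Ping that also carries n bytes of junk in an extra field.
func padded(n int) []byte {
	return []byte(`{"nonce":7,"junk":"` + strings.Repeat("x", n) + `"}`)
}

func main() {
	msg := padded(1 << 20) // ~1 MiB of garbage the recipient still has to receive

	p, err := decodePermissive(msg)
	fmt.Println(p.Nonce, err) // 7 <nil>: the garbage goes unnoticed

	_, err = decodeStrict(msg)
	fmt.Println(err) // json: unknown field "junk"
}
```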

I hope the change surface is small enough for this to be doable as part of this PR. If renaming it is too much work, let's at least include the disclaimer.

If my assessment is correct, I think the same warning should be included in the NewCodec constructor (same file):

```go
// NewCodec returns a new cbor Codec with the provided EncMode and DecMode.
// If either is nil, the default cbor EncMode/DecMode will be used.
//
// CAUTION: this encoding should only be used for encoding/decoding data within a node.
// If used for decoding data that is shared between nodes, it makes the recipient VULNERABLE
// to RESOURCE EXHAUSTION ATTACKS, where a byzantine sender could include garbage data in the
// encoding, which would not be noticed by the recipient because the garbage data is dropped
// at the decoding step - yet, it consumes the recipient's networking bandwidth.
func NewCodec(opts ...Option) *Codec {
	// ⋮
}
```

The DefaultSerializer in serializer.go also deserves the same warning:
```go
// DefaultSerializer is the default implementation for an Execution Data serializer.
// It is configured to use cbor encoding with LZ4 compression.
//
// CAUTION: this encoding should only be used for encoding/decoding data within a node.
// If used for decoding data that is shared between nodes, it makes the recipient VULNERABLE
// to RESOURCE EXHAUSTION ATTACKS, where a byzantine sender could include garbage data in the
// encoding, which would not be noticed by the recipient because the garbage data is dropped
// at the decoding step - yet, it consumes the recipient's networking bandwidth.
var DefaultSerializer Serializer
```
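
Below is a rough sketch of a serializer that composes "encode, then compress" but decodes strictly on the way back. It substitutes two standard-library pieces for the real ones:

- `encoding/json` stands in for CBOR.
- `compress/flate` stands in for LZ4.

`Payload`, `Serialize` and `DeserializeStrict` are hypothetical names. They do not reflect flow-go's actual `Serializer` interface. The strict side rejects undeclared fields and any bytes trailing the encoded value, which are the two places where garbage could otherwise hide.

```go
package main

import (
	"bytes"
	"compress/flate"
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

// Payload is a hypothetical stand-in for execution data.
type Payload struct {
	Height uint64   `json:"height"`
	Events []string `json:"events"`
}

// Serialize encodes v and compresses the encoding (flate stands in for LZ4).
func Serialize(v any) ([]byte, error) {
	var buf bytes.Buffer
	zw, err := flate.NewWriter(&buf, flate.DefaultCompression)
	if err != nil {
		return nil, err
	}
	if err := json.NewEncoder(zw).Encode(v); err != nil {
		return nil, err
	}
	// Close flushes any buffered compressed data into buf.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// DeserializeStrict reverses Serialize. Unlike a permissive decoder, it rejects
// fields that v does not declare and any data following the encoded value.
func DeserializeStrict(data []byte, v any) error {
	zr := flate.NewReader(bytes.NewReader(data))
	defer zr.Close()
	dec := json.NewDecoder(zr)
	dec.DisallowUnknownFields()
	if err := dec.Decode(v); err != nil {
		return err
	}
	// Only whitespace may remain; anything else is trailing garbage.
	if _, err := dec.Token(); err != io.EOF {
		return errors.New("unexpected data after encoded value")
	}
	return nil
}

func main() {
	data, err := Serialize(Payload{Height: 42, Events: []string{"a", "b"}})
	if err != nil {
		panic(err)
	}
	var out Payload
	if err := DeserializeStrict(data, &out); err != nil {
		panic(err)
	}
	fmt.Println(out.Height, out.Events) // 42 [a b]
}
```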
While I don't think we should rename DefaultSerializer in this PR (the change surface seems too big to me), I very much dislike that the word "default" is part of its name. Components used by "default" should be BFT, which is not the case for the DefaultSerializer (this violates the principle of "Safety By Default"). I am mentioning it here because I would appreciate it if we avoided the word "default" in the code touched by this PR wherever we are not using the strict encoding. As part of my review, I have added a variety of suggestions to use the wording "non-BFT" instead. If you come across other similarly "wrong" usages of the word "default" in the code adjacent to your updates, it would be great if you could fix them where easily possible. 🙇

I haven't scanned through the code base entirely or properly analyzed every usage, but based on a preliminary investigation, I think we may have major vulnerabilities to resource exhaustion attacks in our code base 😱. I created this issue for further investigation by the execution team 👉 #6899.

Thanks

AlexHentschel

Thank you for the great work. While reviewing it, I got very worried about possibly non-BFT usages of the encoder-decoder definitions. Based on my experience, this is not very surprising to me, because too many engineers on the team are content with their code working on the happy path and don't inspect lower layers of the code to confirm that their usage of the code is in fact BFT.
It sucks, but I think the only way to address this problem is to make it unambiguously clear on all levels of the code when some components or structs are not BFT. While my suggestions are not a complete solution, it would be great to call out all non-BFT aspects in the portions of the code that we are touching ... and thereby increase the chances that engineers will randomly come across those callouts.

Sorry that those change requests are now popping up as part of your PR. 😅 I tried to be pragmatic and limit my requests to lightweight changes. Please don't hesitate to call out any instances, where you think my change requests are beyond the scope of this PR or you disagree with my suggestion. Appreciate your help in this regard, thanks 🙇

model/encoding/cbor/codec.go

model/flow/chunk.go

model/flow/chunk_test.go


durkmurder

Looks good. I would take this a step further and keep a global variable/constructor only for the SAFE decoder/encoder. Everything unsafe should be a specific case where the caller understands what they are doing and creates an instance on their own. Not sure that is practical though.
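
Here is a minimal sketch of the "safe by default" API shape suggested above. The only package-level entry point decodes strictly. The permissive decoder has no global instance and must be constructed explicitly under a deliberately alarming name.

All of the following are hypothetical illustrations, not flow-go code:

- `Decode`
- `UnsafeDecoder`
- `NewUnsafeDecoder`
- `Msg`

The sketch uses `encoding/json`'s `DisallowUnknownFields` in place of a strict CBOR decode mode.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Decode is the only package-level decoding entry point, and it is strict:
// fields that the target type does not declare are rejected.
func Decode(data []byte, v any) error {
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	return dec.Decode(v)
}

// UnsafeDecoder silently drops unknown fields. There is deliberately no
// package-level instance: callers must construct one explicitly.
type UnsafeDecoder struct{}

// NewUnsafeDecoder returns a permissive (non-BFT) decoder.
//
// CAUTION: only use it for data that never leaves the node.
// For data received from other nodes, use Decode.
func NewUnsafeDecoder() UnsafeDecoder { return UnsafeDecoder{} }

// Decode decodes data into v, ignoring fields that v does not declare.
func (UnsafeDecoder) Decode(data []byte, v any) error {
	return json.Unmarshal(data, v)
}

// Msg is a hypothetical message type.
type Msg struct {
	A int `json:"a"`
}

func main() {
	in := []byte(`{"a":1,"extra":2}`)

	var strict Msg
	fmt.Println(Decode(in, &strict) != nil) // true: rejected by the default path

	var lax Msg
	fmt.Println(NewUnsafeDecoder().Decode(in, &lax), lax.A) // <nil> 1
}
```

The design choice here is that reaching for the permissive behavior requires typing "Unsafe" at the call site. This makes non-BFT usages easy to spot in review.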

jordanschalm added 4 commits January 13, 2025 13:19

add cbor override

27abbb5

this is failing a few tests, where we encode then decode. The decoded version has all empty ChunkBody fields

use cbor omitempty tag instead of marshaler overload

907a31b

move default netw encode mode to model/encoding

5ba8323

add further tests for chunk cbor encoding

e8c8347

jordanschalm requested review from AlexHentschel and durkmurder January 13, 2025 22:52

AlexHentschel mentioned this pull request Jan 15, 2025

Potential for resource exhaustion vulnerability in case of a byzantine Execution Node #6899

Open

AlexHentschel approved these changes Jan 16, 2025


Merge branch 'feature/efm-recovery' into jord/chunk-srv-evt-count-cbor-encode

861e5b5

jordanschalm requested a review from a team as a code owner January 16, 2025 21:39

jordanschalm and others added 5 commits January 16, 2025 14:03

document unsafe decmode options

3b9a1c0

Apply suggestions from code review

d0c5215

Co-authored-by: Alexander Hentschel <alex.hentschel@flowfoundation.org>

add warning to exedata serializer

b77ff4e

Merge branch 'jord/chunk-srv-evt-count-cbor-encode' of github.com:onflow/flow-go into jord/chunk-srv-evt-count-cbor-encode

0b527f3

Merge branch 'feature/efm-recovery' into jord/chunk-srv-evt-count-cbor-encode

31082a0

durkmurder approved these changes Jan 20, 2025


jordanschalm merged commit b87ed60 into feature/efm-recovery Jan 20, 2025
56 checks passed

jordanschalm deleted the jord/chunk-srv-evt-count-cbor-encode branch January 20, 2025 21:27
