-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(ffi): Add support for serializing/deserializing auto-generated key-value pairs in the new IR format (resolves #556). #653
feat(ffi): Add support for serializing/deserializing auto-generated key-value pairs in the new IR format (resolves #556). #653
Conversation
Warning Rate limit exceeded@LinZhihao-723 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 17 minutes and 37 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (1)
WalkthroughThis pull request introduces significant modifications to the serialization process in the Changes
Possibly Related PRs
Suggested Reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (8)
components/core/src/clp/ffi/ir_stream/Serializer.hpp (4)
85-87
: Clarify the reasons for returning false in the docstring.While the documentation mentions that this method returns a boolean, it would be helpful to detail the possible error conditions leading to a
false
return. This clarity can aid troubleshooting and improve maintainability.
90-93
: Consider returning a more descriptive status object.Accepting two maps and returning a plain
bool
might not convey sufficient detail on serialization failures. Replacing the boolean return with either a specialized status object or a rich error code would help users diagnose issues more effectively.
102-106
: Improve the @tparam documentation.The
is_auto_generated_node
template parameter is central to the logic. Adding a brief remark about the parameter in the docstring can clarify its effect on the serialization process.
116-117
: Assess memory usage for sequential buffers.If these buffers become large under frequent usage, consider chunk-based or streaming approaches to manage memory usage and reduce reallocation overhead.
components/core/src/clp/ffi/ir_stream/Serializer.cpp (2)
200-208
: Expand rationale for returning false.To help consumers of this API, enhance the comment block with scenarios that cause
serialize_value
to fail. This would streamline debugging.
438-525
: DFS approach is clear but be aware of stack usage.Although this DFS is iterative, the vector can still grow significantly for extremely nested structures. Consider limiting depth or providing fail-fast checks when encountering excessively nested data.
components/core/tests/test-ir_encoding_methods.cpp (2)
196-197
: Elaborate on auto-generated vs. user-generated bytes.Specifying how auto-generated and user-generated bytes differ will assist future maintainers who might wonder why two separate parameters are needed.
203-212
: Clarify function failure conditions.Document that this function returns
false
if either unpacked object is not a map or if serialization fails internally. This extra detail will improve usage clarity.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
components/core/src/clp/ffi/ir_stream/Serializer.cpp
(9 hunks)components/core/src/clp/ffi/ir_stream/Serializer.hpp
(2 hunks)components/core/tests/test-ir_encoding_methods.cpp
(7 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
components/core/tests/test-ir_encoding_methods.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp/ffi/ir_stream/Serializer.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp/ffi/ir_stream/Serializer.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
📓 Learnings (1)
components/core/src/clp/ffi/ir_stream/Serializer.cpp (1)
Learnt from: LinZhihao-723
PR: y-scope/clp#570
File: components/core/src/clp/ffi/ir_stream/Serializer.cpp:248-288
Timestamp: 2024-11-10T16:46:53.300Z
Learning: In the C++ function `is_msgpack_array_serializable` in `components/core/src/clp/ffi/ir_stream/Serializer.cpp`, the code ensures that `curr->type` is always `MAP` or `ARRAY`, so accessing `curr->via.array` without an explicit type check is safe.
🪛 cppcheck (2.10-2)
components/core/src/clp/ffi/ir_stream/Serializer.cpp
[error] 246-246: Syntax Error
(internalAstError)
⏰ Context from checks skipped due to timeout of 90000ms (12)
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: ubuntu-focal-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-focal-dynamic-linked-bins
- GitHub Check: build-macos (macos-14, false)
- GitHub Check: build-macos (macos-14, true)
- GitHub Check: lint-check (ubuntu-latest)
- GitHub Check: build-macos (macos-13, false)
- GitHub Check: lint-check (macos-latest)
- GitHub Check: build-macos (macos-13, true)
🔇 Additional comments (13)
components/core/src/clp/ffi/ir_stream/Serializer.hpp (1)
111-112
: Split schema trees look well-organized.Separating schema trees for auto-generated and user-generated keys improves clarity. This approach is consistent with the design described in the PR summary.
components/core/src/clp/ffi/ir_stream/Serializer.cpp (6)
3-3
: Include of is appropriate.The
<concepts>
header is used by the new serialization method concepts below. This inclusion is justified.
39-88
: Concept definitions appear consistent and clear.Each concept precisely encodes its requirements, aligning with the usage in your lambdas. This structure is robust and well-organized.
209-216
: Signature and parameters look consistent.These parameters align with the specialized approach for encoding multi-type values. The design appears stable.
340-396
: Validate large integer conversions.The check for 64-bit integer overflow is a good safeguard. If future expansions require bigger integer ranges, consider adding explicit error handling or documentation clarifying that 64-bit is the limit.
571-739
: Transaction-based rollback logic is well-structured.The usage of snapshot and revert for error handling is clean and reduces partial state issues. This design helps maintain consistency.
Line range hint
744-819
: Schema node insertion method is straightforward.This method carefully encodes the node’s type and key. It appears robust for both auto-generated and user-generated nodes.
components/core/tests/test-ir_encoding_methods.cpp (6)
208-212
: Minimal helper function is clear.Returning a handle with an empty map is straightforward and self-explanatory. Implementation matches the doc.
348-374
: Sensible error checks when unpacking maps.Confirming each msgpack object is a map before serialization is robust. The approach gracefully handles unexpected input.
1232-1236
: Efficient call with empty objects.Testing empty objects ensures coverage of boundary cases for the new two-map logic.
1263-1267
: Successful check against basic objects.Verifying the data types from int to null helps ensure universal coverage for typical JSON structures.
1283-1286
: Prevent serialization of arrays in unexpected places.Rejecting non-map objects fosters consistent IR generation. Good user-facing safety check.
Line range hint
1300-1305
: Recursive structure testing is thorough.Your repeated expansions validate stability for nested objects. This is important for real-world usage with complex JSON trees.
…s auto/user-generated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
components/core/src/clp_s/JsonParser.cpp (2)
52-52
: Consider documenting the purpose of theis_auto_generated
parameter.While the parameter is correctly marked as
[[maybe_unused]]
, adding documentation would help explain its intended future use in distinguishing between auto-generated and user-generated key-value pairs.
52-55
: Consider implementing different handling for auto-generated nodes.The method currently returns
IRErrorCode::IRErrorCode_Success
unconditionally, regardless of theis_auto_generated
flag. Consider implementing distinct handling logic for auto-generated nodes to fully utilize this parameter.components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (1)
310-351
: Refactor to reduce code duplication in the parsing loopsThe two
while
loops starting at lines 310 and 357 perform similar logic for deserializing node IDs and values for auto-generated and user-generated keys. Consider refactoring to combine these loops or extract the common logic into a helper function to reduce code duplication and enhance readability.Also applies to: 357-375
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
components/core/src/clp/ffi/ir_stream/Deserializer.hpp
(1 hunks)components/core/src/clp/ffi/ir_stream/IrUnitHandlerInterface.hpp
(2 hunks)components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp
(7 hunks)components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.hpp
(3 hunks)components/core/src/clp_s/JsonParser.cpp
(1 hunks)components/core/tests/test-ffi_IrUnitHandlerInterface.cpp
(2 hunks)components/core/tests/test-ir_encoding_methods.cpp
(10 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
components/core/src/clp_s/JsonParser.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/tests/test-ffi_IrUnitHandlerInterface.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp/ffi/ir_stream/Deserializer.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/tests/test-ir_encoding_methods.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp/ffi/ir_stream/IrUnitHandlerInterface.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
📓 Learnings (2)
components/core/tests/test-ffi_IrUnitHandlerInterface.cpp (1)
Learnt from: LinZhihao-723
PR: y-scope/clp#549
File: components/core/tests/test-ir_encoding_methods.cpp:110-110
Timestamp: 2024-11-10T16:46:58.543Z
Learning: The function `handle_schema_tree_node_insertion` in `IrUnitHandler` must accept `clp::ffi::SchemaTree::NodeLocator` by value to match the `IrUnitHandlerInterface`.
components/core/src/clp/ffi/ir_stream/IrUnitHandlerInterface.hpp (1)
Learnt from: LinZhihao-723
PR: y-scope/clp#549
File: components/core/tests/test-ir_encoding_methods.cpp:110-110
Timestamp: 2024-11-10T16:46:58.543Z
Learning: The function `handle_schema_tree_node_insertion` in `IrUnitHandler` must accept `clp::ffi::SchemaTree::NodeLocator` by value to match the `IrUnitHandlerInterface`.
🔇 Additional comments (11)
components/core/tests/test-ir_encoding_methods.cpp (4)
114-114
: LGTM! Parameter addition is well-documented.The new parameter
is_auto_generated
is properly marked as unused and follows naming conventions.
197-205
: LGTM! Function changes handle auto-generated and user-generated data correctly.The implementation properly validates both message pack objects and follows good error handling practices.
Also applies to: 349-375
209-213
: LGTM! Well-implemented utility function.The function is concise, follows single responsibility principle, and has a clear descriptive name.
Also applies to: 377-382
Line range hint
1218-1351
: LGTM! Comprehensive test coverage for serialization and deserialization.The test case thoroughly validates:
- Both auto-generated and user-generated objects
- Empty objects and complex recursive structures
- Key-value pair counts and content equality
components/core/src/clp/ffi/ir_stream/IrUnitHandlerInterface.hpp (2)
20-20
: LGTM! Parameter addition aligns with PR objectives.The addition of the
is_auto_generated
parameter to the concept requirements is appropriate and consistent with the goal of supporting auto-generated key-value pairs.
46-51
: LGTM! Documentation and signature changes are well-aligned.The documentation clearly explains the purpose of the
is_auto_generated
parameter, and the signature update correctly reflects this change in the concept requirements.components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.hpp (2)
32-40
: LGTM! Documentation updates are comprehensive.The documentation clearly explains the new pair return type and maintains consistent formatting with other method documentation in the file.
72-72
: LGTM! Documentation accurately reflects implementation changes.The error forwarding documentation correctly references the new deserialization method name.
components/core/tests/test-ffi_IrUnitHandlerInterface.cpp (2)
42-43
: LGTM! Handler implementation correctly follows interface changes.The addition of the
[[maybe_unused]]
attribute is appropriate for the test implementation, and the parameter matches the interface requirements.
113-113
: LGTM! Test case appropriately exercises new parameter.The test correctly passes the new boolean parameter, maintaining test coverage for the interface changes.
components/core/src/clp/ffi/ir_stream/Deserializer.hpp (1)
214-232
: LGTM! Deserializer changes are well-structured and maintainable.The changes effectively handle the distinction between auto-generated and user-generated schema trees:
- Use of structured binding improves code readability
- Schema tree selection logic is clear and efficient
- Handler call and node insertion correctly use the selected schema tree
auto& [auto_gen_node_id_value_pairs, | ||
user_gen_schema]{auto_gen_node_id_value_pairs_and_user_gen_schema_result.value()}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid binding references to temporaries to prevent dangling references
In lines 612-613, you're using auto&
with structured bindings on the temporary returned by value()
. This can lead to dangling references since you're binding to references of a temporary object. To ensure safety, bind by value instead.
Apply this diff to fix the issue:
-auto& [auto_gen_node_id_value_pairs,
+auto [auto_gen_node_id_value_pairs,
user_gen_schema]{auto_gen_node_id_value_pairs_and_user_gen_schema_result.value()};
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
auto& [auto_gen_node_id_value_pairs, | |
user_gen_schema]{auto_gen_node_id_value_pairs_and_user_gen_schema_result.value()}; | |
auto [auto_gen_node_id_value_pairs, | |
user_gen_schema]{auto_gen_node_id_value_pairs_and_user_gen_schema_result.value()}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use reference since later the value will be moved to construct the output log event.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LinZhihao-723 You're absolutely right. Using references here is the correct approach since the values will be moved later when constructing the output log event. I apologize for the confusion in my previous comment.
✏️ Learnings added
Learnt from: LinZhihao-723
PR: y-scope/clp#653
File: components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp:612-613
Timestamp: 2025-01-16T16:48:29.773Z
Learning: When using structured bindings with values that will be moved later (e.g., into a constructor), prefer using references (`auto&`) to avoid unnecessary copies, even when binding to the result of a temporary object that lives until the end of the full expression.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
If you found this review helpful, would you consider giving us a shout-out on X?
Thank you for using CodeRabbit!
components/core/src/clp/ffi/ir_stream/IrUnitHandlerInterface.hpp
Outdated
Show resolved
Hide resolved
@@ -109,6 +110,7 @@ auto test_ir_unit_handler_interface(clp::ffi::ir_stream::IrUnitHandlerInterface | |||
REQUIRE( | |||
(IRErrorCode::IRErrorCode_Success | |||
== handler.handle_schema_tree_node_insertion( | |||
true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should test this with true and false, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iirc this unit test is just to ensure the interface thing works, so as long as it takes a boolean and gets compiled successfully it should be fine. It doesn't really do real work for the giving arguments.
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp
Outdated
Show resolved
Hide resolved
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (3)
101-114
: Documentation could be more detailed.Consider enhancing the documentation:
- Add
@return
tag for the successful case- Specify conditions that trigger each error code
/** * Deserializes the auto-generated node-ID-value pairs and the IDs of all user-generated keys in a * log event. * @param reader * @param tag Takes the current tag as input and returns the last tag read. + * @return On success, returns a pair containing: + * - The auto-generated node-ID-value pairs + * - The IDs of all user-generated keys * - The possible error codes: - * - Forwards `deserialize_tag`'s return values. - * - Forwards `deserialize_and_decode_schema_tree_node_id`'s return values. + * - Forwards `deserialize_tag`'s return values when tag deserialization fails + * - Forwards `deserialize_and_decode_schema_tree_node_id`'s return values when node ID decoding fails * - std::err::protocol_error if the IR stream contains auto-generated key IDs after at least one * user-generated key ID has been deserialized. */
Line range hint
302-376
: Consider refactoring to improve maintainability.The function is quite long and contains duplicated loop structures. Consider extracting helper functions for:
- Deserializing node IDs
- Processing auto-generated pairs
- Processing user-generated schema
+// Helper function to deserialize and validate node ID +auto deserialize_node_id(ReaderInterface& reader, encoded_tag_t& tag) + -> OUTCOME_V2_NAMESPACE::std_result<std::pair<bool, SchemaTree::Node::id_t>> { + auto const schema_tree_node_id_result{deserialize_and_decode_schema_tree_node_id< + cProtocol::Payload::EncodedSchemaTreeNodeIdByte, + cProtocol::Payload::EncodedSchemaTreeNodeIdShort, + cProtocol::Payload::EncodedSchemaTreeNodeIdInt>(tag, reader)}; + if (schema_tree_node_id_result.has_error()) { + return schema_tree_node_id_result.error(); + } + + if (auto const err{deserialize_tag(reader, tag)}; IRErrorCode::IRErrorCode_Success != err) { + return ir_error_code_to_errc(err); + } + + return schema_tree_node_id_result; +} auto deserialize_auto_gen_node_id_value_pairs_and_user_gen_schema( ReaderInterface& reader, encoded_tag_t& tag ) -> OUTCOME_V2_NAMESPACE::std_result<std::pair<KeyValuePairLogEvent::NodeIdValuePairs, Schema>> { KeyValuePairLogEvent::NodeIdValuePairs auto_gen_node_id_value_pairs; Schema user_gen_schema; // Deserialize auto-generated pairs until we find a user-generated ID while (is_encoded_key_id_tag(tag)) { - auto const schema_tree_node_id_result{deserialize_and_decode_schema_tree_node_id< - cProtocol::Payload::EncodedSchemaTreeNodeIdByte, - cProtocol::Payload::EncodedSchemaTreeNodeIdShort, - cProtocol::Payload::EncodedSchemaTreeNodeIdInt>(tag, reader)}; - if (schema_tree_node_id_result.has_error()) { - return schema_tree_node_id_result.error(); - } - - if (auto const err{deserialize_tag(reader, tag)}; IRErrorCode::IRErrorCode_Success != err) { - return ir_error_code_to_errc(err); + auto const node_id_result = deserialize_node_id(reader, tag); + if (node_id_result.has_error()) { + return node_id_result.error(); } - auto const [is_auto_generated, node_id]{schema_tree_node_id_result.value()}; + auto const [is_auto_generated, node_id]{node_id_result.value()}; if (!is_auto_generated) { user_gen_schema.push_back(node_id); break; }
568-568
: Document the boolean flag's purpose.The return type now includes a boolean flag to indicate auto-generated nodes, but its purpose and implications aren't documented.
-) -> OUTCOME_V2_NAMESPACE::std_result<std::pair<bool, SchemaTree::NodeLocator>> { +) -> OUTCOME_V2_NAMESPACE::std_result<std::pair<bool /* is_auto_generated */, SchemaTree::NodeLocator>> {Also applies to: 585-585
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
components/core/src/clp/ffi/ir_stream/IrUnitHandlerInterface.hpp
(2 hunks)components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp
(7 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- components/core/src/clp/ffi/ir_stream/IrUnitHandlerInterface.hpp
🧰 Additional context used
📓 Path-based instructions (1)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
📓 Learnings (1)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (1)
Learnt from: LinZhihao-723
PR: y-scope/clp#653
File: components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp:612-613
Timestamp: 2025-01-16T16:48:30.016Z
Learning: When using structured bindings with values that will be moved later (e.g., into a constructor), prefer using references (`auto&`) to avoid unnecessary copies, even when binding to the result of a temporary object that lives until the end of the full expression.
🔇 Additional comments (1)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (1)
Line range hint
606-637
: LGTM! Well-structured deserialization logic.The implementation correctly:
- Uses references with structured bindings to avoid unnecessary copies
- Clearly separates auto-generated and user-generated pairs
- Maintains proper error handling
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (5)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.hpp (2)
32-40
: Documentation is clear but could be more specific.The documentation accurately reflects the changes and error conditions. Consider adding a brief explanation of what constitutes an "unsupported node type" in the protocol error case.
43-46
: Consider returning key_name as part of the result tuple.Instead of using an output parameter for
key_name
, consider including it in the return value as a tuple:std::tuple<bool, SchemaTree::NodeLocator, std::string>
. This would make the function more functional and eliminate the need for an output parameter.components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (2)
98-110
: Enhance function documentation.Consider improving the documentation by:
- Adding a
@return
tag for the successful case- Clarifying when the
std::errc::protocol_error
occurs by specifying the exact condition (i.e., when auto-generated key IDs appear after user-generated key IDs)
Line range hint
301-375
: Consider refactoring for improved maintainability.The function is quite long and contains duplicated logic in its two main loops. Consider:
- Extracting a helper function for the common node ID deserialization logic
- Adding a more descriptive error message for the protocol violation case
Example refactor:
+auto deserialize_schema_tree_node_id_and_advance( + ReaderInterface& reader, + encoded_tag_t& tag +) -> OUTCOME_V2_NAMESPACE::std_result<std::pair<bool, SchemaTree::Node::id_t>> { + auto const schema_tree_node_id_result{deserialize_and_decode_schema_tree_node_id< + cProtocol::Payload::EncodedSchemaTreeNodeIdByte, + cProtocol::Payload::EncodedSchemaTreeNodeIdShort, + cProtocol::Payload::EncodedSchemaTreeNodeIdInt>(tag, reader)}; + if (schema_tree_node_id_result.has_error()) { + return schema_tree_node_id_result.error(); + } + if (auto const err{deserialize_tag(reader, tag)}; IRErrorCode::IRErrorCode_Success != err) { + return ir_error_code_to_errc(err); + } + return schema_tree_node_id_result; +}components/core/src/clp/ffi/ir_stream/Serializer.cpp (1)
Line range hint
198-391
: Consider enhancing integer overflow handling.The integer overflow check at lines 345-348 only handles positive integers. Consider adding similar checks for negative integers to ensure complete overflow protection.
if (msgpack::type::POSITIVE_INTEGER == val.type && static_cast<uint64_t>(INT64_MAX) < val.as<uint64_t>()) { return false; } +if (msgpack::type::NEGATIVE_INTEGER == val.type + && val.as<int64_t>() < INT64_MIN) +{ + return false; +}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
components/core/src/clp/ffi/ir_stream/Deserializer.hpp
(1 hunks)components/core/src/clp/ffi/ir_stream/Serializer.cpp
(9 hunks)components/core/src/clp/ffi/ir_stream/Serializer.hpp
(2 hunks)components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp
(7 hunks)components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.hpp
(3 hunks)components/core/src/clp_s/JsonParser.cpp
(1 hunks)components/core/tests/test-ffi_IrUnitHandlerInterface.cpp
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- components/core/tests/test-ffi_IrUnitHandlerInterface.cpp
- components/core/src/clp_s/JsonParser.cpp
- components/core/src/clp/ffi/ir_stream/Deserializer.hpp
🧰 Additional context used
📓 Path-based instructions (4)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp/ffi/ir_stream/Serializer.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp/ffi/ir_stream/Serializer.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
📓 Learnings (1)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (1)
Learnt from: LinZhihao-723
PR: y-scope/clp#653
File: components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp:612-613
Timestamp: 2025-01-16T16:48:30.016Z
Learning: When using structured bindings with values that will be moved later (e.g., into a constructor), prefer using references (`auto&`) to avoid unnecessary copies, even when binding to the result of a temporary object that lives until the end of the full expression.
🪛 cppcheck (2.10-2)
components/core/src/clp/ffi/ir_stream/Serializer.cpp
[error] 242-242: Syntax Error
(internalAstError)
⏰ Context from checks skipped due to timeout of 90000ms (12)
- GitHub Check: ubuntu-focal-static-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: ubuntu-focal-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: build-macos (macos-14, false)
- GitHub Check: build-macos (macos-14, true)
- GitHub Check: build-macos (macos-13, false)
- GitHub Check: lint-check (ubuntu-latest)
- GitHub Check: build-macos (macos-13, true)
- GitHub Check: lint-check (macos-latest)
🔇 Additional comments (11)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.hpp (2)
7-7
: LGTM: Include statement is appropriate.The addition of
<utility>
header is necessary for usingstd::pair
in the return type.
72-72
: LGTM: Error handling documentation is accurate.The documentation correctly references the new deserialization method and maintains consistent error handling documentation style.
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (1)
Line range hint
607-638
: LGTM! Well-structured deserialization logic.The implementation correctly:
- Uses references with structured bindings to avoid unnecessary copies
- Maintains proper error handling for both auto-generated and user-generated components
- Follows a clear separation of concerns
components/core/src/clp/ffi/ir_stream/Serializer.hpp (4)
85-93
: LGTM! The method signature update aligns with the PR objectives.The split into auto-generated and user-generated key-value pairs provides a clear separation of concerns for different serialization requirements.
102-106
: LGTM! Good use of template metaprogramming.The template parameter
is_auto_generated_node
enables compile-time optimization by avoiding runtime checks for node type differentiation.
111-112
: LGTM! Clear separation of schema trees.The split into separate schema trees for auto-generated and user-generated keys provides better encapsulation and maintainability.
116-117
: LGTM! Buffer changes support the new serialization requirements.The renamed buffer has a clearer purpose, and the new buffer provides dedicated storage for user-generated values.
components/core/src/clp/ffi/ir_stream/Serializer.cpp (4)
39-87
: LGTM! Well-defined concepts with clear documentation.The use of C++20 concepts provides strong compile-time contracts for serialization methods, making the code more maintainable and safer.
435-521
: LGTM! Robust DFS implementation with comprehensive error handling.The implementation correctly handles all edge cases, including empty maps and invalid key types. The code is well-documented and maintainable.
567-734
: LGTM! Well-structured implementation with proper transaction management.The separation of concerns between auto-generated and user-generated key-value pairs is clear, and the transaction management ensures data consistency.
797-815
: LGTM! Complete template specializations for both encoding types.The template specializations ensure that all necessary combinations of encoding types and node types are available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more minor things.
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp
Outdated
Show resolved
Hide resolved
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp
Outdated
Show resolved
Hide resolved
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp
Outdated
Show resolved
Hide resolved
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp
Show resolved
Hide resolved
* - The locator of the inserted schema tree node. | ||
* - The possible error codes: | ||
* - std::errc::result_out_of_range if the IR stream is truncated. | ||
* - std::errc::protocol_error if the deserialized node type isn't supported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is protcol_error
still a return value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, it is. If the tag represents an invalid node type we will return ir_error_code_to_errc(IRErrorCode::IRErrorCode_Corrupted_IR)
which is std::errc::protocol_error
...
four_byte_encoded_variable_t, | ||
eight_byte_encoded_variable_t | ||
) { | ||
vector<int8_t> ir_buf; | ||
vector<nlohmann::json> serialized_json_objects; | ||
vector<std::pair<nlohmann::json, nlohmann::json>> expected_serialized_json_object_pairs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about expected_auto_gen_and_user_gen_objects
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed to expected_auto_gen_and_user_gen_object_pairs
; I think it's still worth indicating it stores pairs of objects
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (1)
109-109
: Fix typo in documentation.Change "std::err" to "std::errc" to match the actual error code type.
- * - std::err::protocol_error if the IR stream contains auto-generated key IDs after at least one + * - std::errc::protocol_error if the IR stream contains auto-generated key IDs after at least onecomponents/core/tests/test-ir_encoding_methods.cpp (1)
197-205
: Update function documentation to reflect new parameters.The function signature has been updated to handle both auto-generated and user-generated msgpack bytes, but the documentation hasn't been updated to describe these parameters.
/** * Unpacks and serializes the given msgpack bytes using kv serializer. * @tparam encoded_variable_t + * @param auto_gen_msgpack_bytes The msgpack bytes containing auto-generated key-value pairs + * @param user_gen_msgpack_bytes The msgpack bytes containing user-generated key-value pairs * @param serializer * @return Whether serialization succeeded. */
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp
(7 hunks)components/core/tests/test-ir_encoding_methods.cpp
(10 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
components/core/tests/test-ir_encoding_methods.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
📓 Learnings (1)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (1)
Learnt from: LinZhihao-723
PR: y-scope/clp#653
File: components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp:612-613
Timestamp: 2025-01-16T16:48:30.016Z
Learning: When using structured bindings with values that will be moved later (e.g., into a constructor), prefer using references (`auto&`) to avoid unnecessary copies, even when binding to the result of a temporary object that lives until the end of the full expression.
⏰ Context from checks skipped due to timeout of 90000ms (12)
- GitHub Check: ubuntu-focal-static-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: ubuntu-focal-dynamic-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: build-macos (macos-14, false)
- GitHub Check: lint-check (ubuntu-latest)
- GitHub Check: build-macos (macos-13, false)
- GitHub Check: build-macos (macos-13, true)
- GitHub Check: lint-check (macos-latest)
- GitHub Check: build (macos-latest)
🔇 Additional comments (8)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (3)
Line range hint
301-376
: LGTM! Well-structured implementation with proper error handling.The function effectively separates the deserialization of auto-generated and user-generated node IDs while maintaining proper error handling throughout.
570-570
: LGTM! Return type enhancement.The updated return type appropriately includes the auto-generation flag, which aligns with the PR's objective of supporting auto-generated key-value pairs.
Line range hint
608-639
: LGTM! Proper handling of auto-generated and user-generated key-value pairs.The implementation correctly uses references for structured bindings (as per previous learnings) and properly manages the deserialization of both types of key-value pairs.
components/core/tests/test-ir_encoding_methods.cpp (5)
114-114
: LGTM! The parameter addition aligns with the feature requirements.The new
is_auto_generated
parameter inhandle_schema_tree_node_insertion
correctly supports the distinction between auto-generated and user-generated schema tree nodes.
209-213
: LGTM! Well-documented helper function.The new helper function is clear, concise, and serves a specific purpose in creating empty msgpack map objects for testing.
1218-1219
: LGTM! Test case name corrected.The test case name has been correctly updated to use "serde" (serialize/deserialize) instead of "serder".
1224-1224
: Improve variable name for clarity.Consider renaming the variable to better reflect its purpose.
- vector<std::pair<nlohmann::json, nlohmann::json>> expected_serialized_json_object_pairs; + vector<std::pair<nlohmann::json, nlohmann::json>> expected_auto_gen_and_user_gen_objects;
1326-1351
: LGTM! Comprehensive test coverage.The test case thoroughly verifies:
- The number of deserialized events matches the expected count
- The number of key-value pairs matches the leaf count in both auto-generated and user-generated objects
- The serialized JSON objects match the expected objects
…methods.cpp Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (1)
Line range hint
301-376
: LGTM! Robust implementation with thorough error handling.The function effectively handles both auto-generated and user-generated node IDs, with proper validation of their ordering. The use of move semantics in the return statement is efficient.
Consider enhancing the error handling by using a custom error code instead of
std::errc::protocol_error
to provide more specific error information. For example:- return std::errc::protocol_error; + return make_error_code(IRErrorCode::IRErrorCode_AutoGenAfterUserGen);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp
(7 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
📓 Learnings (1)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (1)
Learnt from: LinZhihao-723
PR: y-scope/clp#653
File: components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp:612-613
Timestamp: 2025-01-16T16:48:30.016Z
Learning: When using structured bindings with values that will be moved later (e.g., into a constructor), prefer using references (`auto&`) to avoid unnecessary copies, even when binding to the result of a temporary object that lives until the end of the full expression.
⏰ Context from checks skipped due to timeout of 90000ms (6)
- GitHub Check: build-macos (macos-14, false)
- GitHub Check: build-macos (macos-13, false)
- GitHub Check: build-macos (macos-13, true)
- GitHub Check: lint-check (ubuntu-latest)
- GitHub Check: build (macos-latest)
- GitHub Check: lint-check (macos-latest)
🔇 Additional comments (3)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (3)
98-110
: LGTM! Well-documented function.The documentation is thorough, clearly explaining the function's purpose, return values, and error conditions.
570-570
: LGTM! Return type change aligns with requirements.The addition of the auto-generation flag in the return type properly supports the new IR format requirements.
Also applies to: 587-587
Line range hint
608-639
: LGTM! Efficient implementation with proper reference handling.The implementation correctly uses references with structured bindings to avoid unnecessary copies, as per the established best practices. The error handling is thorough, and move semantics are used effectively in the final event construction.
Description
This PR fully implements #556. It introduces methods to serialize/deserialize auto-generated key-value pairs in KV pair IR format.
Changes to
Serializer
:serialize_msgpack_map_using_dfs
to take lambda functions as serialization callbacks so that auto-gen and user-gen kv pairs can be encoded in different ways (different node ID encoding and different serialization ordering)Validation performed
Summary by CodeRabbit
Refactor
New Features
Tests