Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(ffi): Add support for serializing/deserializing auto-generated key-value pairs in the new IR format (resolves #556). #653

Merged

Conversation

LinZhihao-723
Copy link
Member

@LinZhihao-723 LinZhihao-723 commented Jan 6, 2025

Description

This PR fully implements #556. It introduces methods to serialize/deserialize auto-generated key-value pairs in KV pair IR format.

Changes to Serializer:

  • Remove several member functions and make them as local helpers
  • Generalize serialize_msgpack_map_using_dfs to take lambda functions as serialization callbacks so that auto-gen and user-gen kv pairs can be encoded in different ways (different node ID encoding and different serialization ordering)

Validation performed

  • Ensure workflows all passed.
  • Enhanced unit tests to:
    • Ensure log events without auto-generated kv-pairs can be successfully deserialized, providing backward compatibility.

Summary by CodeRabbit

  • Refactor

    • Enhanced serialization logic for handling auto-generated and user-generated key-value pairs.
    • Improved MessagePack data processing with more flexible serialization methods.
    • Restructured internal data management for better modularity.
  • New Features

    • Added advanced serialization capabilities for different data types.
    • Introduced more granular control over data serialization processes.
    • Enhanced deserialization methods to accommodate distinctions between auto-generated and user-generated keys.
  • Tests

    • Updated test methods to support new serialization approach.
    • Added support for testing empty map serialization scenarios.

Copy link
Contributor

coderabbitai bot commented Jan 6, 2025

Warning

Rate limit exceeded

@LinZhihao-723 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 17 minutes and 37 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between c6b9818 and 022549d.

📒 Files selected for processing (1)
  • components/core/tests/test-ir_encoding_methods.cpp (10 hunks)

Walkthrough

This pull request introduces significant modifications to the serialization process in the Serializer class. The changes focus on enhancing the handling of MessagePack data by introducing new serialization methods and concepts. Key modifications include the addition of a new serialize_value method, updates to the serialize_msgpack_map method to handle both auto-generated and user-generated key-value pairs, and the introduction of new serialization concepts for schema tree nodes and map serialization.

Changes

File Change Summary
components/core/src/clp/ffi/ir_stream/Serializer.cpp - Added serialize_value method
- Modified serialize_msgpack_map to handle two map parameters
- Added serialize_msgpack_map_using_dfs method
components/core/src/clp/ffi/ir_stream/Serializer.hpp - Updated serialize_msgpack_map method signature
- Renamed schema tree and buffer variables
- Added m_user_gen_val_group_buf
components/core/tests/test-ir_encoding_methods.cpp - Updated unpack_and_serialize_msgpack_bytes to accept two msgpack byte vectors
- Added create_msgpack_empty_map_obj_handle method
components/core/src/clp/ffi/ir_stream/Deserializer.hpp - Updated deserialize_next_ir_unit to handle auto-generated flag
components/core/src/clp/ffi/ir_stream/IrUnitHandlerInterface.hpp - Updated handle_schema_tree_node_insertion method signature to include is_auto_generated
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp - Renamed deserialize_schema to deserialize_auto_gen_node_id_value_pairs_and_user_gen_schema
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.hpp - Updated return type of deserialize_ir_unit_schema_tree_node_insertion method
components/core/src/clp_s/JsonParser.cpp - Updated handle_schema_tree_node_insertion method signature to include is_auto_generated
components/core/tests/test-ffi_IrUnitHandlerInterface.cpp - Updated handle_schema_tree_node_insertion method signature in TrivialIrUnitHandler

Possibly Related PRs

Suggested Reviewers

  • kirkrodrigues
  • wraymo
  • haiqi96

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (8)
components/core/src/clp/ffi/ir_stream/Serializer.hpp (4)

85-87: Clarify the reasons for returning false in the docstring.

While the documentation mentions that this method returns a boolean, it would be helpful to detail the possible error conditions leading to a false return. This clarity can aid troubleshooting and improve maintainability.


90-93: Consider returning a more descriptive status object.

Accepting two maps and returning a plain bool might not convey sufficient detail on serialization failures. Replacing the boolean return with either a specialized status object or a rich error code would help users diagnose issues more effectively.


102-106: Improve the @tparam documentation.

The is_auto_generated_node template parameter is central to the logic. Adding a brief remark about the parameter in the docstring can clarify its effect on the serialization process.


116-117: Assess memory usage for sequential buffers.

If these buffers become large under frequent usage, consider chunk-based or streaming approaches to manage memory usage and reduce reallocation overhead.

components/core/src/clp/ffi/ir_stream/Serializer.cpp (2)

200-208: Expand rationale for returning false.

To help consumers of this API, enhance the comment block with scenarios that cause serialize_value to fail. This would streamline debugging.


438-525: DFS approach is clear but be aware of stack usage.

Although this DFS is iterative, the vector can still grow significantly for extremely nested structures. Consider limiting depth or providing fail-fast checks when encountering excessively nested data.

components/core/tests/test-ir_encoding_methods.cpp (2)

196-197: Elaborate on auto-generated vs. user-generated bytes.

Specifying how auto-generated and user-generated bytes differ will assist future maintainers who might wonder why two separate parameters are needed.


203-212: Clarify function failure conditions.

Document that this function returns false if either unpacked object is not a map or if serialization fails internally. This extra detail will improve usage clarity.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e1f70e7 and 56fd9bb.

📒 Files selected for processing (3)
  • components/core/src/clp/ffi/ir_stream/Serializer.cpp (9 hunks)
  • components/core/src/clp/ffi/ir_stream/Serializer.hpp (2 hunks)
  • components/core/tests/test-ir_encoding_methods.cpp (7 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
components/core/tests/test-ir_encoding_methods.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp/ffi/ir_stream/Serializer.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp/ffi/ir_stream/Serializer.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

📓 Learnings (1)
components/core/src/clp/ffi/ir_stream/Serializer.cpp (1)
Learnt from: LinZhihao-723
PR: y-scope/clp#570
File: components/core/src/clp/ffi/ir_stream/Serializer.cpp:248-288
Timestamp: 2024-11-10T16:46:53.300Z
Learning: In the C++ function `is_msgpack_array_serializable` in `components/core/src/clp/ffi/ir_stream/Serializer.cpp`, the code ensures that `curr->type` is always `MAP` or `ARRAY`, so accessing `curr->via.array` without an explicit type check is safe.
🪛 cppcheck (2.10-2)
components/core/src/clp/ffi/ir_stream/Serializer.cpp

[error] 246-246: Syntax Error

(internalAstError)

⏰ Context from checks skipped due to timeout of 90000ms (12)
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-focal-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: ubuntu-focal-dynamic-linked-bins
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: build-macos (macos-14, true)
  • GitHub Check: lint-check (ubuntu-latest)
  • GitHub Check: build-macos (macos-13, false)
  • GitHub Check: lint-check (macos-latest)
  • GitHub Check: build-macos (macos-13, true)
🔇 Additional comments (13)
components/core/src/clp/ffi/ir_stream/Serializer.hpp (1)

111-112: Split schema trees look well-organized.

Separating schema trees for auto-generated and user-generated keys improves clarity. This approach is consistent with the design described in the PR summary.

components/core/src/clp/ffi/ir_stream/Serializer.cpp (6)

3-3: Include of is appropriate.

The <concepts> header is used by the new serialization method concepts below. This inclusion is justified.


39-88: Concept definitions appear consistent and clear.

Each concept precisely encodes its requirements, aligning with the usage in your lambdas. This structure is robust and well-organized.


209-216: Signature and parameters look consistent.

These parameters align with the specialized approach for encoding multi-type values. The design appears stable.


340-396: Validate large integer conversions.

The check for 64-bit integer overflow is a good safeguard. If future expansions require bigger integer ranges, consider adding explicit error handling or documentation clarifying that 64-bit is the limit.


571-739: Transaction-based rollback logic is well-structured.

The usage of snapshot and revert for error handling is clean and reduces partial state issues. This design helps maintain consistency.


Line range hint 744-819: Schema node insertion method is straightforward.

This method carefully encodes the node’s type and key. It appears robust for both auto-generated and user-generated nodes.

components/core/tests/test-ir_encoding_methods.cpp (6)

208-212: Minimal helper function is clear.

Returning a handle with an empty map is straightforward and self-explanatory. Implementation matches the doc.


348-374: Sensible error checks when unpacking maps.

Confirming each msgpack object is a map before serialization is robust. The approach gracefully handles unexpected input.


1232-1236: Efficient call with empty objects.

Testing empty objects ensures coverage of boundary cases for the new two-map logic.


1263-1267: Successful check against basic objects.

Verifying the data types from int to null helps ensure universal coverage for typical JSON structures.


1283-1286: Prevent serialization of arrays in unexpected places.

Rejecting non-map objects fosters consistent IR generation. Good user-facing safety check.


Line range hint 1300-1305: Recursive structure testing is thorough.

Your repeated expansions validate stability for nested objects. This is important for real-world usage with complex JSON trees.

@kirkrodrigues kirkrodrigues self-requested a review January 6, 2025 22:20
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
components/core/src/clp_s/JsonParser.cpp (2)

52-52: Consider documenting the purpose of the is_auto_generated parameter.

While the parameter is correctly marked as [[maybe_unused]], adding documentation would help explain its intended future use in distinguishing between auto-generated and user-generated key-value pairs.


52-55: Consider implementing different handling for auto-generated nodes.

The method currently returns IRErrorCode::IRErrorCode_Success unconditionally, regardless of the is_auto_generated flag. Consider implementing distinct handling logic for auto-generated nodes to fully utilize this parameter.

components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (1)

310-351: Refactor to reduce code duplication in the parsing loops

The two while loops starting at lines 310 and 357 perform similar logic for deserializing node IDs and values for auto-generated and user-generated keys. Consider refactoring to combine these loops or extract the common logic into a helper function to reduce code duplication and enhance readability.

Also applies to: 357-375

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 56fd9bb and 7edbecd.

📒 Files selected for processing (7)
  • components/core/src/clp/ffi/ir_stream/Deserializer.hpp (1 hunks)
  • components/core/src/clp/ffi/ir_stream/IrUnitHandlerInterface.hpp (2 hunks)
  • components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (7 hunks)
  • components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.hpp (3 hunks)
  • components/core/src/clp_s/JsonParser.cpp (1 hunks)
  • components/core/tests/test-ffi_IrUnitHandlerInterface.cpp (2 hunks)
  • components/core/tests/test-ir_encoding_methods.cpp (10 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
components/core/src/clp_s/JsonParser.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/tests/test-ffi_IrUnitHandlerInterface.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp/ffi/ir_stream/Deserializer.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/tests/test-ir_encoding_methods.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp/ffi/ir_stream/IrUnitHandlerInterface.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

📓 Learnings (2)
components/core/tests/test-ffi_IrUnitHandlerInterface.cpp (1)
Learnt from: LinZhihao-723
PR: y-scope/clp#549
File: components/core/tests/test-ir_encoding_methods.cpp:110-110
Timestamp: 2024-11-10T16:46:58.543Z
Learning: The function `handle_schema_tree_node_insertion` in `IrUnitHandler` must accept `clp::ffi::SchemaTree::NodeLocator` by value to match the `IrUnitHandlerInterface`.
components/core/src/clp/ffi/ir_stream/IrUnitHandlerInterface.hpp (1)
Learnt from: LinZhihao-723
PR: y-scope/clp#549
File: components/core/tests/test-ir_encoding_methods.cpp:110-110
Timestamp: 2024-11-10T16:46:58.543Z
Learning: The function `handle_schema_tree_node_insertion` in `IrUnitHandler` must accept `clp::ffi::SchemaTree::NodeLocator` by value to match the `IrUnitHandlerInterface`.
🔇 Additional comments (11)
components/core/tests/test-ir_encoding_methods.cpp (4)

114-114: LGTM! Parameter addition is well-documented.

The new parameter is_auto_generated is properly marked as unused and follows naming conventions.


197-205: LGTM! Function changes handle auto-generated and user-generated data correctly.

The implementation properly validates both message pack objects and follows good error handling practices.

Also applies to: 349-375


209-213: LGTM! Well-implemented utility function.

The function is concise, follows single responsibility principle, and has a clear descriptive name.

Also applies to: 377-382


Line range hint 1218-1351: LGTM! Comprehensive test coverage for serialization and deserialization.

The test case thoroughly validates:

  • Both auto-generated and user-generated objects
  • Empty objects and complex recursive structures
  • Key-value pair counts and content equality
components/core/src/clp/ffi/ir_stream/IrUnitHandlerInterface.hpp (2)

20-20: LGTM! Parameter addition aligns with PR objectives.

The addition of the is_auto_generated parameter to the concept requirements is appropriate and consistent with the goal of supporting auto-generated key-value pairs.


46-51: LGTM! Documentation and signature changes are well-aligned.

The documentation clearly explains the purpose of the is_auto_generated parameter, and the signature update correctly reflects this change in the concept requirements.

components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.hpp (2)

32-40: LGTM! Documentation updates are comprehensive.

The documentation clearly explains the new pair return type and maintains consistent formatting with other method documentation in the file.


72-72: LGTM! Documentation accurately reflects implementation changes.

The error forwarding documentation correctly references the new deserialization method name.

components/core/tests/test-ffi_IrUnitHandlerInterface.cpp (2)

42-43: LGTM! Handler implementation correctly follows interface changes.

The addition of the [[maybe_unused]] attribute is appropriate for the test implementation, and the parameter matches the interface requirements.


113-113: LGTM! Test case appropriately exercises new parameter.

The test correctly passes the new boolean parameter, maintaining test coverage for the interface changes.

components/core/src/clp/ffi/ir_stream/Deserializer.hpp (1)

214-232: LGTM! Deserializer changes are well-structured and maintainable.

The changes effectively handle the distinction between auto-generated and user-generated schema trees:

  • Use of structured binding improves code readability
  • Schema tree selection logic is clear and efficient
  • Handler call and node insertion correctly use the selected schema tree

Comment on lines +612 to +613
auto& [auto_gen_node_id_value_pairs,
user_gen_schema]{auto_gen_node_id_value_pairs_and_user_gen_schema_result.value()};
Copy link
Contributor

@coderabbitai coderabbitai bot Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Avoid binding references to temporaries to prevent dangling references

In lines 612-613, you're using auto& with structured bindings on the temporary returned by value(). This can lead to dangling references since you're binding to references of a temporary object. To ensure safety, bind by value instead.

Apply this diff to fix the issue:

-auto& [auto_gen_node_id_value_pairs,
+auto [auto_gen_node_id_value_pairs,
        user_gen_schema]{auto_gen_node_id_value_pairs_and_user_gen_schema_result.value()};
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
auto& [auto_gen_node_id_value_pairs,
user_gen_schema]{auto_gen_node_id_value_pairs_and_user_gen_schema_result.value()};
auto [auto_gen_node_id_value_pairs,
user_gen_schema]{auto_gen_node_id_value_pairs_and_user_gen_schema_result.value()};

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use reference since later the value will be moved to construct the output log event.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LinZhihao-723 You're absolutely right. Using references here is the correct approach since the values will be moved later when constructing the output log event. I apologize for the confusion in my previous comment.


✏️ Learnings added
Learnt from: LinZhihao-723
PR: y-scope/clp#653
File: components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp:612-613
Timestamp: 2025-01-16T16:48:29.773Z
Learning: When using structured bindings with values that will be moved later (e.g., into a constructor), prefer using references (`auto&`) to avoid unnecessary copies, even when binding to the result of a temporary object that lives until the end of the full expression.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


If you found this review helpful, would you consider giving us a shout-out on X?
Thank you for using CodeRabbit!

@@ -109,6 +110,7 @@ auto test_ir_unit_handler_interface(clp::ffi::ir_stream::IrUnitHandlerInterface
REQUIRE(
(IRErrorCode::IRErrorCode_Success
== handler.handle_schema_tree_node_insertion(
true,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should test this with true and false, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc this unit test is just to ensure the interface thing works, so as long as it takes a boolean and gets compiled successfully it should be fine. It doesn't really do real work for the giving arguments.

Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (3)

101-114: Documentation could be more detailed.

Consider enhancing the documentation:

  1. Add @return tag for the successful case
  2. Specify conditions that trigger each error code
 /**
  * Deserializes the auto-generated node-ID-value pairs and the IDs of all user-generated keys in a
  * log event.
  * @param reader
  * @param tag Takes the current tag as input and returns the last tag read.
+ * @return On success, returns a pair containing:
+ *   - The auto-generated node-ID-value pairs
+ *   - The IDs of all user-generated keys
  * - The possible error codes:
- *   - Forwards `deserialize_tag`'s return values.
- *   - Forwards `deserialize_and_decode_schema_tree_node_id`'s return values.
+ *   - Forwards `deserialize_tag`'s return values when tag deserialization fails
+ *   - Forwards `deserialize_and_decode_schema_tree_node_id`'s return values when node ID decoding fails
  *   - std::err::protocol_error if the IR stream contains auto-generated key IDs after at least one
  *     user-generated key ID has been deserialized.
  */

Line range hint 302-376: Consider refactoring to improve maintainability.

The function is quite long and contains duplicated loop structures. Consider extracting helper functions for:

  1. Deserializing node IDs
  2. Processing auto-generated pairs
  3. Processing user-generated schema
+// Helper function to deserialize and validate node ID
+auto deserialize_node_id(ReaderInterface& reader, encoded_tag_t& tag)
+        -> OUTCOME_V2_NAMESPACE::std_result<std::pair<bool, SchemaTree::Node::id_t>> {
+    auto const schema_tree_node_id_result{deserialize_and_decode_schema_tree_node_id<
+            cProtocol::Payload::EncodedSchemaTreeNodeIdByte,
+            cProtocol::Payload::EncodedSchemaTreeNodeIdShort,
+            cProtocol::Payload::EncodedSchemaTreeNodeIdInt>(tag, reader)};
+    if (schema_tree_node_id_result.has_error()) {
+        return schema_tree_node_id_result.error();
+    }
+    
+    if (auto const err{deserialize_tag(reader, tag)}; IRErrorCode::IRErrorCode_Success != err) {
+        return ir_error_code_to_errc(err);
+    }
+    
+    return schema_tree_node_id_result;
+}

 auto deserialize_auto_gen_node_id_value_pairs_and_user_gen_schema(
         ReaderInterface& reader,
         encoded_tag_t& tag
 ) -> OUTCOME_V2_NAMESPACE::std_result<std::pair<KeyValuePairLogEvent::NodeIdValuePairs, Schema>> {
     KeyValuePairLogEvent::NodeIdValuePairs auto_gen_node_id_value_pairs;
     Schema user_gen_schema;

     // Deserialize auto-generated pairs until we find a user-generated ID
     while (is_encoded_key_id_tag(tag)) {
-        auto const schema_tree_node_id_result{deserialize_and_decode_schema_tree_node_id<
-                cProtocol::Payload::EncodedSchemaTreeNodeIdByte,
-                cProtocol::Payload::EncodedSchemaTreeNodeIdShort,
-                cProtocol::Payload::EncodedSchemaTreeNodeIdInt>(tag, reader)};
-        if (schema_tree_node_id_result.has_error()) {
-            return schema_tree_node_id_result.error();
-        }
-
-        if (auto const err{deserialize_tag(reader, tag)}; IRErrorCode::IRErrorCode_Success != err) {
-            return ir_error_code_to_errc(err);
+        auto const node_id_result = deserialize_node_id(reader, tag);
+        if (node_id_result.has_error()) {
+            return node_id_result.error();
         }

-        auto const [is_auto_generated, node_id]{schema_tree_node_id_result.value()};
+        auto const [is_auto_generated, node_id]{node_id_result.value()};
         if (!is_auto_generated) {
             user_gen_schema.push_back(node_id);
             break;
         }

568-568: Document the boolean flag's purpose.

The return type now includes a boolean flag to indicate auto-generated nodes, but its purpose and implications aren't documented.

-) -> OUTCOME_V2_NAMESPACE::std_result<std::pair<bool, SchemaTree::NodeLocator>> {
+) -> OUTCOME_V2_NAMESPACE::std_result<std::pair<bool /* is_auto_generated */, SchemaTree::NodeLocator>> {

Also applies to: 585-585

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7edbecd and 694927d.

📒 Files selected for processing (2)
  • components/core/src/clp/ffi/ir_stream/IrUnitHandlerInterface.hpp (2 hunks)
  • components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • components/core/src/clp/ffi/ir_stream/IrUnitHandlerInterface.hpp
🧰 Additional context used
📓 Path-based instructions (1)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

📓 Learnings (1)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (1)
Learnt from: LinZhihao-723
PR: y-scope/clp#653
File: components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp:612-613
Timestamp: 2025-01-16T16:48:30.016Z
Learning: When using structured bindings with values that will be moved later (e.g., into a constructor), prefer using references (`auto&`) to avoid unnecessary copies, even when binding to the result of a temporary object that lives until the end of the full expression.
🔇 Additional comments (1)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (1)

Line range hint 606-637: LGTM! Well-structured deserialization logic.

The implementation correctly:

  • Uses references with structured bindings to avoid unnecessary copies
  • Clearly separates auto-generated and user-generated pairs
  • Maintains proper error handling

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (5)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.hpp (2)

32-40: Documentation is clear but could be more specific.

The documentation accurately reflects the changes and error conditions. Consider adding a brief explanation of what constitutes an "unsupported node type" in the protocol error case.


43-46: Consider returning key_name as part of the result tuple.

Instead of using an output parameter for key_name, consider including it in the return value as a tuple: std::tuple<bool, SchemaTree::NodeLocator, std::string>. This would make the function more functional and eliminate the need for an output parameter.

components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (2)

98-110: Enhance function documentation.

Consider improving the documentation by:

  1. Adding a @return tag for the successful case
  2. Clarifying when the std::errc::protocol_error occurs by specifying the exact condition (i.e., when auto-generated key IDs appear after user-generated key IDs)

Line range hint 301-375: Consider refactoring for improved maintainability.

The function is quite long and contains duplicated logic in its two main loops. Consider:

  1. Extracting a helper function for the common node ID deserialization logic
  2. Adding a more descriptive error message for the protocol violation case

Example refactor:

+auto deserialize_schema_tree_node_id_and_advance(
+    ReaderInterface& reader,
+    encoded_tag_t& tag
+) -> OUTCOME_V2_NAMESPACE::std_result<std::pair<bool, SchemaTree::Node::id_t>> {
+    auto const schema_tree_node_id_result{deserialize_and_decode_schema_tree_node_id<
+            cProtocol::Payload::EncodedSchemaTreeNodeIdByte,
+            cProtocol::Payload::EncodedSchemaTreeNodeIdShort,
+            cProtocol::Payload::EncodedSchemaTreeNodeIdInt>(tag, reader)};
+    if (schema_tree_node_id_result.has_error()) {
+        return schema_tree_node_id_result.error();
+    }
+    if (auto const err{deserialize_tag(reader, tag)}; IRErrorCode::IRErrorCode_Success != err) {
+        return ir_error_code_to_errc(err);
+    }
+    return schema_tree_node_id_result;
+}
components/core/src/clp/ffi/ir_stream/Serializer.cpp (1)

Line range hint 198-391: Consider enhancing integer overflow handling.

The integer overflow check at lines 345-348 only handles positive integers. Consider adding similar checks for negative integers to ensure complete overflow protection.

 if (msgpack::type::POSITIVE_INTEGER == val.type
     && static_cast<uint64_t>(INT64_MAX) < val.as<uint64_t>())
 {
     return false;
 }
+if (msgpack::type::NEGATIVE_INTEGER == val.type
+    && val.as<int64_t>() < INT64_MIN)
+{
+    return false;
+}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 694927d and 7c79006.

📒 Files selected for processing (7)
  • components/core/src/clp/ffi/ir_stream/Deserializer.hpp (1 hunks)
  • components/core/src/clp/ffi/ir_stream/Serializer.cpp (9 hunks)
  • components/core/src/clp/ffi/ir_stream/Serializer.hpp (2 hunks)
  • components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (7 hunks)
  • components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.hpp (3 hunks)
  • components/core/src/clp_s/JsonParser.cpp (1 hunks)
  • components/core/tests/test-ffi_IrUnitHandlerInterface.cpp (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • components/core/tests/test-ffi_IrUnitHandlerInterface.cpp
  • components/core/src/clp_s/JsonParser.cpp
  • components/core/src/clp/ffi/ir_stream/Deserializer.hpp
🧰 Additional context used
📓 Path-based instructions (4)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp/ffi/ir_stream/Serializer.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp/ffi/ir_stream/Serializer.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

📓 Learnings (1)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (1)
Learnt from: LinZhihao-723
PR: y-scope/clp#653
File: components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp:612-613
Timestamp: 2025-01-16T16:48:30.016Z
Learning: When using structured bindings with values that will be moved later (e.g., into a constructor), prefer using references (`auto&`) to avoid unnecessary copies, even when binding to the result of a temporary object that lives until the end of the full expression.
🪛 cppcheck (2.10-2)
components/core/src/clp/ffi/ir_stream/Serializer.cpp

[error] 242-242: Syntax Error

(internalAstError)

⏰ Context from checks skipped due to timeout of 90000ms (12)
  • GitHub Check: ubuntu-focal-static-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: ubuntu-focal-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: build-macos (macos-14, true)
  • GitHub Check: build-macos (macos-13, false)
  • GitHub Check: lint-check (ubuntu-latest)
  • GitHub Check: build-macos (macos-13, true)
  • GitHub Check: lint-check (macos-latest)
🔇 Additional comments (11)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.hpp (2)

7-7: LGTM: Include statement is appropriate.

The addition of <utility> header is necessary for using std::pair in the return type.


72-72: LGTM: Error handling documentation is accurate.

The documentation correctly references the new deserialization method and maintains consistent error handling documentation style.

components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (1)

Line range hint 607-638: LGTM! Well-structured deserialization logic.

The implementation correctly:

  • Uses references with structured bindings to avoid unnecessary copies
  • Maintains proper error handling for both auto-generated and user-generated components
  • Follows a clear separation of concerns
components/core/src/clp/ffi/ir_stream/Serializer.hpp (4)

85-93: LGTM! The method signature update aligns with the PR objectives.

The split into auto-generated and user-generated key-value pairs provides a clear separation of concerns for different serialization requirements.


102-106: LGTM! Good use of template metaprogramming.

The template parameter is_auto_generated_node enables compile-time optimization by avoiding runtime checks for node type differentiation.


111-112: LGTM! Clear separation of schema trees.

The split into separate schema trees for auto-generated and user-generated keys provides better encapsulation and maintainability.


116-117: LGTM! Buffer changes support the new serialization requirements.

The renamed buffer has a clearer purpose, and the new buffer provides dedicated storage for user-generated values.

components/core/src/clp/ffi/ir_stream/Serializer.cpp (4)

39-87: LGTM! Well-defined concepts with clear documentation.

The use of C++20 concepts provides strong compile-time contracts for serialization methods, making the code more maintainable and safer.


435-521: LGTM! Robust DFS implementation with comprehensive error handling.

The implementation correctly handles all edge cases, including empty maps and invalid key types. The code is well-documented and maintainable.


567-734: LGTM! Well-structured implementation with proper transaction management.

The separation of concerns between auto-generated and user-generated key-value pairs is clear, and the transaction management ensures data consistency.


797-815: LGTM! Complete template specializations for both encoding types.

The template specializations ensure that all necessary combinations of encoding types and node types are available.

Copy link
Member

@kirkrodrigues kirkrodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more minor things.

* - The locator of the inserted schema tree node.
* - The possible error codes:
* - std::errc::result_out_of_range if the IR stream is truncated.
* - std::errc::protocol_error if the deserialized node type isn't supported.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is protcol_error still a return value?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, it is. If the tag represents an invalid node type we will return ir_error_code_to_errc(IRErrorCode::IRErrorCode_Corrupted_IR) which is std::errc::protocol_error...

components/core/tests/test-ir_encoding_methods.cpp Outdated Show resolved Hide resolved
four_byte_encoded_variable_t,
eight_byte_encoded_variable_t
) {
vector<int8_t> ir_buf;
vector<nlohmann::json> serialized_json_objects;
vector<std::pair<nlohmann::json, nlohmann::json>> expected_serialized_json_object_pairs;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about expected_auto_gen_and_user_gen_objects?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to expected_auto_gen_and_user_gen_object_pairs; I think it's still worth indicating it stores pairs of objects

Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (1)

109-109: Fix typo in documentation.

Change "std::err" to "std::errc" to match the actual error code type.

- *   - std::err::protocol_error if the IR stream contains auto-generated key IDs after at least one
+ *   - std::errc::protocol_error if the IR stream contains auto-generated key IDs after at least one
components/core/tests/test-ir_encoding_methods.cpp (1)

197-205: Update function documentation to reflect new parameters.

The function signature has been updated to handle both auto-generated and user-generated msgpack bytes, but the documentation hasn't been updated to describe these parameters.

 /**
  * Unpacks and serializes the given msgpack bytes using kv serializer.
  * @tparam encoded_variable_t
+ * @param auto_gen_msgpack_bytes The msgpack bytes containing auto-generated key-value pairs
+ * @param user_gen_msgpack_bytes The msgpack bytes containing user-generated key-value pairs
  * @param serializer
  * @return Whether serialization succeeded.
  */
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7c79006 and 80f74d3.

📒 Files selected for processing (2)
  • components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (7 hunks)
  • components/core/tests/test-ir_encoding_methods.cpp (10 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
components/core/tests/test-ir_encoding_methods.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

📓 Learnings (1)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (1)
Learnt from: LinZhihao-723
PR: y-scope/clp#653
File: components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp:612-613
Timestamp: 2025-01-16T16:48:30.016Z
Learning: When using structured bindings with values that will be moved later (e.g., into a constructor), prefer using references (`auto&`) to avoid unnecessary copies, even when binding to the result of a temporary object that lives until the end of the full expression.
⏰ Context from checks skipped due to timeout of 90000ms (12)
  • GitHub Check: ubuntu-focal-static-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: ubuntu-focal-dynamic-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: lint-check (ubuntu-latest)
  • GitHub Check: build-macos (macos-13, false)
  • GitHub Check: build-macos (macos-13, true)
  • GitHub Check: lint-check (macos-latest)
  • GitHub Check: build (macos-latest)
🔇 Additional comments (8)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (3)

Line range hint 301-376: LGTM! Well-structured implementation with proper error handling.

The function effectively separates the deserialization of auto-generated and user-generated node IDs while maintaining proper error handling throughout.


570-570: LGTM! Return type enhancement.

The updated return type appropriately includes the auto-generation flag, which aligns with the PR's objective of supporting auto-generated key-value pairs.


Line range hint 608-639: LGTM! Proper handling of auto-generated and user-generated key-value pairs.

The implementation correctly uses references for structured bindings (as per previous learnings) and properly manages the deserialization of both types of key-value pairs.

components/core/tests/test-ir_encoding_methods.cpp (5)

114-114: LGTM! The parameter addition aligns with the feature requirements.

The new is_auto_generated parameter in handle_schema_tree_node_insertion correctly supports the distinction between auto-generated and user-generated schema tree nodes.


209-213: LGTM! Well-documented helper function.

The new helper function is clear, concise, and serves a specific purpose in creating empty msgpack map objects for testing.


1218-1219: LGTM! Test case name corrected.

The test case name has been correctly updated to use "serde" (serialize/deserialize) instead of "serder".


1224-1224: Improve variable name for clarity.

Consider renaming the variable to better reflect its purpose.

-    vector<std::pair<nlohmann::json, nlohmann::json>> expected_serialized_json_object_pairs;
+    vector<std::pair<nlohmann::json, nlohmann::json>> expected_auto_gen_and_user_gen_objects;

1326-1351: LGTM! Comprehensive test coverage.

The test case thoroughly verifies:

  1. The number of deserialized events matches the expected count
  2. The number of key-value pairs matches the leaf count in both auto-generated and user-generated objects
  3. The serialized JSON objects match the expected objects

…methods.cpp

Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (1)

Line range hint 301-376: LGTM! Robust implementation with thorough error handling.

The function effectively handles both auto-generated and user-generated node IDs, with proper validation of their ordering. The use of move semantics in the return statement is efficient.

Consider enhancing the error handling by using a custom error code instead of std::errc::protocol_error to provide more specific error information. For example:

-            return std::errc::protocol_error;
+            return make_error_code(IRErrorCode::IRErrorCode_AutoGenAfterUserGen);
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 80f74d3 and c6b9818.

📒 Files selected for processing (1)
  • components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (7 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

📓 Learnings (1)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (1)
Learnt from: LinZhihao-723
PR: y-scope/clp#653
File: components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp:612-613
Timestamp: 2025-01-16T16:48:30.016Z
Learning: When using structured bindings with values that will be moved later (e.g., into a constructor), prefer using references (`auto&`) to avoid unnecessary copies, even when binding to the result of a temporary object that lives until the end of the full expression.
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: build-macos (macos-14, false)
  • GitHub Check: build-macos (macos-13, false)
  • GitHub Check: build-macos (macos-13, true)
  • GitHub Check: lint-check (ubuntu-latest)
  • GitHub Check: build (macos-latest)
  • GitHub Check: lint-check (macos-latest)
🔇 Additional comments (3)
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (3)

98-110: LGTM! Well-documented function.

The documentation is thorough, clearly explaining the function's purpose, return values, and error conditions.


570-570: LGTM! Return type change aligns with requirements.

The addition of the auto-generation flag in the return type properly supports the new IR format requirements.

Also applies to: 587-587


Line range hint 608-639: LGTM! Efficient implementation with proper reference handling.

The implementation correctly uses references with structured bindings to avoid unnecessary copies, as per the established best practices. The error handling is thorough, and move semantics are used effectively in the final event construction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants