Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ffi: Add class for key-value pair log events. #507

Merged
merged 29 commits into from
Aug 15, 2024

Conversation

LinZhihao-723
Copy link
Member

@LinZhihao-723 LinZhihao-723 commented Aug 4, 2024

Description

This PR introduces a new class KeyValuePairLogEvent. This class represents a key-value pair log event serialized by the key-value pair IR format. It consists of the following members:

  • An unordered map of key-value pairs. Each key-value pair consists of an id (as the key) and a Value, where the id refers to the schema tree node id.
  • A shared pointer to the schema tree.
  • An UTC offset.

An instance must be created using the factory function. The factory function will validate whether the value's type matches the corresponding key. The input of the factory function should come from the deserialization methods, which will be used in the coming PRs.

Validation performed

  • Ensure the class can be successfully built.
  • Tested in development branches that this class is functional with key-value pair IR format deserializer.
  • Add unit tests to cover errors on creation.

@LinZhihao-723 LinZhihao-723 changed the title [WIP] ffi: Add class for key-value pair log events. ffi: Add class for key-value pair log events. Aug 8, 2024
components/core/src/clp/ffi/KeyValuePairLogEvent.hpp Outdated Show resolved Hide resolved
components/core/src/clp/ffi/KeyValuePairLogEvent.hpp Outdated Show resolved Hide resolved
components/core/src/clp/ffi/KeyValuePairLogEvent.hpp Outdated Show resolved Hide resolved
components/core/src/clp/ffi/KeyValuePairLogEvent.cpp Outdated Show resolved Hide resolved
components/core/src/clp/ffi/KeyValuePairLogEvent.cpp Outdated Show resolved Hide resolved
Comment on lines 66 to 70
if (false == value.has_value()) {
// Empty value
if (SchemaTreeNode::Type::Obj != type) {
return std::errc::protocol_error;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kinda feel like this can be moved into the value type validation method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer to leave it outside for two reasons:

  1. empty isn't a Value type
  2. This simplifies the helper function so it doesn't have to check has_value() and we don't need to symbols inside the helper (one for the std::optional<Value> and the over for the reference of the underlying Value)

components/core/src/clp/ffi/KeyValuePairLogEvent.cpp Outdated Show resolved Hide resolved
components/core/src/clp/ffi/KeyValuePairLogEvent.cpp Outdated Show resolved Hide resolved
components/core/src/clp/ffi/KeyValuePairLogEvent.hpp Outdated Show resolved Hide resolved
components/core/src/clp/ffi/KeyValuePairLogEvent.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@haiqi96 haiqi96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just quickly skimmed through the class and haven't read the unit tests yet.

* `node_id_value_pairs`. A node is considered a leaf if none of its descendants appear in the
* `node_id_value_pairs`.
*/
[[nodiscard]] auto is_leaf_node(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disclaimer: I only have a rough understanding on how schema tree works.
This function is bit confusing for me.
"A node is considered a leaf if none of its descendants appear in the node_id_value_pairs." <- This doesn't seem to be the normal definition for a leaf node. Is this expected?

In addition, "the sub schema tree defined by node_id_value_pairs." Is there any guarantee that nodes in the node_id_value_pairs will form a subtree? Just looking at the argument without thinking about how it is generated, it is possible node_id_value_pairs contains a list of unconnected nodes.

And lastly a highlevel comment is that I feel this function should belong to the Schema tree class.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a validation method: if an object type node appears in the key value pairs, it must be a leaf node. The way we store key value pairs is to store the schema (all as leaf nodes), so the path from the root to the leaves are implicitly indicated. In the merged schema tree, usually an object type node is a non leaf node. But in a key value pairs' schema, this node can be a leaf node as long as its value is {} or null. Our goal is to validate that if such a node appears, all its descendants in the merged schema tree can't appear in the schema of the given key value pair log event. it is possible node_id_value_pairs contains a list of unconnected nodes isn't true since everything will be connected to the root if they are valid leaf nodes. This function is used to validate pairs using schema tree's information. I think we should split it from a schema tree's method.

auto const& node{schema_tree.get_node(node_id)};
auto const node_type{node.get_type()};
if (false == value.has_value()) {
// Value is an empty object (`{}`, which is not the same as `null`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this comment tries to explain why the line doesn't use value.is_null(); in a way similar to line 68?
I don't think I follow the comment though. Can you clarify a bit?

And we can combine the two if statements?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. {} and null are two different types of values.
  2. I'd prefer to handle them separately. Combining these two statements will make it very hard to read

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't get when there will be {} and when there will be null (and why we only check for {} but not null in this function)
Is it because null is already handled in the validation? If so, I think you should leave this information in the comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{} is considered as a special case, where null is a valid primitive value (like an int or a string). These are user-input values.

m_utc_offset{utc_offset} {}

// Variables
std::shared_ptr<SchemaTree> m_schema_tree;
Copy link
Contributor

@haiqi96 haiqi96 Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Techinically it is possible to modify the original schema tree using this pointer in the class, right?
Should we consider using std::shared_ptr<SchemaTree const> ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense

components/core/src/clp/ffi/KeyValuePairLogEvent.hpp Outdated Show resolved Hide resolved
components/core/src/clp/ffi/KeyValuePairLogEvent.hpp Outdated Show resolved Hide resolved
components/core/src/clp/ffi/KeyValuePairLogEvent.cpp Outdated Show resolved Hide resolved
components/core/src/clp/ffi/KeyValuePairLogEvent.cpp Outdated Show resolved Hide resolved
KeyValuePairLogEvent::NodeIdValuePairs const& node_id_value_pairs
) -> std::optional<std::errc> {
std::optional<std::errc> ret_val;
std::unordered_map<SchemaTreeNode::id_t, std::unordered_set<std::string_view>>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this into the try to reduce its scope.

* @param node_id_value_pairs
* @return `std::nullopt` if the inputs are valid, or an error code indicating the failure:
* - std::errc::operation_not_permitted if a node ID doesn't represent a valid node in the
* schema tree, or a non-leaf node ID is paired with a value.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A non-leaf node paired with a value actually returns protocol_not_supported. That said, I think it should return operation_not_permitted since if it was permitted, node_id_value_pairs wouldn't be a tree.

components/core/tests/test-ffi_KeyValuePairLogEvent.cpp Outdated Show resolved Hide resolved
}
}

auto clone_value(Value const& value) -> Value {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I missed this in the previous PRs but it seems like we should just have copy constructors for Value and EncodedTextAst. It feels like an overoptimization to force people to never use copy construction unless they write a method manually like this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add a copy constructor for EncodedTextAst, however, I'd prefer to not have a copy constructor for Value to enforce move semantics. How about making this clone method as a member of Value, so that people can explicitly copy a value when they really need

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's bad about copying a Value? Is it not similar to copying a string?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copying string and EncodedTextAst can be expensive in deserialization. With copy disabled, we are more confident that there's no copy overhead in deserialization.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding move constructor for EncodedTextAst and Value

}

{
// Empty invalid_node_id_value_pairs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overzealous refactoring.

}

// NOLINTNEXTLINE(readability-function-cognitive-complexity)
TEST_CASE("ffi_KeyValuePairLogEvent_create", "[ffi]") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can reduce the amount of code for this test by using SECTION and even nested SECTION macros. Recall that CMake runs every SECTION from the beginning of the test, so the schema tree, node_id_value_pairs would be reset for each outer SECTION.

E.g., instead of cloning valid_node_id_value_pairs, you could just create it outside a SECTION and use it inside different SECTIONs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually thought about using SECTION but I didn't. Two reasons:

  1. I don't want to rerun everything from the beginning
  2. I want to test that the same schema tree is used across different log events (which have different life cycles)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. But by using sections, you'd only run the things outside the section(s) from the beginning, right?
  2. Testing the same instance of the schema tree across all of the tests seems a bit extra since it should be const in the class anyway (and if it wasn't const, you're still relying on finding a bug in the implementation with a set of test cases that isn't exhaustive). Imo, having less code in the test is more valuable than testing that corner case.

From my reading, you could do something like:

  1. Create schema tree
  2. Section (optional): Test empty id-value pairs
  3. Section (optional): Test mismatched types
  4. Section (optional): Test valid id-value pairs
    • Section: Test duplicate key conflicts
    • Section: Test implicit key conflicts
  5. Section (optional): Test out of bound node ID

That would allow you to at least get rid of the clone_node_id_value_pairs function (and make the code a bit easier to read).

LinZhihao-723 and others added 3 commits August 15, 2024 14:06
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
auto operator=(Value const&) -> Value& = delete;

// Default move constructor and assignment operator
// Default copy/move constructor and assignment operator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since everything except the constructor is default, we don't need to say they're default explicitly, right?

auto operator=(EncodedTextAst const&) -> EncodedTextAst& = delete;

// Default move constructor and assignment operator
// Default copy/move constructor and assignment operator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto about default.

components/core/tests/test-ffi_KeyValuePairLogEvent.cpp Outdated Show resolved Hide resolved
components/core/tests/test-ffi_KeyValuePairLogEvent.cpp Outdated Show resolved Hide resolved
REQUIRE_FALSE(result.has_error());

SECTION("Test duplicated key conflict on node #3") {
auto conflict_pairs{valid_node_id_value_pairs};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the idea of using nested sections (besides breaking the code up) is that you wouldn't need to duplicate valid_node_id_value_pairs since it would start with a fresh state in each section, no? (You'd obviously call it something other than valid_node_id_value_pairs, but still.

LinZhihao-723 and others added 2 commits August 15, 2024 16:39
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
kirkrodrigues
kirkrodrigues previously approved these changes Aug 15, 2024
@LinZhihao-723 LinZhihao-723 merged commit 1b67fdb into y-scope:main Aug 15, 2024
12 of 13 checks passed
jackluo923 pushed a commit to jackluo923/clp that referenced this pull request Dec 4, 2024
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants