Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ffi: Add class for key-value pair log events. #507

Merged
merged 29 commits into from
Aug 15, 2024
Merged
Show file tree
Hide file tree
Changes from 28 commits
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
0da3f44
Implement CLP string
LinZhihao-723 Jul 26, 2024
141f954
Remove unused line
LinZhihao-723 Jul 26, 2024
f3525aa
Add value template file
LinZhihao-723 Jul 27, 2024
bbd3a56
Implement more helper templates...
LinZhihao-723 Jul 27, 2024
b1fa61c
Implement the super type
LinZhihao-723 Jul 28, 2024
e148af7
Merge from main
LinZhihao-723 Jul 30, 2024
f41e5d6
Add unit tests
LinZhihao-723 Jul 31, 2024
d29a17a
Fix cmake
LinZhihao-723 Jul 31, 2024
12679f7
... class
LinZhihao-723 Aug 3, 2024
534abf0
Implement kv pair class
LinZhihao-723 Aug 3, 2024
b7bed71
linting...
LinZhihao-723 Aug 3, 2024
88aa36a
Update KeyValuePair implementation
LinZhihao-723 Aug 7, 2024
92c3388
Merge
LinZhihao-723 Aug 8, 2024
698b406
Add missing comments
LinZhihao-723 Aug 8, 2024
bc565f5
Update doc string
LinZhihao-723 Aug 8, 2024
2393038
Update comments
LinZhihao-723 Aug 9, 2024
285aab3
Apply suggestions from code review
LinZhihao-723 Aug 14, 2024
7623d76
Apply code review comments and add unit tests
LinZhihao-723 Aug 14, 2024
ef0dd80
Reorder a little bit...
LinZhihao-723 Aug 14, 2024
54ec36a
Fix macos build
LinZhihao-723 Aug 14, 2024
941fcbc
Use const shared ptr and rely on type conversion
LinZhihao-723 Aug 14, 2024
6623b5a
Apply suggestions from code review
LinZhihao-723 Aug 15, 2024
3ae389a
Apply review comments to the src file
LinZhihao-723 Aug 15, 2024
a066c67
Apply comments on the unit tests
LinZhihao-723 Aug 15, 2024
353dccd
Apply suggestions from code review
LinZhihao-723 Aug 15, 2024
09dd01a
Apply code review comments
LinZhihao-723 Aug 15, 2024
83ae4e0
Remove default declaration since they are not implicitly declared alr…
LinZhihao-723 Aug 15, 2024
7cd1ea7
Rename
LinZhihao-723 Aug 15, 2024
6dd95a6
Fix the error
LinZhihao-723 Aug 15, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion components/core/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -320,6 +320,8 @@ set(SOURCE_FILES_unitTest
src/clp/ffi/ir_stream/Serializer.hpp
src/clp/ffi/ir_stream/utils.cpp
src/clp/ffi/ir_stream/utils.hpp
src/clp/ffi/KeyValuePairLogEvent.cpp
src/clp/ffi/KeyValuePairLogEvent.hpp
src/clp/ffi/SchemaTree.cpp
src/clp/ffi/SchemaTree.hpp
src/clp/ffi/SchemaTreeNode.hpp
Expand Down Expand Up @@ -475,8 +477,8 @@ set(SOURCE_FILES_unitTest
tests/test-BufferedFileReader.cpp
tests/test-EncodedVariableInterpreter.cpp
tests/test-encoding_methods.cpp
tests/test-ffi_KeyValuePairLogEvent.cpp
tests/test-ffi_SchemaTree.cpp
tests/test-ffi_Value.cpp
tests/test-Grep.cpp
tests/test-hash_utils.cpp
tests/test-ir_encoding_methods.cpp
Expand Down
167 changes: 167 additions & 0 deletions components/core/src/clp/ffi/KeyValuePairLogEvent.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,167 @@
#include "KeyValuePairLogEvent.hpp"

#include <memory>
#include <string>
#include <string_view>
#include <system_error>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

#include <outcome/single-header/outcome.hpp>

#include "../ir/EncodedTextAst.hpp"
#include "../time_types.hpp"
#include "SchemaTree.hpp"
#include "SchemaTreeNode.hpp"
#include "Value.hpp"

using clp::ir::EightByteEncodedTextAst;
using clp::ir::FourByteEncodedTextAst;
using std::string;

namespace clp::ffi {
namespace {
/**
* @param type
* @param value
* @return Whether the given schema tree node type matches the given value's type.
*/
[[nodiscard]] auto
node_type_matches_value_type(SchemaTreeNode::Type type, Value const& value) -> bool;

/**
* Validates whether the given node-ID value pairs are leaf nodes in the `SchemaTree` forming a
* sub-tree of their own.
* @param schema_tree
* @param node_id_value_pairs
* @return success if the inputs are valid, or an error code indicating the failure:
* - std::errc::operation_not_permitted if a node ID doesn't represent a valid node in the
* schema tree, or a non-leaf node ID is paired with a value.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A non-leaf node paired with a value actually returns protocol_not_supported. That said, I think it should return operation_not_permitted since if it was permitted, node_id_value_pairs wouldn't be a tree.

* - std::errc::protocol_error if the schema tree node type doesn't match the value's type.
* - std::errc::protocol_not_supported if the same key appears more than once under a parent
* node.
*/
[[nodiscard]] auto validate_node_id_value_pairs(
SchemaTree const& schema_tree,
KeyValuePairLogEvent::NodeIdValuePairs const& node_id_value_pairs
) -> std::errc;

/**
* @param schema_tree
* @param node_id
* @param node_id_value_pairs
* @return Whether the given node is a leaf node in the sub-tree of the `SchemaTree` defined by
* `node_id_value_pairs`. A node is considered a leaf if none of its descendants appear in
* `node_id_value_pairs`.
*/
[[nodiscard]] auto is_leaf_node(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disclaimer: I only have a rough understanding on how schema tree works.
This function is bit confusing for me.
"A node is considered a leaf if none of its descendants appear in the node_id_value_pairs." <- This doesn't seem to be the normal definition for a leaf node. Is this expected?

In addition, "the sub schema tree defined by node_id_value_pairs." Is there any guarantee that nodes in the node_id_value_pairs will form a subtree? Just looking at the argument without thinking about how it is generated, it is possible node_id_value_pairs contains a list of unconnected nodes.

And lastly a highlevel comment is that I feel this function should belong to the Schema tree class.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a validation method: if an object type node appears in the key value pairs, it must be a leaf node. The way we store key value pairs is to store the schema (all as leaf nodes), so the path from the root to the leaves are implicitly indicated. In the merged schema tree, usually an object type node is a non leaf node. But in a key value pairs' schema, this node can be a leaf node as long as its value is {} or null. Our goal is to validate that if such a node appears, all its descendants in the merged schema tree can't appear in the schema of the given key value pair log event. it is possible node_id_value_pairs contains a list of unconnected nodes isn't true since everything will be connected to the root if they are valid leaf nodes. This function is used to validate pairs using schema tree's information. I think we should split it from a schema tree's method.

SchemaTree const& schema_tree,
SchemaTreeNode::id_t node_id,
KeyValuePairLogEvent::NodeIdValuePairs const& node_id_value_pairs
) -> bool;

auto node_type_matches_value_type(SchemaTreeNode::Type type, Value const& value) -> bool {
switch (type) {
case SchemaTreeNode::Type::Obj:
return value.is_null();
case SchemaTreeNode::Type::Int:
return value.is<value_int_t>();
case SchemaTreeNode::Type::Float:
return value.is<value_float_t>();
case SchemaTreeNode::Type::Bool:
return value.is<value_bool_t>();
case SchemaTreeNode::Type::UnstructuredArray:
return value.is<FourByteEncodedTextAst>() || value.is<EightByteEncodedTextAst>();
case SchemaTreeNode::Type::Str:
return value.is<string>() || value.is<FourByteEncodedTextAst>()
|| value.is<EightByteEncodedTextAst>();
default:
return false;
}
}

auto validate_node_id_value_pairs(
SchemaTree const& schema_tree,
KeyValuePairLogEvent::NodeIdValuePairs const& node_id_value_pairs
) -> std::errc {
try {
std::unordered_map<SchemaTreeNode::id_t, std::unordered_set<std::string_view>>
parent_node_id_to_key_names;
for (auto const& [node_id, value] : node_id_value_pairs) {
if (SchemaTree::cRootId == node_id) {
return std::errc::operation_not_permitted;
}

auto const& node{schema_tree.get_node(node_id)};
auto const node_type{node.get_type()};
if (false == value.has_value()) {
// Value is an empty object (`{}`, which is not the same as `null`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this comment tries to explain why the line doesn't use value.is_null(); in a way similar to line 68?
I don't think I follow the comment though. Can you clarify a bit?

And we can combine the two if statements?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. {} and null are two different types of values.
  2. I'd prefer to handle them separately. Combining these two statements will make it very hard to read

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't get when there will be {} and when there will be null (and why we only check for {} but not null in this function)
Is it because null is already handled in the validation? If so, I think you should leave this information in the comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{} is considered as a special case, where null is a valid primitive value (like an int or a string). These are user-input values.

if (SchemaTreeNode::Type::Obj != node_type) {
return std::errc::protocol_error;
}
} else if (false == node_type_matches_value_type(node_type, value.value())) {
return std::errc::protocol_error;
}

if (SchemaTreeNode::Type::Obj == node_type
&& false == is_leaf_node(schema_tree, node_id, node_id_value_pairs))
{
// The node's value is `null` or `{}` but its descendants appear in
// `node_id_value_pairs`.
return std::errc::operation_not_permitted;
}

auto const parent_node_id{node.get_parent_id()};
auto const key_name{node.get_key_name()};
if (parent_node_id_to_key_names.contains(parent_node_id)) {
if (parent_node_id_to_key_names.at(parent_node_id).contains(key_name)) {
// The key is duplicated under the same parent
return std::errc::protocol_not_supported;
}
} else {
parent_node_id_to_key_names.emplace(parent_node_id, std::unordered_set{key_name});
}
}
} catch (SchemaTree::OperationFailed const& ex) {
return std::errc::operation_not_permitted;
}
return std::errc{};
}

auto is_leaf_node(
SchemaTree const& schema_tree,
SchemaTreeNode::id_t node_id,
KeyValuePairLogEvent::NodeIdValuePairs const& node_id_value_pairs
) -> bool {
std::vector<SchemaTreeNode::id_t> dfs_stack;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered using a deque?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the next PR, you will see that I'm actually using std::stack (which is implemented using deque) for dfs traversal, which is because the stack element is more complicated and using a vector means I need to provide move/copy semantics.
However, in this scenario, the stack is simply an int. I think the guideline of using std container is you should always try to use vector unless you have a strong motivation to use other ones. In this particular function, using vector have better locality, fewer indirections, and less memory allocation. Therefore, I would prefer to stick to vector unless you have other concern 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, was just asking.

dfs_stack.reserve(schema_tree.get_size());
dfs_stack.push_back(node_id);
while (false == dfs_stack.empty()) {
auto const curr_node_id{dfs_stack.back()};
dfs_stack.pop_back();
for (auto const child_node_id : schema_tree.get_node(curr_node_id).get_children_ids()) {
if (node_id_value_pairs.contains(child_node_id)) {
return false;
}
dfs_stack.push_back(child_node_id);
}
}
return true;
}
} // namespace

auto KeyValuePairLogEvent::create(
std::shared_ptr<SchemaTree const> schema_tree,
NodeIdValuePairs node_id_value_pairs,
UtcOffset utc_offset
) -> OUTCOME_V2_NAMESPACE::std_result<KeyValuePairLogEvent> {
if (auto const ret_val{validate_node_id_value_pairs(*schema_tree, node_id_value_pairs)};
std::errc{} != ret_val)
{
return ret_val;
}
return KeyValuePairLogEvent{std::move(schema_tree), std::move(node_id_value_pairs), utc_offset};
}
} // namespace clp::ffi
81 changes: 81 additions & 0 deletions components/core/src/clp/ffi/KeyValuePairLogEvent.hpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@
#ifndef CLP_FFI_KEYVALUEPAIRLOGEVENT_HPP
#define CLP_FFI_KEYVALUEPAIRLOGEVENT_HPP

#include <memory>
#include <optional>
#include <unordered_map>
#include <utility>

#include <outcome/single-header/outcome.hpp>

#include "../time_types.hpp"
#include "SchemaTree.hpp"
#include "SchemaTreeNode.hpp"
#include "Value.hpp"

namespace clp::ffi {
/**
* A log event containing key-value pairs. Each event contains:
* - A collection of node-ID & value pairs, where each pair represents a leaf `SchemaTreeNode` in
* the `SchemaTree`.
* - A reference to the `SchemaTree`
* - The UTC offset of the current log event
*/
class KeyValuePairLogEvent {
public:
// Types
using NodeIdValuePairs = std::unordered_map<SchemaTreeNode::id_t, std::optional<Value>>;

// Factory functions
/**
* @param schema_tree
* @param node_id_value_pairs
* @param utc_offset
* @return A result containing the key-value pair log event or an error code indicating the
* failure. See `valdiate_node_id_value_pairs` for the possible error codes.
*/
[[nodiscard]] static auto create(
std::shared_ptr<SchemaTree const> schema_tree,
NodeIdValuePairs node_id_value_pairs,
UtcOffset utc_offset
) -> OUTCOME_V2_NAMESPACE::std_result<KeyValuePairLogEvent>;

// Disable copy constructor and assignment operator
KeyValuePairLogEvent(KeyValuePairLogEvent const&) = delete;
auto operator=(KeyValuePairLogEvent const&) -> KeyValuePairLogEvent& = delete;

// Default move constructor and assignment operator
KeyValuePairLogEvent(KeyValuePairLogEvent&&) = default;
auto operator=(KeyValuePairLogEvent&&) -> KeyValuePairLogEvent& = default;

// Destructor
~KeyValuePairLogEvent() = default;
haiqi96 marked this conversation as resolved.
Show resolved Hide resolved

// Methods
[[nodiscard]] auto get_schema_tree() const -> SchemaTree const& { return *m_schema_tree; }

[[nodiscard]] auto get_node_id_value_pairs() const -> NodeIdValuePairs const& {
return m_node_id_value_pairs;
}

[[nodiscard]] auto get_utc_offset() const -> UtcOffset { return m_utc_offset; }

private:
// Constructor
KeyValuePairLogEvent(
haiqi96 marked this conversation as resolved.
Show resolved Hide resolved
std::shared_ptr<SchemaTree const> schema_tree,
NodeIdValuePairs node_id_value_pairs,
UtcOffset utc_offset
)
: m_schema_tree{std::move(schema_tree)},
m_node_id_value_pairs{std::move(node_id_value_pairs)},
m_utc_offset{utc_offset} {}

// Variables
std::shared_ptr<SchemaTree const> m_schema_tree;
NodeIdValuePairs m_node_id_value_pairs;
UtcOffset m_utc_offset{0};
};
} // namespace clp::ffi

#endif // CLP_FFI_KEYVALUEPAIRLOGEVENT_HPP
11 changes: 0 additions & 11 deletions components/core/src/clp/ffi/Value.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -150,17 +150,6 @@ class Value {
template <FundamentalPrimitiveValueType T>
explicit Value(T value) : m_value{value} {}

// Disable copy constructor and assignment operator
Value(Value const&) = delete;
auto operator=(Value const&) -> Value& = delete;

// Default move constructor and assignment operator
Value(Value&&) = default;
auto operator=(Value&&) -> Value& = default;

// Destructor
~Value() = default;

// Methods
/**
* @tparam T
Expand Down
11 changes: 0 additions & 11 deletions components/core/src/clp/ir/EncodedTextAst.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -26,17 +26,6 @@ class EncodedTextAst {
m_dict_vars{std::move(dict_vars)},
m_encoded_vars{std::move(encoded_vars)} {}

// Disable copy constructor and assignment operator
EncodedTextAst(EncodedTextAst const&) = delete;
auto operator=(EncodedTextAst const&) -> EncodedTextAst& = delete;

// Default move constructor and assignment operator
EncodedTextAst(EncodedTextAst&&) = default;
auto operator=(EncodedTextAst&&) -> EncodedTextAst& = default;

// Destructor
~EncodedTextAst() = default;

// Methods
auto operator==(EncodedTextAst const&) const -> bool = default;

Expand Down
Loading
Loading