feat: Add PrefixTree and RegisterHandler to support TDFA simulation. #56

Merged on Dec 6, 2024 (355 commits)
Changes from 250 commits
Commits
53ba56a
Remove unused using.
SharafMohamed Oct 23, 2024
8a677e3
Remove empty namespace.
SharafMohamed Oct 23, 2024
f17f752
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
cbe1d39
Make traversal_order const.
SharafMohamed Oct 23, 2024
77bf2e0
Merge branch 'tagged-nfa-new' of https://github.com/SharafMohamed/log…
SharafMohamed Oct 23, 2024
a9d0ef3
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
43ec3f0
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
45372df
Add missing using for std::move.
SharafMohamed Oct 23, 2024
d0ba724
Merge branch 'tagged-nfa-new' of https://github.com/SharafMohamed/log…
SharafMohamed Oct 23, 2024
f69aa86
Update src/log_surgeon/finite_automata/RegexNFA.hpp
SharafMohamed Oct 23, 2024
723eabb
Use move semantic for NFA constructor.
SharafMohamed Oct 23, 2024
8d40656
Merge branch 'tagged-nfa-new' of https://github.com/SharafMohamed/log…
SharafMohamed Oct 23, 2024
d20e391
Move add_to_queue_and_visited() to lambda.
SharafMohamed Oct 23, 2024
6a312e9
Fix compiler error in intersect-test.
SharafMohamed Oct 23, 2024
f8e5f8f
Simplify new_state().
SharafMohamed Oct 24, 2024
fc25f00
Remove using for std::move, and explicitly add namespace.
SharafMohamed Oct 24, 2024
cdab650
Update serialize docstring.
SharafMohamed Oct 24, 2024
e8db277
Have internal serialize() functions for RegexNFA (states and tagged t…
SharafMohamed Oct 24, 2024
337cead
Reserve space during BFS; Run linter.
SharafMohamed Oct 24, 2024
4a30fdc
Add braced initialization to nfa.
SharafMohamed Oct 27, 2024
0203038
Update docstring for positive tag serialization.
SharafMohamed Oct 27, 2024
633acc4
Update docstring for negative tag serialization.
SharafMohamed Oct 27, 2024
4db7b82
Use return statement for full docstring of get_bfs_traversal_order.
SharafMohamed Oct 27, 2024
01f8b14
Update NFA serialize() docstring.
SharafMohamed Oct 27, 2024
d047624
Add long form of BFS for first use.
SharafMohamed Oct 27, 2024
f9c4f46
Use const for state_id_it.
SharafMohamed Oct 27, 2024
bd77c78
Update docstring for NFA state serialize.
SharafMohamed Oct 27, 2024
f2d8049
Combine the two failure cases in NFA state serialize's docstring to m…
SharafMohamed Oct 27, 2024
4cb560f
Use const for state_id_it.
SharafMohamed Oct 27, 2024
95b7497
For NFA state serialize flip order of failure checks to reduce indent…
SharafMohamed Oct 27, 2024
e187445
Merge branch 'tagged-nfa-new' of https://github.com/SharafMohamed/log…
SharafMohamed Oct 27, 2024
8b85511
Use const& for passing rules into the NFA as rules are never stored, …
SharafMohamed Oct 28, 2024
0756794
Use braced initialization for NFA.
SharafMohamed Oct 28, 2024
6ab439a
Remove warning for not check std::optional when we know the function …
SharafMohamed Oct 28, 2024
9244812
Remove redundant initialization of member variables in tagged transiti…
SharafMohamed Oct 28, 2024
0d151a4
Use member initialization lists for constructing NFA state from tagge…
SharafMohamed Oct 28, 2024
ac63713
Switch to using optional prefix for optional return types.
SharafMohamed Oct 28, 2024
b57b93f
Make negative tagged transition singular as you can never have more t…
SharafMohamed Oct 28, 2024
c3fb16d
Add missing param for new_state_with_negative_tagged_transitions.
SharafMohamed Oct 28, 2024
8a41367
Move RegexNFAStateType, RegexNFAState, and PositiveTaggedTransition/N…
SharafMohamed Oct 28, 2024
d1a57e4
Add tag class.
SharafMohamed Oct 28, 2024
bc78f59
Make tag an object with name, start, and end information, instead of …
SharafMohamed Oct 29, 2024
ac7260f
Run linter.
SharafMohamed Oct 29, 2024
40a8206
Merge branch 'main' into singular-negative-transition
SharafMohamed Oct 31, 2024
c2eea21
Change t to curr_state and u to dest_state.
SharafMohamed Oct 31, 2024
629fce9
Change curr_state to current_state; Remove extraneous *; Add newline …
SharafMohamed Oct 31, 2024
aed62b2
Add TODO for utf8 case in BFS.
SharafMohamed Oct 31, 2024
34522a7
Use auto and fix order of const wrt to *.
SharafMohamed Oct 31, 2024
332af35
Initialize m_dest_state to nullptr.
SharafMohamed Oct 31, 2024
748e794
Change negative_tagged_transition to negative_tagged_transition_string.
SharafMohamed Oct 31, 2024
38dc22b
Change negative tag transitions to singular.
SharafMohamed Oct 31, 2024
5a30ed8
Switch transitions to singular where applicable.
SharafMohamed Oct 31, 2024
c8bf9e6
Merge changes with previous PR manually. Still missing changes to pre…
SharafMohamed Oct 31, 2024
90edf77
Auto linter.
SharafMohamed Oct 31, 2024
fd765f7
Merge branch 'singular-negative-transition' into individual-files
SharafMohamed Oct 31, 2024
f7d3415
Merge branch 'individual-files' into meaningful-tags
SharafMohamed Oct 31, 2024
b5f7cdf
Modify expected output where ordering of negative tags is ambiguous. …
SharafMohamed Oct 31, 2024
d90b731
Add a description for how to use the tag.
SharafMohamed Oct 31, 2024
3f1f8ff
Add start and end positive transitions.
SharafMohamed Oct 31, 2024
2bd5d2c
Add functionality to tags to use it for tracking capture positions; R…
SharafMohamed Oct 31, 2024
2d0157e
Reduce indentation of epsilon closure by using continue.
SharafMohamed Oct 31, 2024
1cabafd
Use optional for negative transitions in RegexNFAState.
SharafMohamed Oct 31, 2024
dc2c637
Add missing headers; Remove unused headers.
SharafMohamed Nov 1, 2024
7c5cfc0
Assign optional_negative_tagged_transition to a reference.
SharafMohamed Nov 1, 2024
4e8d290
Assign optional_negative_tagged_transition to a reference again.
SharafMohamed Nov 1, 2024
fde9037
Add <stack> to Lexer.tpp.
SharafMohamed Nov 1, 2024
e63637e
Fix comment grammar.
SharafMohamed Nov 1, 2024
08e7d5e
Update with previous PR.
SharafMohamed Nov 1, 2024
f7b5666
Merge branch 'individual-files' into meaningful-tags
SharafMohamed Nov 1, 2024
93aebd5
Merge branch 'meaningful-tags' into fixed-tagged-nfa
SharafMohamed Nov 1, 2024
b8c8f77
Store negative tags in a vector instead of set so that the order is d…
SharafMohamed Nov 1, 2024
ef95061
Sync with previous PR.
SharafMohamed Nov 2, 2024
b55e96c
Merge branch 'individual-files' into meaningful-tags
SharafMohamed Nov 2, 2024
304f612
Merge branch 'meaningful-tags' into fixed-tagged-nfa
SharafMohamed Nov 2, 2024
7cc8c52
Add start tags to NFA.
SharafMohamed Nov 2, 2024
b1a9300
Update unit-test to handle start transitions.
SharafMohamed Nov 2, 2024
9da470d
Merge branch 'main' into individual-files
SharafMohamed Nov 2, 2024
b451651
Move RegexNFAXState typedef into RegexNFAState.hpp
SharafMohamed Nov 6, 2024
f71348b
Switch void to auto -> void.
SharafMohamed Nov 6, 2024
21e80b9
Merge branch 'individual-files' of https://github.com/SharafMohamed/l…
SharafMohamed Nov 6, 2024
4576d7d
Move short functions into the class definition; Move RegexNFAXState t…
SharafMohamed Nov 6, 2024
6e24969
Merge branch 'individual-files' into meaningful-tags
SharafMohamed Nov 7, 2024
ff91bcc
Merge branch 'main' into meaningful-tags
SharafMohamed Nov 7, 2024
e786ec6
Merge branch 'meaningful-tags' into fixed-tagged-nfa
SharafMohamed Nov 7, 2024
5abe906
Auto format.
SharafMohamed Nov 7, 2024
bb0bd2e
Remove unused lambda; Auto format.
SharafMohamed Nov 7, 2024
a36bb90
Add test case for Tag class.
SharafMohamed Nov 7, 2024
59cc6cd
Add nullptr checks.
SharafMohamed Nov 7, 2024
8097a69
Merge branch 'meaningful-tags' into fixed-tagged-nfa
SharafMohamed Nov 7, 2024
9fc41c0
Change Tag class functionality to reflect how registers will be used.
SharafMohamed Nov 7, 2024
6e5c968
Add register class.
SharafMohamed Nov 7, 2024
d060bc6
Temp fix for unit-test until future PR where Tag ptrs are stored in v…
SharafMohamed Nov 11, 2024
f041a37
Swap from set to vector to tag pointers to ensure determinism.
SharafMohamed Nov 11, 2024
f72e120
Better test coverage for tag class.
SharafMohamed Nov 11, 2024
d5ac1ad
Use constant iterators for elements that should not change.
SharafMohamed Nov 12, 2024
30f03ed
Use braced initialization in test-tag.cpp.
SharafMohamed Nov 12, 2024
d386fc0
Use const& for insertion function that can't use move semantics.
SharafMohamed Nov 12, 2024
4024c3e
Have get_name() return string_view; Update headers.
SharafMohamed Nov 13, 2024
22c3b82
Remove const from member variable.
SharafMohamed Nov 13, 2024
ed55534
Remove const from member variable.
SharafMohamed Nov 13, 2024
534afce
Run linter.
SharafMohamed Nov 13, 2024
61fdb5d
Add move semantic test cases.
SharafMohamed Nov 13, 2024
78e5fe8
Add PositiveTaggedTransition docstring and make m_tag throw if ever n…
SharafMohamed Nov 13, 2024
630d882
Delete unused operators.
SharafMohamed Nov 13, 2024
543f8af
Move null check into initializer list for NegativeTaggedTransition co…
SharafMohamed Nov 13, 2024
ec342fc
Remove position vectors from Tag, as they aren't used in the AST.
SharafMohamed Nov 13, 2024
af86281
RegexASTCapture enforces non-null arguments; Add docstring to RegexAS…
SharafMohamed Nov 13, 2024
738becd
Capitalize exceptions.
SharafMohamed Nov 13, 2024
789263e
Use () to fix linting issue.
SharafMohamed Nov 13, 2024
1f15ca7
Keep default copy assignment.
SharafMohamed Nov 14, 2024
7688c24
Move @throw to constructor docstrings.
SharafMohamed Nov 14, 2024
27618b2
Merge branch 'meaningful-tags' into fixed-tagged-nfa
SharafMohamed Nov 14, 2024
867d27c
Merge branch 'fixed-tagged-nfa' into register
SharafMohamed Nov 14, 2024
486190a
Do string_view comparison in lexer test.
SharafMohamed Nov 14, 2024
ac75909
Use string_view compares in tag tests.
SharafMohamed Nov 14, 2024
090f18c
Update headers in TaggedTransition.hpp.
SharafMohamed Nov 15, 2024
c7cfc10
Separate copy and move constructor unit-tests.
SharafMohamed Nov 15, 2024
91b8b51
Use NOTE for class requirements.
SharafMohamed Nov 15, 2024
fcb1a76
Use NOTE for class requirements.
SharafMohamed Nov 15, 2024
9b09e19
Use NOTE for class requirements.
SharafMohamed Nov 15, 2024
2f712e6
Merge branch 'meaningful-tags' into fixed-tagged-nfa
SharafMohamed Nov 18, 2024
75aecc4
Update install-catch2.sh to compile catch2 with c++17.
SharafMohamed Nov 18, 2024
9302b94
Merge branch 'main' into fixed-tagged-nfa
SharafMohamed Nov 18, 2024
97caabb
Merge branch 'catch2-install-fix' into fixed-tagged-nfa
SharafMohamed Nov 18, 2024
507a7d3
Loop over end_transitions correctly.
SharafMohamed Nov 18, 2024
34c227b
Add TagPositions class.
SharafMohamed Nov 18, 2024
27c8560
Remove new class, going to add it later.
SharafMohamed Nov 18, 2024
86caa9b
Add const back in.
SharafMohamed Nov 18, 2024
338638e
Add more const back in.
SharafMohamed Nov 18, 2024
a742601
Add more const back in.
SharafMohamed Nov 18, 2024
d358713
Linter.
SharafMohamed Nov 18, 2024
43870ea
Add more const back in.
SharafMohamed Nov 18, 2024
b827a6c
Merge branch 'fixed-tagged-nfa' into register
SharafMohamed Nov 18, 2024
f941607
Use `auto`.
SharafMohamed Nov 19, 2024
aad9eb3
Fix spacing.
SharafMohamed Nov 19, 2024
a801bf8
Add diagram for capture group NFA.
SharafMohamed Nov 19, 2024
08b7548
Add const for consistency with constructor.
SharafMohamed Nov 19, 2024
449133e
Update positive end transition to be optional instead of a vector.
SharafMohamed Nov 19, 2024
7b837bf
Rename new_state function correctly.
SharafMohamed Nov 19, 2024
f0eb56b
Update capture group AST state creation.
SharafMohamed Nov 19, 2024
a945915
Encapsulate new state for capture group.
SharafMohamed Nov 19, 2024
c757ded
Fix compiler error.
SharafMohamed Nov 19, 2024
2eb7477
Use singular for end transition getter function.
SharafMohamed Nov 20, 2024
08060ed
Void to auto -> void.
SharafMohamed Nov 20, 2024
0c2c1d1
Update new_capture_group_start_states to new_capture_group_states to …
SharafMohamed Nov 20, 2024
b0b951a
Linter.
SharafMohamed Nov 20, 2024
3c2a2ab
Update docstring for .
SharafMohamed Nov 20, 2024
98c5b95
Rename to new_start_and_end_states_with_positively_tagged_transitions.
SharafMohamed Nov 20, 2024
f59cf41
Rename to capture_X_state.
SharafMohamed Nov 20, 2024
85a2d69
Update docstring.
SharafMohamed Nov 20, 2024
4c602d4
Updated diagram to match vars used in code.
SharafMohamed Nov 20, 2024
2b01433
Rename vars to serialized_X.
SharafMohamed Nov 20, 2024
e37b29a
Run Linter.
SharafMohamed Nov 20, 2024
c5beca3
Fix typo.
SharafMohamed Nov 20, 2024
fe4a7b3
Update diagram for capture group NFA.
SharafMohamed Nov 20, 2024
8993088
Merge branch 'fixed-tagged-nfa' into register
SharafMohamed Nov 26, 2024
aaf720a
Merge branch 'main' into register
SharafMohamed Nov 26, 2024
0017512
Add register unit-tests, add PrefixTree with unit-tests.
SharafMohamed Nov 26, 2024
336f2ae
Finished with initial register implementation.
SharafMohamed Nov 26, 2024
3449df2
Linter.
SharafMohamed Nov 26, 2024
ef62df1
Linter.
SharafMohamed Nov 26, 2024
a085650
Docstring fixes.
SharafMohamed Nov 27, 2024
2be06c0
Add boundary test case.
SharafMohamed Nov 27, 2024
9ec01dd
Improve test cases for setting positions in prefix tree.
SharafMohamed Nov 27, 2024
019e675
Improve test cases for setting invalid positions in prefix tree.
SharafMohamed Nov 27, 2024
83a411a
Remove confusing description; Remove unused include.
SharafMohamed Nov 27, 2024
c88fbb5
Add edge case test to register unit-tests.
SharafMohamed Nov 27, 2024
7c91ddc
Update docstring for PrefixTreeNode.
SharafMohamed Nov 27, 2024
4c50769
Add comments to test-case; Add new test case for setting root value.
SharafMohamed Nov 27, 2024
98200b4
Update docstring to make it clear that any negative value of m_positi…
SharafMohamed Nov 27, 2024
afaf01a
Fix header guard.
SharafMohamed Nov 27, 2024
8dea476
Fix typo.
SharafMohamed Nov 27, 2024
dbb1e16
Remove newline in docstring.
SharafMohamed Nov 27, 2024
e054825
Improve throw consistency.
SharafMohamed Nov 27, 2024
792ce96
Update prefix tree insertion test cases.
SharafMohamed Nov 27, 2024
cab6e81
Fix test case.
SharafMohamed Nov 27, 2024
ffda5e6
Fix @throws docstring for consistency; Improve insert() docstring.
SharafMohamed Nov 27, 2024
ff11672
Improve register handler test coverage.
SharafMohamed Nov 27, 2024
536b50b
Fix == ordering in test-cases; Fix vector initialization to remove re…
SharafMohamed Nov 27, 2024
77c20f7
Add const for consistency.
SharafMohamed Nov 27, 2024
f43759c
Add _HPP to header guards; Remove unused include.
SharafMohamed Nov 27, 2024
01e8881
Fix typo.
SharafMohamed Nov 27, 2024
fbb3d36
Remove blank line.
SharafMohamed Nov 27, 2024
e1f2b18
Rename to m_prefix_tree; Remove unused include.
SharafMohamed Nov 27, 2024
a51b49d
Add param descriptions to docstrings.
SharafMohamed Nov 27, 2024
002577e
Improve out of range check to be consistent.
SharafMohamed Nov 27, 2024
52a155c
Update set docstring.
SharafMohamed Nov 27, 2024
a6beafc
Punctuate docstrings.
SharafMohamed Nov 27, 2024
ec1f757
Update PrefixTreeNode docstring.
SharafMohamed Nov 28, 2024
f35741f
Improve docstring for PrefixTree.
SharafMohamed Nov 28, 2024
e8e5e55
Change to use auto -> void; Punctuate out_of_range throws.
SharafMohamed Nov 28, 2024
f1ece30
Update Register docstring.
SharafMohamed Nov 28, 2024
08997ae
Update PrefixTree docstring.
SharafMohamed Nov 28, 2024
0910c62
Grammar fix.
SharafMohamed Nov 28, 2024
ede680e
Grammar fix.
SharafMohamed Nov 28, 2024
c7b047c
Use auto where possible.
SharafMohamed Nov 28, 2024
6fa8fcb
Use uniform initialization.
SharafMohamed Dec 2, 2024
18b9160
Add missing header.
SharafMohamed Dec 2, 2024
3f08fa3
Linter.
SharafMohamed Dec 2, 2024
e281f04
Fix spacing.
SharafMohamed Dec 2, 2024
a03734e
Make Node a member of PrefixTree.
SharafMohamed Dec 2, 2024
9123c7a
Rename index to prefix_tree_node_id.
SharafMohamed Dec 2, 2024
fe35fe0
Make it clear indices in add_register are referring to prefix_tree no…
SharafMohamed Dec 3, 2024
de58e08
Linter.
SharafMohamed Dec 3, 2024
1426179
rename to reg_id.
SharafMohamed Dec 3, 2024
3301f14
Rename to reg_id.
SharafMohamed Dec 3, 2024
c9b1369
Use at().
SharafMohamed Dec 3, 2024
e2aee66
Remove Register class and use uint32_t instead; Rename vers to xxx_re…
SharafMohamed Dec 3, 2024
36c1810
Rename to reg_id.
SharafMohamed Dec 3, 2024
48df8b0
Remove unused header.
SharafMohamed Dec 3, 2024
a8605fc
Change pred index to be optional and nullopt for root.
SharafMohamed Dec 3, 2024
15cb1b6
Add and use node_id_t.
SharafMohamed Dec 3, 2024
6b787d0
Add position_t.
SharafMohamed Dec 3, 2024
cd8f4e3
Change to id_t.
SharafMohamed Dec 3, 2024
72da50c
Add is_root().
SharafMohamed Dec 4, 2024
3fc7ea7
Add missing header.
SharafMohamed Dec 4, 2024
6443d66
Update PrefixTree docstring.
SharafMohamed Dec 4, 2024
63aec4d
Removing node docstring as it's redundant.
SharafMohamed Dec 4, 2024
295f3ee
Combine private section in PrefixTree.
SharafMohamed Dec 4, 2024
1186666
Add missing header; Remove copy paste error.
SharafMohamed Dec 4, 2024
06ee38e
Rename to node_id and parent_node_id.
SharafMohamed Dec 4, 2024
e103011
Update get_reversed_positions' docstring.
SharafMohamed Dec 4, 2024
31b0346
Update get_reversed positions' docstring to clarify exclusivity of th…
SharafMohamed Dec 4, 2024
4005e41
Grammar fix.
SharafMohamed Dec 4, 2024
e38940c
Add maybe_unused.
SharafMohamed Dec 4, 2024
d71368d
Update src/log_surgeon/finite_automata/RegisterHandler.hpp
SharafMohamed Dec 4, 2024
dd4b6e1
Update test case names to document code names better.
SharafMohamed Dec 4, 2024
7322852
Implicitly use auto in vectors.
SharafMohamed Dec 4, 2024
dba1a18
Explicitly use position_t for vectors.
SharafMohamed Dec 4, 2024
ee6efab
Update tests/test-register-handler.cpp
SharafMohamed Dec 4, 2024
9ba980c
Switch to size_t.
SharafMohamed Dec 4, 2024
27b324c
Clang-tidy: Remove magic numbers + Fix headers.
SharafMohamed Dec 4, 2024
f651a24
Reduce complexity for clang-tidy.
SharafMohamed Dec 4, 2024
fc6f426
Add negative pos test case in test-register-handler.cpp.
SharafMohamed Dec 4, 2024
c8fb570
Alternate b/w positive and negative positions in test-prefix-tree neg…
SharafMohamed Dec 4, 2024
1f66918
Add cRootId and size() to PrefixTree.
SharafMohamed Dec 4, 2024
a388c80
Update note.
SharafMohamed Dec 4, 2024
340eaf7
Update docstring.
SharafMohamed Dec 4, 2024
22cf931
Fix typo.
SharafMohamed Dec 4, 2024
c61f2d9
Update header for size_t.
SharafMohamed Dec 5, 2024
417bde8
Update src/log_surgeon/finite_automata/PrefixTree.hpp
SharafMohamed Dec 5, 2024
738876d
Update src/log_surgeon/finite_automata/PrefixTree.hpp
SharafMohamed Dec 5, 2024
93c03a0
Update src/log_surgeon/finite_automata/RegisterHandler.hpp
SharafMohamed Dec 5, 2024
6481e5f
Update tests/test-prefix-tree.cpp
SharafMohamed Dec 5, 2024
6a9a4a4
Clean up register initialization helper; Fix typo.
SharafMohamed Dec 5, 2024
052d86f
Update get_parent_id to clarify its unsafe and suppress warning.
SharafMohamed Dec 5, 2024
ed70bd5
Move constants in test-register-handler.hpp to minimize scope.
SharafMohamed Dec 5, 2024
1671e39
Move constants into scope for test-prefix-tree.cpp.
SharafMohamed Dec 5, 2024
748dfc5
Rename to handler_init and return handler.
SharafMohamed Dec 5, 2024
8abf35a
Add docstring for get_parent_id_unsafe().
SharafMohamed Dec 5, 2024
3 changes: 3 additions & 0 deletions CMakeLists.txt
@@ -93,12 +93,15 @@ set(SOURCE_FILES
     src/log_surgeon/SchemaParser.hpp
     src/log_surgeon/Token.cpp
     src/log_surgeon/Token.hpp
+    src/log_surgeon/finite_automata/PrefixTree.cpp
+    src/log_surgeon/finite_automata/PrefixTree.hpp
     src/log_surgeon/finite_automata/RegexAST.hpp
     src/log_surgeon/finite_automata/RegexDFA.hpp
     src/log_surgeon/finite_automata/RegexDFA.tpp
     src/log_surgeon/finite_automata/RegexNFA.hpp
     src/log_surgeon/finite_automata/RegexNFAState.hpp
     src/log_surgeon/finite_automata/RegexNFAStateType.hpp
+    src/log_surgeon/finite_automata/RegisterHandler.hpp
     src/log_surgeon/finite_automata/Tag.hpp
     src/log_surgeon/finite_automata/TaggedTransition.hpp
     src/log_surgeon/finite_automata/UnicodeIntervalTree.hpp
20 changes: 20 additions & 0 deletions src/log_surgeon/finite_automata/PrefixTree.cpp
@@ -0,0 +1,20 @@
#include "PrefixTree.hpp"

#include <stdexcept>
#include <vector>

namespace log_surgeon::finite_automata {
auto PrefixTree::get_reversed_positions(id_t const node_id) const -> std::vector<position_t> {
    if (m_nodes.size() <= node_id) {
        throw std::out_of_range("Prefix tree index out of range.");
    }

    std::vector<position_t> reversed_positions;
    auto current_node{m_nodes[node_id]};
    while (false == current_node.is_root()) {
        reversed_positions.push_back(current_node.get_position());
        current_node = m_nodes[current_node.get_parent_node_id().value()];
    }
    return reversed_positions;
}
}  // namespace log_surgeon::finite_automata
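The walk in `get_reversed_positions` follows parent links from a node up to, but not including, the root. Below is a minimal standalone sketch of that traversal. It uses a hypothetical `MiniTree` struct rather than the PR's `PrefixTree` class, and adds a hypothetical `forward_positions` helper (not part of this PR) that flips the result into root-to-node order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the PR's PrefixTree; all names are illustrative only.
struct MiniTree {
    struct Node {
        std::optional<uint32_t> parent;  // nullopt marks the root
        int32_t position;
    };

    // Node 0 is the root; its position is never reported.
    std::vector<Node> nodes{Node{std::nullopt, -1}};

    auto insert(uint32_t const parent, int32_t const position) -> uint32_t {
        if (nodes.size() <= parent) {
            throw std::out_of_range("Parent index out of range.");
        }
        nodes.push_back({parent, position});
        return static_cast<uint32_t>(nodes.size() - 1);
    }

    // Walks parent links from `id` towards the root, excluding the root itself.
    [[nodiscard]] auto reversed_positions(uint32_t id) const -> std::vector<int32_t> {
        if (nodes.size() <= id) {
            throw std::out_of_range("Node index out of range.");
        }
        std::vector<int32_t> result;
        while (nodes[id].parent.has_value()) {
            result.push_back(nodes[id].position);
            id = nodes[id].parent.value();
        }
        return result;
    }
};

// Assumed convenience helper: returns the same positions in root-to-node order.
[[nodiscard]] inline auto forward_positions(MiniTree const& tree, uint32_t const id)
        -> std::vector<int32_t> {
    auto positions{tree.reversed_positions(id)};
    std::reverse(positions.begin(), positions.end());
    return positions;
}
```

With nodes inserted as root -> 4 -> 7, `reversed_positions` on the last node yields `{7, 4}` and `forward_positions` yields `{4, 7}`.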
85 changes: 85 additions & 0 deletions src/log_surgeon/finite_automata/PrefixTree.hpp
@@ -0,0 +1,85 @@
#ifndef LOG_SURGEON_FINITE_AUTOMATA_PREFIX_TREE_HPP
#define LOG_SURGEON_FINITE_AUTOMATA_PREFIX_TREE_HPP

#include <cstdint>
#include <optional>
#include <stdexcept>
#include <vector>

namespace log_surgeon::finite_automata {
/**
 * Represents a prefix tree to store register data during TDFA simulation. Each node in the tree
 * stores a single position in the lexed string. Each path from the root to an index corresponds to
 * a sequence of positions for an individual tag:
 * - Positive position node: Indicates the tag was matched at the position.
 * - Negative position node: Indicates the tag was unmatched. If a negative node is the entire path,
 * it indicates the tag was never matched. If the negative node is along a path containing positive
 * nodes, it functions as a placeholder. This can be useful for nested capture groups, to maintain a
 * one-to-one mapping between the contained capture group and the enclosing capture group.
 */
class PrefixTree {
public:
    using id_t = uint32_t;
    using position_t = int32_t;

    static constexpr id_t cRootId{0};

    PrefixTree() : m_nodes{{std::nullopt, -1}} {}

    /**
     * @param parent_node_id Index of the inserted node's parent in the prefix tree.
     * @param position The position in the lexed string.
     * @return The index of the newly inserted node in the tree.
     * @throw std::out_of_range if the parent's index is out of range.
     */
    [[maybe_unused]] auto insert(id_t const parent_node_id, position_t const position) -> id_t {
        if (m_nodes.size() <= parent_node_id) {
            throw std::out_of_range("Predecessor index out of range.");
        }

        m_nodes.emplace_back(parent_node_id, position);
        return m_nodes.size() - 1;
    }

    auto set(id_t const node_id, position_t const position) -> void {
        m_nodes.at(node_id).set_position(position);
    }

    [[nodiscard]] auto size() const -> size_t { return m_nodes.size(); }

    /**
     * @param node_id The index of the node.
     * @return A vector containing positions along the path defined by `node_id`, in reverse order,
     * i.e., [index, root).
     * @throw std::out_of_range if the index is out of range.
     */
    [[nodiscard]] auto get_reversed_positions(id_t node_id) const -> std::vector<position_t>;

private:
    class Node {
    public:
        Node(std::optional<id_t> const parent_node_id, position_t const position)
                : m_parent_node_id{parent_node_id},
                  m_position{position} {}

        [[nodiscard]] auto is_root() const -> bool { return false == m_parent_node_id.has_value(); }

        [[nodiscard]] auto get_parent_node_id() const -> std::optional<id_t> {
            return m_parent_node_id;
        }

        auto set_position(position_t const position) -> void { m_position = position; }

        [[nodiscard]] auto get_position() const -> position_t { return m_position; }

    private:
        std::optional<id_t> m_parent_node_id;
        position_t m_position;
    };

    std::vector<Node> m_nodes;
};

}  // namespace log_surgeon::finite_automata

#endif  // LOG_SURGEON_FINITE_AUTOMATA_PREFIX_TREE_HPP
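The `PrefixTree` docstring distinguishes a path consisting only of a negative position (the tag never matched) from a path where negative positions act as placeholders between matches. Below is a small sketch of how a caller might interpret the vector returned by `get_reversed_positions` under that reading. The helper names `never_matched` and `matched_positions` are hypothetical and not part of this PR, and treating an empty or all-negative path as "never matched" is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

using position_t = int32_t;

// Assumed reading of the docstring: a tag never matched if its path holds no
// non-negative position (an empty path is treated the same way here).
[[nodiscard]] inline auto never_matched(std::vector<position_t> const& positions) -> bool {
    return std::none_of(positions.begin(), positions.end(), [](position_t const pos) {
        return 0 <= pos;
    });
}

// Drops negative placeholder entries, keeping the order of the input.
[[nodiscard]] inline auto matched_positions(std::vector<position_t> const& positions)
        -> std::vector<position_t> {
    std::vector<position_t> result;
    std::copy_if(
            positions.begin(),
            positions.end(),
            std::back_inserter(result),
            [](position_t const pos) { return 0 <= pos; }
    );
    return result;
}
```

For a path read back as `{9, -1, 4}`, `matched_positions` returns `{9, 4}`, while `never_matched` is true only for inputs such as `{-1}`. Note that filtering placeholders discards the one-to-one alignment between nested capture groups that the docstring mentions, so a real consumer may prefer to keep them.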
52 changes: 52 additions & 0 deletions src/log_surgeon/finite_automata/RegisterHandler.hpp
@@ -0,0 +1,52 @@
#ifndef LOG_SURGEON_FINITE_AUTOMATA_REGISTER_HANDLER_HPP
#define LOG_SURGEON_FINITE_AUTOMATA_REGISTER_HANDLER_HPP

#include <cstddef>
#include <vector>

#include <log_surgeon/finite_automata/PrefixTree.hpp>

namespace log_surgeon::finite_automata {
/**
 * The register handler maintains a prefix tree that is sufficient to represent all registers.
 * The register handler also contains a vector of registers, and performs the set, copy, and append
 * operations for these registers.
 *
 * Note: for efficiency these registers may be re-used, but are not required to be re-initialized.
 * It is the responsibility of the DFA to set the register value when needed.
Member

As discussed offline and also in this chat: #56 (comment), are we planning to implement this optimization in the coming PRs soon? Looking at set again, it just reuses a register and sets its value to a new position, but the history is unchanged. Without seeing the implementation, it's not intuitive why we need this API. For this PR, do you think it's better to not have this API but keep every register non-reusable for SSA? We can add this API later if we have a concrete use case, and then we will have more context to evaluate whether the API makes sense.

Contributor Author

I'll update the docstring to clarify the meaning and make it more specific to the current usage. Either way, the TDFA is responsible for setting the register, and we should never initialize it, whether we are talking about a register pool being used in different capture groups or a single register being used to parse multiple log messages. After the register is used to lex the first string and is reused to lex a second string, it should not be re-initialized, as negative tags will handle the cases where it does need to be set to -1 (unmatched).

Member

Sure, let's update the docstrings to clarify the motivations.

Contributor Author

Changed to:

 * NOTE: For efficiency, registers are not initialized when lexing a new string; instead, it is the
 * responsibility of the DFA to set the register values when needed.

 */
class RegisterHandler {
public:
    auto add_register(
            PrefixTree::id_t const prefix_tree_parent_node_id,
            PrefixTree::position_t const position
    ) -> void {
        auto const prefix_tree_node_id{m_prefix_tree.insert(prefix_tree_parent_node_id, position)};
        m_registers.emplace_back(prefix_tree_node_id);
    }

    auto set_register(size_t const reg_id, PrefixTree::position_t const position) -> void {
        m_prefix_tree.set(m_registers.at(reg_id), position);
    }

    auto copy_register(size_t const dest_reg_id, size_t const source_reg_id) -> void {
        m_registers.at(dest_reg_id) = m_registers.at(source_reg_id);
    }

    auto append_position(size_t const reg_id, PrefixTree::position_t const position) -> void {
        auto const node_id{m_registers.at(reg_id)};
        m_registers.at(reg_id) = m_prefix_tree.insert(node_id, position);
    }

    [[nodiscard]] auto get_reversed_positions(size_t const reg_id
    ) const -> std::vector<PrefixTree::position_t> {
        return m_prefix_tree.get_reversed_positions(m_registers.at(reg_id));
    }

private:
    PrefixTree m_prefix_tree;
    std::vector<PrefixTree::id_t> m_registers;
};
}  // namespace log_surgeon::finite_automata

#endif  // LOG_SURGEON_FINITE_AUTOMATA_REGISTER_HANDLER_HPP
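The register operations in `RegisterHandler` differ in whether they touch a tree node or only the register's index into the tree. Below is a condensed sketch of the same logic using a hypothetical `MiniRegisters` struct (not the PR's classes; all names are illustrative). As the code above is written, `copy_register` makes two registers refer to the same node, so a later `set_register` through either one is visible through both, while `append_position` moves only the one register onto a new child node.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical condensed stand-in for PrefixTree + RegisterHandler.
struct MiniRegisters {
    struct Node {
        std::optional<uint32_t> parent;  // nullopt marks the root
        int32_t position;
    };

    std::vector<Node> nodes{Node{std::nullopt, -1}};
    std::vector<uint32_t> registers;  // each register stores a node id

    auto insert(uint32_t const parent, int32_t const position) -> uint32_t {
        if (nodes.size() <= parent) {
            throw std::out_of_range("Parent index out of range.");
        }
        nodes.push_back({parent, position});
        return static_cast<uint32_t>(nodes.size() - 1);
    }

    // Creates a node under `parent` and a new register that refers to it.
    auto add(uint32_t const parent, int32_t const position) -> void {
        registers.push_back(insert(parent, position));
    }

    // Overwrites the position stored in the register's current node.
    auto set(std::size_t const reg, int32_t const position) -> void {
        nodes.at(registers.at(reg)).position = position;
    }

    // Points `dest` at the same node as `src`; no node is created.
    auto copy(std::size_t const dest, std::size_t const src) -> void {
        registers.at(dest) = registers.at(src);
    }

    // Adds a child under the register's node and moves the register to it.
    auto append(std::size_t const reg, int32_t const position) -> void {
        registers.at(reg) = insert(registers.at(reg), position);
    }

    // Positions from the register's node up to, but excluding, the root.
    [[nodiscard]] auto reversed(std::size_t const reg) const -> std::vector<int32_t> {
        std::vector<int32_t> result;
        auto id{registers.at(reg)};
        while (nodes.at(id).parent.has_value()) {
            result.push_back(nodes.at(id).position);
            id = nodes.at(id).parent.value();
        }
        return result;
    }
};
```

For example, after `add(0, 5)`, `add(0, 8)`, `copy(1, 0)`, and `set(0, 6)`, both registers read back `{6}`. A following `append(0, 9)` leaves register 1 at `{6}` and moves register 0 to `{9, 6}`.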
5 changes: 4 additions & 1 deletion tests/CMakeLists.txt
@@ -2,10 +2,13 @@ set(
     SOURCES_LOG_SURGEON
     ../src/log_surgeon/FileReader.cpp
     ../src/log_surgeon/FileReader.hpp
+    ../src/log_surgeon/finite_automata/PrefixTree.cpp
+    ../src/log_surgeon/finite_automata/PrefixTree.hpp
     ../src/log_surgeon/finite_automata/RegexAST.hpp
     ../src/log_surgeon/finite_automata/RegexNFA.hpp
     ../src/log_surgeon/finite_automata/RegexNFAState.hpp
     ../src/log_surgeon/finite_automata/RegexNFAStateType.hpp
+    ../src/log_surgeon/finite_automata/RegisterHandler.hpp
     ../src/log_surgeon/finite_automata/Tag.hpp
     ../src/log_surgeon/finite_automata/TaggedTransition.hpp
     ../src/log_surgeon/LALR1Parser.cpp
@@ -21,7 +24,7 @@ set(
     ../src/log_surgeon/Token.hpp
 )

-set(SOURCES_TESTS test-lexer.cpp test-NFA.cpp test-tag.cpp)
+set(SOURCES_TESTS test-lexer.cpp test-NFA.cpp test-prefix-tree.cpp test-register-handler.cpp test-tag.cpp)

 add_executable(unit-test ${SOURCES_LOG_SURGEON} ${SOURCES_TESTS})
 target_link_libraries(unit-test PRIVATE Catch2::Catch2WithMain log_surgeon::log_surgeon)
117 changes: 117 additions & 0 deletions tests/test-prefix-tree.cpp
@@ -0,0 +1,117 @@
#include <limits>
#include <stdexcept>
#include <vector>

#include <catch2/catch_test_macros.hpp>

#include <log_surgeon/finite_automata/PrefixTree.hpp>

using log_surgeon::finite_automata::PrefixTree;
using id_t = PrefixTree::id_t;
using position_t = PrefixTree::position_t;

constexpr auto cRootId{PrefixTree::cRootId};
constexpr id_t cInvalidNodeId{100};
constexpr position_t cInsertPos1{4};
constexpr position_t cInsertPos2{7};
constexpr position_t cInsertPos3{9};
constexpr position_t cMaxPos{std::numeric_limits<position_t>::max()};
constexpr position_t cNegativePos1{-1};
constexpr position_t cNegativePos2{-100};
constexpr position_t cSetPos1{10};
constexpr position_t cSetPos2{12};
constexpr position_t cSetPos3{15};
constexpr position_t cSetPos4{20};
constexpr position_t cTreeSize1{4};
constexpr position_t cTreeSize2{8};

TEST_CASE("`PrefixTree` operations", "[PrefixTree]") {
    SECTION("Newly constructed tree works correctly") {
        PrefixTree const tree;

        // A newly constructed tree should return no positions as the root node is ignored
        REQUIRE(tree.get_reversed_positions(cRootId).empty());
    }

    SECTION("Inserting nodes into the prefix tree works correctly") {
        PrefixTree tree;

        // Test basic insertions
        auto const node_id_1{tree.insert(cRootId, cInsertPos1)};
        auto const node_id_2{tree.insert(node_id_1, cInsertPos2)};
        auto const node_id_3{tree.insert(node_id_2, cInsertPos3)};
        REQUIRE(std::vector<position_t>{cInsertPos1} == tree.get_reversed_positions(node_id_1));
        REQUIRE(std::vector<position_t>{cInsertPos2, cInsertPos1}
                == tree.get_reversed_positions(node_id_2));
        REQUIRE(std::vector<position_t>{cInsertPos3, cInsertPos2, cInsertPos1}
                == tree.get_reversed_positions(node_id_3));
        REQUIRE(cTreeSize1 == tree.size());

        // Test insertion with large position values
        auto const node_id_4{tree.insert(cRootId, cMaxPos)};
        REQUIRE(cMaxPos == tree.get_reversed_positions(node_id_4)[0]);

        // Test insertion with negative position values
        auto const node_id_5{tree.insert(cRootId, cNegativePos1)};
        auto const node_id_6{tree.insert(node_id_5, cInsertPos1)};
        auto const node_id_7{tree.insert(node_id_6, cNegativePos2)};
        REQUIRE(std::vector<position_t>{cNegativePos1} == tree.get_reversed_positions(node_id_5));
        REQUIRE(std::vector<position_t>{cInsertPos1, cNegativePos1}
                == tree.get_reversed_positions(node_id_6));
        REQUIRE(std::vector<position_t>{cNegativePos2, cInsertPos1, cNegativePos1}
                == tree.get_reversed_positions(node_id_7));
        REQUIRE(cTreeSize2 == tree.size());
    }

    SECTION("Invalid index access throws correctly") {
        PrefixTree tree;
        REQUIRE_THROWS_AS(tree.get_reversed_positions(tree.size()), std::out_of_range);

        tree.insert(cRootId, cInsertPos1);
        REQUIRE_THROWS_AS(tree.get_reversed_positions(tree.size()), std::out_of_range);

        REQUIRE_THROWS_AS(
                tree.get_reversed_positions(std::numeric_limits<id_t>::max()),
                std::out_of_range
        );
    }

    SECTION("Set position for a valid index works correctly") {
        PrefixTree tree;
        // Test that you can set the root node for sanity, although this value is not used
        tree.set(cRootId, cSetPos1);

        // Test updates to different nodes
        auto const node_id_1{tree.insert(cRootId, cInsertPos1)};
        auto const node_id_2{tree.insert(node_id_1, cInsertPos1)};
        tree.set(node_id_1, cSetPos1);
        tree.set(node_id_2, cSetPos2);
        REQUIRE(std::vector<position_t>{cSetPos1} == tree.get_reversed_positions(node_id_1));
        REQUIRE(std::vector<position_t>{cSetPos2, cSetPos1}
                == tree.get_reversed_positions(node_id_2));

        // Test multiple updates to the same node
        tree.set(node_id_2, cSetPos3);
        tree.set(node_id_2, cSetPos4);
        REQUIRE(std::vector<position_t>{cSetPos4, cSetPos1}
                == tree.get_reversed_positions(node_id_2));

        // Test that updates don't affect unrelated paths
        auto const node_id_3{tree.insert(cRootId, cSetPos2)};
        tree.set(node_id_3, cSetPos3);
        REQUIRE(std::vector<position_t>{cSetPos1} == tree.get_reversed_positions(node_id_1));
        REQUIRE(std::vector<position_t>{cSetPos4, cSetPos1}
                == tree.get_reversed_positions(node_id_2));
    }

    SECTION("Set position for an invalid index throws correctly") {
        PrefixTree tree;

        // Test setting position before any insertions
        REQUIRE_THROWS_AS(tree.set(cInvalidNodeId, cSetPos4), std::out_of_range);

        // Test setting position just beyond valid range
        auto const node_id_1{tree.insert(cRootId, cInsertPos1)};
        REQUIRE_THROWS_AS(tree.set(node_id_1 + 1, cSetPos4), std::out_of_range);
    }
}