-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: Update NFA, DFA, and LALR1 to align with the latest coding guidelines. #59
refactor: Update NFA, DFA, and LALR1 to align with the latest coding guidelines. #59
Conversation
…eterministic. This is fine as tags are unique across disjoint subtrees.
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
…og-surgeon into individual-files
…ypedefs to the top of the file to fix compilation error.
…pp files; Also remove RegexDFA.tpp file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any header guard issue within this file should be tracked in #65
Just to clarify: we don't necessarily do this in the current PR, but it's some design decision we should make |
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
I thought one of the other PRs already posted combine these into a single type, but it seems that's not the case. I'll add a PR to combine the types. |
Do u think we can do it in this PR? 'cuz reviewing the current renaming changes and then reviewing another PR that has a lot of overlapped overwrite would duplicate the work |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/log_surgeon/finite_automata/DfaStatePair.hpp (1)
Line range hint
25-39
: Consider using std::less for stable pointer ordering.While comparing raw pointers works, it might lead to non-deterministic ordering across different program runs. Consider using
std::less<const TypedDfaState*>
for stable ordering:auto operator<(DfaStatePair const& rhs) const -> bool { + static std::less<const TypedDfaState*> ptr_less; if (m_state1 == rhs.m_state1) { - return m_state2 < rhs.m_state2; + return ptr_less(m_state2, rhs.m_state2); } - return m_state1 < rhs.m_state1; + return ptr_less(m_state1, rhs.m_state1); }src/log_surgeon/finite_automata/NfaState.hpp (1)
130-130
: Consider updating the conditional check pattern.For consistency with coding guidelines, consider updating the condition to match the preferred pattern:
- if constexpr (NfaStateType::Utf8 == state_type) { + if constexpr (state_type == NfaStateType::Utf8) {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
examples/intersect-test.cpp
(4 hunks)src/log_surgeon/Lalr1Parser.hpp
(7 hunks)src/log_surgeon/finite_automata/DfaStatePair.hpp
(5 hunks)src/log_surgeon/finite_automata/NfaState.hpp
(6 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- examples/intersect-test.cpp
- src/log_surgeon/Lalr1Parser.hpp
🧰 Additional context used
📓 Path-based instructions (2)
src/log_surgeon/finite_automata/DfaStatePair.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/NfaState.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
📓 Learnings (1)
src/log_surgeon/finite_automata/NfaState.hpp (3)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:127-128
Timestamp: 2024-11-10T16:46:58.543Z
Learning: `RegexNFAUTF8State` is defined as a type alias for `RegexNFAState<RegexNFAStateType::UTF8>`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:37-90
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In this codebase, prefer code clarity over efficiency optimizations unless efficiency is a critical concern.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
🔇 Additional comments (8)
src/log_surgeon/finite_automata/DfaStatePair.hpp (3)
Line range hint
1-23
: LGTM! Clear documentation and consistent naming.The header guards and class documentation are well-structured. The template parameter renaming from
DFAState
toTypedDfaState
aligns with the PR objectives.
Line range hint
43-62
: LGTM! Well-documented method signature and appropriate use of [[nodiscard]].The method documentation clearly explains its purpose and parameters. The accessors are properly marked with [[nodiscard]] to prevent accidental value discarding.
Line range hint
65-81
: Address the TODO comment regarding UTF-8 support.The current implementation only handles single-byte transitions. As noted in the TODO, UTF-8 support needs to be added to handle multi-byte transitions properly.
Would you like me to help implement the UTF-8 support or create a GitHub issue to track this task?
src/log_surgeon/finite_automata/NfaState.hpp (5)
1-2
: LGTM! Header guard naming updated correctly.The header guard has been properly renamed to match the new file name, maintaining consistency with the naming convention.
20-24
: LGTM! Type names updated consistently.The template parameter and type aliases have been renamed following the new convention, improving clarity by removing redundant prefixes.
26-38
: LGTM! Constructor implementation aligns with previous learnings.The implementation maintains the use of
std::vector
for tags as established in previous changes, and the type names have been consistently updated.
Line range hint
169-235
: LGTM! Serialization implementation follows coding guidelines.The implementation consistently uses the preferred
false == expression
pattern for boolean comparisons and handles all state components thoroughly.
20-27
: Consider combining DFA and NFA state types in this PR.Based on the PR discussion, there's a plan to combine DFA and NFA state types in a future PR. Consider incorporating this change now to avoid multiple reviews of overlapping changes.
Let's verify the current usage to assess the impact:
Updated. |
Changed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be almost ready to merge. A few things:
- We should open an issue to track renaming of
RegexAST
toRegexAst
- We should open an issue to track the refactoring of typed states
- This PR will also need a multi-line commit message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/log_surgeon/finite_automata/NfaState.hpp (2)
Line range hint
26-118
: Consider improving readability of the conditional type declaration.The conditional type declaration could be more readable by using line breaks and consistent indentation.
Consider this formatting:
- std::conditional_t<state_type == StateType::Utf8, Tree, std::tuple<>> m_tree_transitions; + std::conditional_t< + state_type == StateType::Utf8, + Tree, + std::tuple<> + > m_tree_transitions;
Line range hint
121-167
: Consider improving const correctness in add_interval.The method could benefit from const correctness improvements.
Consider these improvements:
- uint32_t const bound = std::min(interval.second, cSizeOfByte - 1); + uint32_t const bound = std::min<uint32_t>(interval.second, cSizeOfByte - 1); - uint32_t overlap_low = std::max(data.m_interval.first, interval.first); - uint32_t overlap_high = std::min(data.m_interval.second, interval.second); + uint32_t const overlap_low = std::max<uint32_t>(data.m_interval.first, interval.first); + uint32_t const overlap_high = std::min<uint32_t>(data.m_interval.second, interval.second);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
CMakeLists.txt
(2 hunks)src/log_surgeon/Lalr1Parser.hpp
(4 hunks)src/log_surgeon/Lexer.tpp
(18 hunks)src/log_surgeon/finite_automata/DfaState.hpp
(4 hunks)src/log_surgeon/finite_automata/NfaState.hpp
(8 hunks)src/log_surgeon/finite_automata/StateType.hpp
(1 hunks)tests/CMakeLists.txt
(2 hunks)tests/test-nfa.cpp
(2 hunks)
✅ Files skipped from review due to trivial changes (1)
- src/log_surgeon/finite_automata/StateType.hpp
🚧 Files skipped from review as they are similar to previous changes (4)
- CMakeLists.txt
- tests/CMakeLists.txt
- src/log_surgeon/Lalr1Parser.hpp
- src/log_surgeon/Lexer.tpp
🧰 Additional context used
📓 Path-based instructions (3)
tests/test-nfa.cpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/DfaState.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/finite_automata/NfaState.hpp (1)
Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
📓 Learnings (3)
tests/test-nfa.cpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:37-90
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In this codebase, prefer code clarity over efficiency optimizations unless efficiency is a critical concern.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
src/log_surgeon/finite_automata/DfaState.hpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:127-128
Timestamp: 2024-11-10T16:46:58.543Z
Learning: `RegexNFAUTF8State` is defined as a type alias for `RegexNFAState<RegexNFAStateType::UTF8>`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
src/log_surgeon/finite_automata/NfaState.hpp (3)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:127-128
Timestamp: 2024-11-10T16:46:58.543Z
Learning: `RegexNFAUTF8State` is defined as a type alias for `RegexNFAState<RegexNFAStateType::UTF8>`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:37-90
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In this codebase, prefer code clarity over efficiency optimizations unless efficiency is a critical concern.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build (ubuntu-latest, Release)
- GitHub Check: build (ubuntu-latest, Debug)
🔇 Additional comments (9)
src/log_surgeon/finite_automata/DfaState.hpp (4)
1-2
: LGTM! Header guards and includes are properly updated.The header guard naming follows the new convention, and the necessary include for
StateType.hpp
has been added.Also applies to: 12-12
16-23
: LGTM! Type aliases and class declaration follow the new naming convention.The changes correctly implement the PR objectives by:
- Using
StateType
as the template parameter- Renaming type aliases to
ByteDfaState
andUtf8DfaState
- Removing the redundant "Regex" prefix from the class name
Line range hint
59-74
: Potential out-of-bounds access in thenext
function needs tracking.For the
Byte
state type, accessingm_bytes_transition[character]
without validating thatcharacter
is less thancSizeOfByte
(256) could lead to out-of-bounds access.Since this issue was acknowledged but deferred, would you like me to create a GitHub issue to track this? Here's a script to check for similar patterns in the codebase:
#!/bin/bash # Description: Find potential array access without bounds checking # Look for direct array access with uint32_t index ast-grep --pattern 'uint32_t $var; $$$ $_[$var]'
Line range hint
37-39
: LGTM! Boolean expression follows coding guidelines.The implementation correctly uses
false == m_matching_variable_ids.empty()
as per the coding guidelines.tests/test-nfa.cpp (3)
10-10
: LGTM! Type declarations are consistent with the new naming scheme.The changes align well with the PR objectives to remove redundant "Regex" prefix and standardize naming conventions.
Also applies to: 16-16, 23-31
47-47
: LGTM! NFA instantiation updated correctly.The instantiation uses the new type name consistently.
Line range hint
33-156
: LGTM! Test implementation remains robust.The test thoroughly verifies the NFA serialization behaviour, which is unaffected by the renaming changes.
src/log_surgeon/finite_automata/NfaState.hpp (2)
1-2
: LGTM! Header guards and type declarations updated consistently.The changes align with the PR objectives while maintaining the same functionality.
Also applies to: 20-24
Line range hint
168-235
: LGTM! Serialize implementation follows coding guidelines.The implementation correctly follows the coding guideline for boolean comparisons and maintains consistency with the new naming scheme.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the PR title, how about:
refactor: Update NFA, DFA, and LALR1 to align with the latest coding guidelines:
- Remove redundant `Regex` prefixes from NFA and DFA's class names
- Follow the class naming convention only to capitalize the first character for NFA, DFA, and LALR1 parser
Regex
prefixes from NFA and DFA's class namesReferences
Description
Combines:
DfaStateType.hpp
andNfaStateType.hpp
intoStateType.hpp
.Additions:
<algorithm>
toTaggedTransition.hpp
as reordering of includes caused by file name changes causes previous clang-tidy warning to become a compiler error.Renames:
DFAByteState
toByteDfaState
DFAUTF8State
toUtf8DfaState
NFAByteState
toByteNfaState
NFAUTF8State
toUtf8NfaState
DfaState
/DfaStateType
toTypedDfaState
NfaStateType
toTypedNfaState
RegexDFA*
toDfa*
RegexNFA*
toNfa*
NFA*
toNfa*
DFA*
toDfa*
LALR1Parser
toLalr1Parser
Does not rename:
*AST*
as there are too many instances of this and it will create conflicts with future PRs that are already submitted. This will be fixed in separate clang-tidy PRs for each file in the future.Validation performed
Previously existing tests still succeed.
Summary by CodeRabbit
I'll provide release notes based on the comprehensive changes across the log_surgeon project:
Release Notes: Log Surgeon Type Refactoring
Code Modernization
Type Changes
RegexNFA
withNfa
RegexNFAState
withNfaState
RegexNFAStateType
toStateType
Compatibility
Performance
This release represents a comprehensive code modernization effort, improving type consistency and readability without altering core functionality.