Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Update NFA, DFA, and LALR1 to align with the latest coding guidelines. #59

Merged
merged 420 commits into from
Jan 8, 2025

Conversation

SharafMohamed
Copy link
Contributor

@SharafMohamed SharafMohamed commented Dec 5, 2024

  • Remove redundant Regex prefixes from NFA and DFA's class names
  • Follow the class naming convention only to capitalize the first character for NFA, DFA, and LALR1 parser

References

  • Depends on PR#58.
  • To review in parallel with PR#58, diff against PR#58 locally. In the repo run:
git fetch upstream pull/58/head:pr-58
git fetch upstream pull/59/head:pr-59
git diff pr-58 pr-59

Description

Combines:

  • DfaStateType.hpp and NfaStateType.hpp into StateType.hpp.

Additions:

  • Add <algorithm> to TaggedTransition.hpp as reordering of includes caused by file name changes causes previous clang-tidy warning to become a compiler error.

Renames:

  • DFAByteState to ByteDfaState
  • DFAUTF8State to Utf8DfaState
  • NFAByteState to ByteNfaState
  • NFAUTF8State to Utf8NfaState
  • Template parameters named DfaState/DfaStateType to TypedDfaState
  • Template parameters named NfaStateType to TypedNfaState
  • RegexDFA* to Dfa*
  • RegexNFA* to Nfa*
  • NFA* to Nfa*
  • DFA* to Dfa*
  • LALR1Parser to Lalr1Parser

Does not rename:

  • *AST* as there are too many instances of this and it will create conflicts with future PRs that are already submitted. This will be fixed in separate clang-tidy PRs for each file in the future.

Validation performed

Previously existing tests still succeed.

Summary by CodeRabbit

I'll provide release notes based on the comprehensive changes across the log_surgeon project:

Release Notes: Log Surgeon Type Refactoring

  • Code Modernization

    • Renamed LALR1 parser and finite automata classes for improved consistency
    • Updated type naming conventions across the entire project
    • Standardized state type representations
  • Type Changes

    • Replaced RegexNFA with Nfa
    • Replaced RegexNFAState with NfaState
    • Updated state type from RegexNFAStateType to StateType
    • Transitioned from uppercase to more readable type names
  • Compatibility

    • Maintained existing functionality during type renaming
    • Preserved all existing parsing and lexical analysis logic
    • Ensured seamless integration with existing test suites
  • Performance

    • No significant performance modifications
    • Type changes focused on code clarity and maintainability

This release represents a comprehensive code modernization effort, improving type consistency and readability without altering core functionality.

SharafMohamed and others added 30 commits November 1, 2024 06:49
…eterministic. This is fine as tags are unique across disjoint subtrees.
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
…ypedefs to the top of the file to fix compilation error.
Copy link
Member

@LinZhihao-723 LinZhihao-723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any header guard issue within this file should be tracked in #65

@LinZhihao-723
Copy link
Member

Before we proceed, a quick question: do we really need to differentiate DFA state type and NFA state type? Can we have a generic type to mark whether the input is byte or unicode?

Just to clarify: we don't necessarily do this in the current PR, but it's some design decision we should make

SharafMohamed and others added 3 commits December 19, 2024 09:10
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
@SharafMohamed
Copy link
Contributor Author

SharafMohamed commented Dec 19, 2024

Before we proceed, a quick question: do we really need to differentiate DFA state type and NFA state type? Can we have a generic type to mark whether the input is byte or unicode?

I thought one of the other PRs already posted combine these into a single type, but it seems that's not the case. I'll add a PR to combine the types.

@LinZhihao-723
Copy link
Member

Before we proceed, a quick question: do we really need to differentiate DFA state type and NFA state type? Can we have a generic type to mark whether the input is byte or unicode?

I thought one of the other PRs already posted combine these into a single type, but it seems that's not the case. I'll add a PR to combine the types.

Do u think we can do it in this PR? 'cuz reviewing the current renaming changes and then reviewing another PR that has a lot of overlapped overwrite would duplicate the work

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/log_surgeon/finite_automata/DfaStatePair.hpp (1)

Line range hint 25-39: Consider using std::less for stable pointer ordering.

While comparing raw pointers works, it might lead to non-deterministic ordering across different program runs. Consider using std::less<const TypedDfaState*> for stable ordering:

 auto operator<(DfaStatePair const& rhs) const -> bool {
+    static std::less<const TypedDfaState*> ptr_less;
     if (m_state1 == rhs.m_state1) {
-        return m_state2 < rhs.m_state2;
+        return ptr_less(m_state2, rhs.m_state2);
     }
-    return m_state1 < rhs.m_state1;
+    return ptr_less(m_state1, rhs.m_state1);
 }
src/log_surgeon/finite_automata/NfaState.hpp (1)

130-130: Consider updating the conditional check pattern.

For consistency with coding guidelines, consider updating the condition to match the preferred pattern:

-    if constexpr (NfaStateType::Utf8 == state_type) {
+    if constexpr (state_type == NfaStateType::Utf8) {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 992a2ec and 4e07f8d.

📒 Files selected for processing (4)
  • examples/intersect-test.cpp (4 hunks)
  • src/log_surgeon/Lalr1Parser.hpp (7 hunks)
  • src/log_surgeon/finite_automata/DfaStatePair.hpp (5 hunks)
  • src/log_surgeon/finite_automata/NfaState.hpp (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • examples/intersect-test.cpp
  • src/log_surgeon/Lalr1Parser.hpp
🧰 Additional context used
📓 Path-based instructions (2)
src/log_surgeon/finite_automata/DfaStatePair.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/log_surgeon/finite_automata/NfaState.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

📓 Learnings (1)
src/log_surgeon/finite_automata/NfaState.hpp (3)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:127-128
Timestamp: 2024-11-10T16:46:58.543Z
Learning: `RegexNFAUTF8State` is defined as a type alias for `RegexNFAState<RegexNFAStateType::UTF8>`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:37-90
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In this codebase, prefer code clarity over efficiency optimizations unless efficiency is a critical concern.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
🔇 Additional comments (8)
src/log_surgeon/finite_automata/DfaStatePair.hpp (3)

Line range hint 1-23: LGTM! Clear documentation and consistent naming.

The header guards and class documentation are well-structured. The template parameter renaming from DFAState to TypedDfaState aligns with the PR objectives.


Line range hint 43-62: LGTM! Well-documented method signature and appropriate use of [[nodiscard]].

The method documentation clearly explains its purpose and parameters. The accessors are properly marked with [[nodiscard]] to prevent accidental value discarding.


Line range hint 65-81: Address the TODO comment regarding UTF-8 support.

The current implementation only handles single-byte transitions. As noted in the TODO, UTF-8 support needs to be added to handle multi-byte transitions properly.

Would you like me to help implement the UTF-8 support or create a GitHub issue to track this task?

src/log_surgeon/finite_automata/NfaState.hpp (5)

1-2: LGTM! Header guard naming updated correctly.

The header guard has been properly renamed to match the new file name, maintaining consistency with the naming convention.


20-24: LGTM! Type names updated consistently.

The template parameter and type aliases have been renamed following the new convention, improving clarity by removing redundant prefixes.


26-38: LGTM! Constructor implementation aligns with previous learnings.

The implementation maintains the use of std::vector for tags as established in previous changes, and the type names have been consistently updated.


Line range hint 169-235: LGTM! Serialization implementation follows coding guidelines.

The implementation consistently uses the preferred false == expression pattern for boolean comparisons and handles all state components thoroughly.


20-27: Consider combining DFA and NFA state types in this PR.

Based on the PR discussion, there's a plan to combine DFA and NFA state types in a future PR. Consider incorporating this change now to avoid multiple reviews of overlapping changes.

Let's verify the current usage to assess the impact:

@SharafMohamed
Copy link
Contributor Author

Any header guard issue within this file should be tracked in #65

Updated.

@SharafMohamed
Copy link
Contributor Author

Before we proceed, a quick question: do we really need to differentiate DFA state type and NFA state type? Can we have a generic type to mark whether the input is byte or unicode?

I thought one of the other PRs already posted combine these into a single type, but it seems that's not the case. I'll add a PR to combine the types.

Do u think we can do it in this PR? 'cuz reviewing the current renaming changes and then reviewing another PR that has a lot of overlapped overwrite would duplicate the work

Changed.

Copy link
Member

@LinZhihao-723 LinZhihao-723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be almost ready to merge. A few things:

  • We should open an issue to track renaming of RegexAST to RegexAst
  • We should open an issue to track the refactoring of typed states
  • This PR will also need a multi-line commit message

src/log_surgeon/finite_automata/StateType.hpp Outdated Show resolved Hide resolved
src/log_surgeon/Lalr1Parser.hpp Show resolved Hide resolved
src/log_surgeon/Lalr1Parser.hpp Outdated Show resolved Hide resolved
src/log_surgeon/Lexer.tpp Outdated Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/log_surgeon/finite_automata/NfaState.hpp (2)

Line range hint 26-118: Consider improving readability of the conditional type declaration.

The conditional type declaration could be more readable by using line breaks and consistent indentation.

Consider this formatting:

-    std::conditional_t<state_type == StateType::Utf8, Tree, std::tuple<>> m_tree_transitions;
+    std::conditional_t<
+            state_type == StateType::Utf8,
+            Tree,
+            std::tuple<>
+    > m_tree_transitions;

Line range hint 121-167: Consider improving const correctness in add_interval.

The method could benefit from const correctness improvements.

Consider these improvements:

-    uint32_t const bound = std::min(interval.second, cSizeOfByte - 1);
+    uint32_t const bound = std::min<uint32_t>(interval.second, cSizeOfByte - 1);
-            uint32_t overlap_low = std::max(data.m_interval.first, interval.first);
-            uint32_t overlap_high = std::min(data.m_interval.second, interval.second);
+            uint32_t const overlap_low = std::max<uint32_t>(data.m_interval.first, interval.first);
+            uint32_t const overlap_high = std::min<uint32_t>(data.m_interval.second, interval.second);
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4e07f8d and 9af72f3.

📒 Files selected for processing (8)
  • CMakeLists.txt (2 hunks)
  • src/log_surgeon/Lalr1Parser.hpp (4 hunks)
  • src/log_surgeon/Lexer.tpp (18 hunks)
  • src/log_surgeon/finite_automata/DfaState.hpp (4 hunks)
  • src/log_surgeon/finite_automata/NfaState.hpp (8 hunks)
  • src/log_surgeon/finite_automata/StateType.hpp (1 hunks)
  • tests/CMakeLists.txt (2 hunks)
  • tests/test-nfa.cpp (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • src/log_surgeon/finite_automata/StateType.hpp
🚧 Files skipped from review as they are similar to previous changes (4)
  • CMakeLists.txt
  • tests/CMakeLists.txt
  • src/log_surgeon/Lalr1Parser.hpp
  • src/log_surgeon/Lexer.tpp
🧰 Additional context used
📓 Path-based instructions (3)
tests/test-nfa.cpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/log_surgeon/finite_automata/DfaState.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

src/log_surgeon/finite_automata/NfaState.hpp (1)

Pattern **/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

📓 Learnings (3)
tests/test-nfa.cpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:37-90
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In this codebase, prefer code clarity over efficiency optimizations unless efficiency is a critical concern.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
src/log_surgeon/finite_automata/DfaState.hpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:127-128
Timestamp: 2024-11-10T16:46:58.543Z
Learning: `RegexNFAUTF8State` is defined as a type alias for `RegexNFAState<RegexNFAStateType::UTF8>`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
src/log_surgeon/finite_automata/NfaState.hpp (3)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:127-128
Timestamp: 2024-11-10T16:46:58.543Z
Learning: `RegexNFAUTF8State` is defined as a type alias for `RegexNFAState<RegexNFAStateType::UTF8>`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:37-90
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In this codebase, prefer code clarity over efficiency optimizations unless efficiency is a critical concern.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: build (ubuntu-latest, Release)
  • GitHub Check: build (ubuntu-latest, Debug)
🔇 Additional comments (9)
src/log_surgeon/finite_automata/DfaState.hpp (4)

1-2: LGTM! Header guards and includes are properly updated.

The header guard naming follows the new convention, and the necessary include for StateType.hpp has been added.

Also applies to: 12-12


16-23: LGTM! Type aliases and class declaration follow the new naming convention.

The changes correctly implement the PR objectives by:

  • Using StateType as the template parameter
  • Renaming type aliases to ByteDfaState and Utf8DfaState
  • Removing the redundant "Regex" prefix from the class name

Line range hint 59-74: Potential out-of-bounds access in the next function needs tracking.

For the Byte state type, accessing m_bytes_transition[character] without validating that character is less than cSizeOfByte (256) could lead to out-of-bounds access.

Since this issue was acknowledged but deferred, would you like me to create a GitHub issue to track this? Here's a script to check for similar patterns in the codebase:

#!/bin/bash
# Description: Find potential array access without bounds checking
# Look for direct array access with uint32_t index
ast-grep --pattern 'uint32_t $var;
  $$$
  $_[$var]'

Line range hint 37-39: LGTM! Boolean expression follows coding guidelines.

The implementation correctly uses false == m_matching_variable_ids.empty() as per the coding guidelines.

tests/test-nfa.cpp (3)

10-10: LGTM! Type declarations are consistent with the new naming scheme.

The changes align well with the PR objectives to remove redundant "Regex" prefix and standardize naming conventions.

Also applies to: 16-16, 23-31


47-47: LGTM! NFA instantiation updated correctly.

The instantiation uses the new type name consistently.


Line range hint 33-156: LGTM! Test implementation remains robust.

The test thoroughly verifies the NFA serialization behaviour, which is unaffected by the renaming changes.

src/log_surgeon/finite_automata/NfaState.hpp (2)

1-2: LGTM! Header guards and type declarations updated consistently.

The changes align with the PR objectives while maintaining the same functionality.

Also applies to: 20-24


Line range hint 168-235: LGTM! Serialize implementation follows coding guidelines.

The implementation correctly follows the coding guideline for boolean comparisons and maintains consistency with the new naming scheme.

Copy link
Member

@LinZhihao-723 LinZhihao-723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the PR title, how about:

refactor: Update NFA, DFA, and LALR1 to align with the latest coding guidelines:
- Remove redundant `Regex` prefixes from NFA and DFA's class names
- Follow the class naming convention only to capitalize the first character for NFA, DFA, and LALR1 parser

@SharafMohamed SharafMohamed changed the title refactor: Remove redundant regex prefix from class names and fix capitalization. refactor: Update NFA, DFA, and LALR1 to align with the latest coding guidelines. Jan 8, 2025
@SharafMohamed SharafMohamed merged commit 221225e into y-scope:main Jan 8, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants