Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support more types of github ticket url / references #1290

Merged
merged 6 commits into from
Oct 14, 2024

Conversation

hussam789
Copy link
Collaborator

@hussam789 hussam789 commented Oct 13, 2024

PR Type

enhancement, bug fix


Description

  • Enhanced the ticket extraction logic to support more types of GitHub ticket URLs, including full URLs and shorthand notations like owner/repo#issue_number.
  • Improved the pattern matching to ensure that duplicate ticket entries are not added to the list.
  • Added error handling to log any exceptions that occur during the extraction process.

Changes walkthrough 📝

Relevant files
Enhancement
ticket_pr_compliance_check.py
Enhance GitHub ticket URL extraction logic                             

pr_agent/tools/ticket_pr_compliance_check.py

  • Enhanced pattern matching to support more GitHub ticket URL formats.
  • Added support for shorthand notations like owner/repo#issue_number.
  • Improved logic to avoid duplicate entries in the ticket list.
  • Enhanced error handling for ticket extraction.
  • +20/-14 

    💡 PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

    Copy link
    Contributor

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
    🏅 Score: 92
    🧪 No relevant tests
    🔒 No security concerns identified
    🔀 No multiple PR themes
    ⚡ Recommended focus areas for review

    Potential Performance Issue
    The new implementation uses multiple regex patterns and loops, which might be less efficient for large PR descriptions. Consider benchmarking and optimizing if necessary.

    Edge Case Handling
    Verify that the new regex patterns cover all possible GitHub issue URL formats, including enterprise GitHub instances.

    @hussam789
    Copy link
    Collaborator Author

    /improve

    Copy link
    Contributor

    codiumai-pr-agent-pro bot commented Oct 14, 2024

    PR Code Suggestions ✨

    • Author self-review: I have reviewed the PR code suggestions, and addressed the relevant ones.

    Previous suggestions

    ✅ Suggestions up to commit 3a52122
    CategorySuggestion                                                                                                                                    Score
    Possible issue
    ✅ Add input validation to prevent potential runtime errors caused by invalid input

    Consider adding input validation for the repo_path parameter to ensure it's not
    empty or None before using it in the f-string, preventing potential runtime errors.

    pr_agent/tools/ticket_pr_compliance_check.py [53-54]

    -if issue_number.isdigit() and len(issue_number) < 5:
    +if issue_number.isdigit() and len(issue_number) < 5 and repo_path:
         github_tickets.add(f'https://github.com/{repo_path}/issues/{issue_number}')

    [Suggestion has been applied]

    Suggestion importance[1-10]: 8

    Why: Adding input validation for repo_path is a valuable improvement that prevents potential runtime errors, enhancing the robustness of the code.

    8
    Performance
    Optimize data processing by using set comprehension to improve performance and reduce code complexity

    Consider using a more efficient set comprehension instead of a for loop when
    processing matches to improve performance and reduce code complexity.

    pr_agent/tools/ticket_pr_compliance_check.py [40-54]

    -github_tickets = set()
    -for match in matches:
    -    if match[0]:  # Full URL match
    -        github_tickets.add(match[0])
    -    elif match[1]:  # Shorthand notation match: owner/repo#issue_number
    -        owner, repo, issue_number = match[2], match[3], match[4]
    -        github_tickets.add(f'https://github.com/{owner}/{repo}/issues/{issue_number}')
    -    else:  # #123 format
    -        issue_number = match[5][1:]  # remove #
    -        if issue_number.isdigit() and len(issue_number) < 5:
    -            github_tickets.add(f'https://github.com/{repo_path}/issues/{issue_number}')
    +github_tickets = {
    +    match[0] if match[0] else
    +    f'https://github.com/{match[2]}/{match[3]}/issues/{match[4]}' if match[1] else
    +    f'https://github.com/{repo_path}/issues/{match[5][1:]}' if match[5][1:].isdigit() and len(match[5][1:]) < 5 else
    +    None
    +    for match in matches
    +} - {None}
    Suggestion importance[1-10]: 7

    Why: The use of set comprehension can improve performance and reduce code complexity, making the code more efficient, though the performance gain might be marginal.

    7
    Best practice
    Improve variable naming to enhance code readability and maintainability

    Consider using a more descriptive name for the regex pattern, such as
    'GITHUB_ISSUE_REFERENCE_PATTERN', to better reflect its purpose of matching various
    GitHub issue reference formats.

    pr_agent/tools/ticket_pr_compliance_check.py [9-11]

    -GITHUB_TICKET_PATTERN = re.compile(
    +GITHUB_ISSUE_REFERENCE_PATTERN = re.compile(
          r'(https://github[^/]+/[^/]+/[^/]+/issues/\d+)|(\b(\w+)/(\w+)#(\d+)\b)|(#\d+)'
     )
    Suggestion importance[1-10]: 5

    Why: The suggestion to rename the regex pattern variable to a more descriptive name improves code readability and maintainability, but it does not address a critical issue.

    5
    ✅ Suggestions up to commit 840e8c4
    CategorySuggestion                                                                                                                                    Score
    Performance
    ✅ Compile regex patterns outside the function to improve performance for repeated calls
    Suggestion Impact:The regex pattern was compiled outside the function, and the function was updated to use the precompiled pattern, as suggested.

    code diff:

    +# Compile the regex pattern once, outside the function
    +GITHUB_TICKET_PATTERN = re.compile(r'(https://github[^/]+/[^/]+/[^/]+/issues/\d+)|(\b(\w+)/(\w+)#(\d+)\b)')
     
     def find_jira_tickets(text):
         # Regular expression patterns for JIRA tickets
    @@ -35,9 +37,8 @@
         github_tickets = []
         try:
             # Pattern to match full GitHub issue URLs and shorthand notations like owner/repo#issue_number or https://github.com/owner/repo/issues/issue_number
    -        pattern = r'(https://github[^/]+/[^/]+/[^/]+/issues/\d+)|(\b(\w+)/(\w+)#(\d+)\b)'
    -
    -        matches = re.findall(pattern, pr_description)
    +        matches = GITHUB_TICKET_PATTERN.findall(pr_description)

    Move the regex pattern compilation outside the function to improve performance,
    especially if this function is called multiple times.

    pr_agent/tools/ticket_pr_compliance_check.py [31-40]

    +# Compile the regex pattern once, outside the function
    +GITHUB_TICKET_PATTERN = re.compile(r'(https://github[^/]+/[^/]+/[^/]+/issues/\d+)|(\b(\w+)/(\w+)#(\d+)\b)')
    +
     def extract_ticket_links_from_pr_description(pr_description, repo_path):
         """
         Extract all ticket links from PR description
         """
         github_tickets = []
         try:
    -        # Pattern to match full GitHub issue URLs and shorthand notations like owner/repo#issue_number or https://github.com/owner/repo/issues/issue_number
    -        pattern = r'(https://github[^/]+/[^/]+/[^/]+/issues/\d+)|(\b(\w+)/(\w+)#(\d+)\b)'
    +        matches = GITHUB_TICKET_PATTERN.findall(pr_description)
     
    -        matches = re.findall(pattern, pr_description)
    -
    Suggestion importance[1-10]: 9

    Why: Compiling the regex pattern outside the function is a strong suggestion for improving performance, especially if the function is called multiple times. This change is beneficial for efficiency and aligns well with performance optimization goals.

    9
    ✅ Use a set instead of a list to handle duplicates and improve performance

    Consider using a set instead of a list for github_tickets to automatically handle
    duplicates and improve performance when checking for existing entries.

    pr_agent/tools/ticket_pr_compliance_check.py [35-46]

    -github_tickets = []
    +github_tickets = set()
     try:
         # Pattern to match full GitHub issue URLs and shorthand notations like owner/repo#issue_number or https://github.com/owner/repo/issues/issue_number
         pattern = r'(https://github[^/]+/[^/]+/[^/]+/issues/\d+)|(\b(\w+)/(\w+)#(\d+)\b)'
     
         matches = re.findall(pattern, pr_description)
         for match in matches:
             if match[0]:  # Full URL match
    -            github_tickets.append(match[0])
    +            github_tickets.add(match[0])
             else:  # Shorthand notation match
                 owner, repo, issue_number = match[2], match[3], match[4]
    -            github_tickets.append(f'https://github.com/{owner}/{repo}/issues/{issue_number}')
    +            github_tickets.add(f'https://github.com/{owner}/{repo}/issues/{issue_number}')

    [Suggestion has been applied]

    Suggestion importance[1-10]: 8

    Why: Using a set instead of a list for github_tickets is a valid suggestion as it automatically handles duplicates and improves performance when checking for existing entries. This change aligns with the PR's goal of avoiding duplicate entries in the ticket list.

    8
    Enhancement
    ✅ Combine multiple regex patterns into a single pattern to simplify the code and reduce regex operations
    Suggestion Impact:The regex pattern was compiled once outside the function, which aligns with the suggestion to simplify regex operations. However, the specific combination of patterns into a single pattern using the | (OR) operator was not implemented.

    code diff:

    +# Compile the regex pattern once, outside the function
    +GITHUB_TICKET_PATTERN = re.compile(r'(https://github[^/]+/[^/]+/[^/]+/issues/\d+)|(\b(\w+)/(\w+)#(\d+)\b)')
     
     
     def find_jira_tickets(text):
    @@ -35,9 +38,8 @@
         github_tickets = []
         try:
             # Pattern to match full GitHub issue URLs and shorthand notations like owner/repo#issue_number or https://github.com/owner/repo/issues/issue_number
    -        pattern = r'(https://github[^/]+/[^/]+/[^/]+/issues/\d+)|(\b(\w+)/(\w+)#(\d+)\b)'
    -
    -        matches = re.findall(pattern, pr_description)
    +        matches = GITHUB_TICKET_PATTERN.findall(pr_description)

    Combine the two regex patterns into a single pattern using the | (OR) operator to
    reduce the number of regex operations and simplify the code.

    pr_agent/tools/ticket_pr_compliance_check.py [37-56]

    -pattern = r'(https://github[^/]+/[^/]+/[^/]+/issues/\d+)|(\b(\w+)/(\w+)#(\d+)\b)'
    +pattern = r'(https://github[^/]+/[^/]+/[^/]+/issues/\d+)|(\b(\w+)/(\w+)#(\d+)\b)|(#\d+)'
     
     matches = re.findall(pattern, pr_description)
     for match in matches:
         if match[0]:  # Full URL match
    -        github_tickets.append(match[0])
    -    else:  # Shorthand notation match
    +        github_tickets.add(match[0])
    +    elif match[1]:  # Shorthand notation match
             owner, repo, issue_number = match[2], match[3], match[4]
    -        github_tickets.append(f'https://github.com/{owner}/{repo}/issues/{issue_number}')
    -if not github_tickets:
    -    # Search for #123 format within the same repo
    -    issue_number_pattern = r'#\d+'
    -    issue_numbers = re.findall(issue_number_pattern, pr_description)
    -    for issue_number in issue_numbers:
    -        issue_number = issue_number[1:]  # remove #
    +        github_tickets.add(f'https://github.com/{owner}/{repo}/issues/{issue_number}')
    +    else:  # #123 format
    +        issue_number = match[5][1:]  # remove #
             if issue_number.isdigit() and len(issue_number) < 5:
    -            issue_url = f'https://github.com/{repo_path}/issues/{issue_number}'
    -            if issue_url not in github_tickets:
    -                github_tickets.append(issue_url)
    +            github_tickets.add(f'https://github.com/{repo_path}/issues/{issue_number}')
    Suggestion importance[1-10]: 7

    Why: Combining regex patterns into a single pattern can simplify the code and reduce the number of regex operations, which is a reasonable enhancement. However, the improved code introduces a potential issue by using add instead of append, assuming a set is used, which is not explicitly mentioned in the suggestion.

    7

    hussam789 and others added 2 commits October 14, 2024 10:59
    Co-authored-by: codiumai-pr-agent-pro[bot] <151058649+codiumai-pr-agent-pro[bot]@users.noreply.github.com>
    @hussam789 hussam789 merged commit e82afdd into main Oct 14, 2024
    2 checks passed
    @hussam789 hussam789 deleted the hl/tickets_support branch October 14, 2024 11:34
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    2 participants