
Update token_handler.py #698

Closed
wants to merge 1 commit into from

Conversation

@DedyKredo DedyKredo commented Feb 22, 2024

Type

enhancement, bug_fix


Description

  • Introduced error handling in _get_system_user_tokens to return -1 when an exception occurs, enhancing robustness.
  • Adjusted formatting in count_tokens method without altering functionality.

Changes walkthrough

Relevant files
Error handling
token_handler.py
Enhance Error Handling in Token Calculation                           

pr_agent/algo/token_handler.py

  • Wrapped token calculation in _get_system_user_tokens with a try-except
    block.
  • Returns -1 if an exception occurs during token calculation.
  • No changes made to the count_tokens method, only formatting adjusted.
  • +10/-7   
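The change in the walkthrough above boils down to wrapping the Jinja2 rendering and token counting in a try/except. A minimal standalone sketch of the pattern (the real code is an instance method taking `self` and `pr`; this is a reconstruction from the diff quoted in the review comments, not the committed code):

```python
from jinja2 import Environment, StrictUndefined

def get_system_user_tokens(encoder, vars: dict, system: str, user: str) -> int:
    # Render the system/user prompt templates, then sum their token counts.
    # On any rendering or encoding failure, return the -1 sentinel this PR
    # introduces instead of propagating the exception.
    try:
        environment = Environment(undefined=StrictUndefined)
        system_prompt = environment.from_string(system).render(vars)
        user_prompt = environment.from_string(user).render(vars)
        return len(encoder.encode(system_prompt)) + len(encoder.encode(user_prompt))
    except Exception:
        return -1
```

Note that callers must treat a negative return value as an error rather than a token count; the review comment below raises exactly this concern.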

    PR-Agent usage:
    Comment /help on the PR to get a list of all available PR-Agent tools and their descriptions

    @codiumai-pr-agent-pro bot added the enhancement and bug_fix labels on Feb 22, 2024

    PR Description updated to latest commit (a461c2a)


    PR Review

         PR feedback                    
    ⏱️ Estimated effort to review [1-5]

    2, because the changes are straightforward and localized to a specific functionality within the token_handler.py file. The modifications include error handling and minor formatting adjustments, which are not complex to understand.

    🧪 Relevant tests

    No

    🔍 Possible issues
    • The exception handling in _get_system_user_tokens uses a bare except, which can catch unexpected exceptions and make debugging harder. It's recommended to catch specific exceptions.
    • Returning -1 on error may not be the best approach if the calling code expects a non-negative integer count of tokens. It could lead to further errors if not handled properly.
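The second concern can be mitigated at the call site by refusing to treat the sentinel as a real count. An illustrative pattern (`safe_token_budget` is a hypothetical helper, not part of pr_agent):

```python
def safe_token_budget(prompt_tokens: int, max_model_tokens: int) -> int:
    # -1 signals that token calculation failed; surfacing it as an error
    # here prevents it from silently inflating the remaining token budget.
    if prompt_tokens < 0:
        raise ValueError("token calculation failed; cannot compute budget")
    return max_model_tokens - prompt_tokens
```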
    🔒 Security concerns

    No


    ✨ Review tool usage guide:

    Overview:
    The review tool scans the PR code changes, and generates a PR review. The tool can be triggered automatically every time a new PR is opened, or can be invoked manually by commenting on any PR.
    When commenting, to edit configurations related to the review tool (pr_reviewer section), use the following template:

    /review --pr_reviewer.some_config1=... --pr_reviewer.some_config2=...
    

    With a configuration file, use the following template:

    [pr_reviewer]
    some_config1=...
    some_config2=...
    
    Utilizing extra instructions

    The review tool can be configured with extra instructions, which can be used to guide the model toward feedback tailored to the needs of your project.

    Be specific, clear, and concise in the instructions. With extra instructions, you are the prompter. Specify the relevant sub-tool, and the relevant aspects of the PR that you want to emphasize.

    Examples for extra instructions:

    [pr_reviewer] # /review #
    extra_instructions="""
    In the 'possible issues' section, emphasize the following:
    - Does the code logic cover relevant edge cases?
    - Is the code logic clear and easy to understand?
    - Is the code logic efficient?
    ...
    """
    

    Use triple quotes to write multi-line instructions. Use bullet points to make the instructions more readable.

    How to enable/disable automation
    • When you first install the PR-Agent app, the default mode for the review tool is:
    pr_commands = ["/review", ...]
    

    meaning the review tool will run automatically on every PR, with the default configuration.
    Edit this field to enable/disable the tool, or to change its configuration.

    Auto-labels

    The review tool can auto-generate two specific types of labels for a PR:

    • a possible security issue label, that detects possible security issues (enable_review_labels_security flag)
    • a Review effort [1-5]: x label, where x is the estimated effort to review the PR (enable_review_labels_effort flag)
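Both label flags live in the reviewer configuration; assuming the standard `[pr_reviewer]` section used elsewhere in this guide, enabling them would look like:

```toml
[pr_reviewer]
enable_review_labels_security = true
enable_review_labels_effort = true
```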
    Extra sub-tools

    The review tool provides a collection of possible feedbacks about a PR.
    It is recommended to review the possible options, and choose the ones relevant for your use case.
    Some of the features that are disabled by default are quite useful and should be considered for enabling. For example:
    require_score_review, require_soc2_ticket, and more.

    Auto-approve PRs

    By invoking:

    /review auto_approve
    

    The tool will automatically approve the PR, and add a comment with the approval.

    To ensure safety, the auto-approval feature is disabled by default. To enable auto-approval, you need to explicitly set the following in a pre-defined configuration file:

    [pr_reviewer]
    enable_auto_approval = true
    

    (this specific flag cannot be set with a command line argument, only in the configuration file, committed to the repository)

    You can also enable auto-approval only if the PR meets certain requirements, such as the estimated_review_effort being equal to or below a certain threshold, by adjusting the flag:

    [pr_reviewer]
    maximal_review_effort = 5
    
    More PR-Agent commands

    To invoke the PR-Agent, add a comment using one of the following commands:

    • /review: Request a review of your Pull Request.
    • /describe: Update the PR title and description based on the contents of the PR.
    • /improve [--extended]: Suggest code improvements. Extended mode provides a higher quality feedback.
    • /ask <QUESTION>: Ask a question about the PR.
    • /update_changelog: Update the changelog based on the PR's contents.
    • /add_docs 💎: Generate docstring for new components introduced in the PR.
    • /generate_labels 💎: Generate labels for the PR based on the PR's contents.
    • /analyze 💎: Automatically analyzes the PR, and presents a changes walkthrough for each component.

    See the tools guide for more details.
    To list the possible configuration parameters, add a /config comment.

    See the review usage page for a comprehensive guide on using this tool.


    codiumai-pr-agent-pro bot commented Feb 22, 2024

    PR Code Suggestions

    Suggestions                                                                                                                                                     
    best practice
    Specify the exception type in the try-except block.                          

    It's recommended to specify the exception type you are catching with the try-except block
    to avoid catching unexpected exceptions, which can make debugging harder. For instance, if
    you are expecting a template rendering error, you could catch TemplateSyntaxError.

    pr_agent/algo/token_handler.py [52-60]

     try:
         environment = Environment(undefined=StrictUndefined)
         system_prompt = environment.from_string(system).render(vars)
         user_prompt = environment.from_string(user).render(vars)
         system_prompt_tokens = len(encoder.encode(system_prompt))
         user_prompt_tokens = len(encoder.encode(user_prompt))
         return system_prompt_tokens + user_prompt_tokens
    -except:
    +except TemplateSyntaxError:
         return -1
     
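One caveat about the suggestion above (an editorial note, not part of the bot output): TemplateSyntaxError must be imported from jinja2 for that except clause to work, and with StrictUndefined a missing template variable raises UndefinedError rather than TemplateSyntaxError, so catching both is safer:

```python
from jinja2 import Environment, StrictUndefined, TemplateSyntaxError
from jinja2.exceptions import UndefinedError

def render_template(template_str: str, vars: dict):
    # Returns the rendered string, or None on the two most likely Jinja2
    # failures: malformed template syntax or use of an undefined variable.
    try:
        environment = Environment(undefined=StrictUndefined)
        return environment.from_string(template_str).render(vars)
    except (TemplateSyntaxError, UndefinedError):
        return None
```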
    Add exception handling to the count_tokens method for robustness.

    For the count_tokens method, it's a good practice to handle potential exceptions that
    might arise from the encode method, similar to the handling in _get_system_user_tokens.
    This ensures that the application can gracefully handle errors and provide more robust
    error handling.

    pr_agent/algo/token_handler.py [72]

    -return len(self.encoder.encode(patch, disallowed_special=()))
    +try:
    +    return len(self.encoder.encode(patch, disallowed_special=()))
    +except Exception as e:
    +    logging.error(f"Failed to encode patch: {e}")
    +    return -1
     
    enhancement
    Log exception details for better debugging.                                  

    Consider logging the exception details when an error occurs during the token calculation.
    This can help with debugging and understanding why a specific input leads to an exception.
    You can use the logging module to log the exception information.

    pr_agent/algo/token_handler.py [52-60]

    +import logging
     try:
         environment = Environment(undefined=StrictUndefined)
         system_prompt = environment.from_string(system).render(vars)
         user_prompt = environment.from_string(user).render(vars)
         system_prompt_tokens = len(encoder.encode(system_prompt))
         user_prompt_tokens = len(encoder.encode(user_prompt))
         return system_prompt_tokens + user_prompt_tokens
    -except:
    +except Exception as e:
    +    logging.error(f"Failed to calculate tokens: {e}")
         return -1
     
    maintainability
    Explicitly specify the disallowed_special parameter for clarity.

    To ensure the encode method's behavior is clear and maintainable, explicitly specify the
    disallowed_special parameter when encoding system and user prompts, even if it's an empty
    tuple. This makes it explicit that no special tokens are disallowed, improving code
    readability and maintainability.

    pr_agent/algo/token_handler.py [56-57]

    -system_prompt_tokens = len(encoder.encode(system_prompt))
    -user_prompt_tokens = len(encoder.encode(user_prompt))
    +system_prompt_tokens = len(encoder.encode(system_prompt, disallowed_special=()))
    +user_prompt_tokens = len(encoder.encode(user_prompt, disallowed_special=()))
     
    Create a helper function to reduce code duplication in encoding logic.       

    Since the encoder.encode method is used multiple times with similar parameters, consider
    creating a helper function to encapsulate the encoding logic. This will reduce code
    duplication and make the code cleaner and easier to maintain.

    pr_agent/algo/token_handler.py [56-57]

    -system_prompt_tokens = len(encoder.encode(system_prompt))
    -user_prompt_tokens = len(encoder.encode(user_prompt))
    +def encode_prompt(prompt):
    +    return len(encoder.encode(prompt, disallowed_special=()))
     
    +system_prompt_tokens = encode_prompt(system_prompt)
    +user_prompt_tokens = encode_prompt(user_prompt)
    +

    ✨ Improve tool usage guide:

    Overview:
    The improve tool scans the PR code changes, and automatically generates suggestions for improving the PR code. The tool can be triggered automatically every time a new PR is opened, or can be invoked manually by commenting on a PR.
    When commenting, to edit configurations related to the improve tool (pr_code_suggestions section), use the following template:

    /improve --pr_code_suggestions.some_config1=... --pr_code_suggestions.some_config2=...
    

    With a configuration file, use the following template:

    [pr_code_suggestions]
    some_config1=...
    some_config2=...
    
    Enabling/disabling automation

    When you first install the app, the default mode for the improve tool is:

    pr_commands = ["/improve --pr_code_suggestions.summarize=true", ...]
    

    meaning the improve tool will run automatically on every PR, with summarization enabled. Delete this line to disable the tool from running automatically.

    Utilizing extra instructions

    Extra instructions are very important for the improve tool, since they let you guide the model toward suggestions that are more relevant to the specific needs of the project.

    Be specific, clear, and concise in the instructions. With extra instructions, you are the prompter. Specify relevant aspects that you want the model to focus on.

    Examples for extra instructions:

    [pr_code_suggestions] # /improve #
    extra_instructions="""
    Emphasize the following aspects:
    - Does the code logic cover relevant edge cases?
    - Is the code logic clear and easy to understand?
    - Is the code logic efficient?
    ...
    """
    

    Use triple quotes to write multi-line instructions. Use bullet points to make the instructions more readable.

    A note on code suggestions quality
    • While the current AI for code is getting better and better (GPT-4), it's not flawless. Not all the suggestions will be perfect, and a user should not accept all of them automatically.
    • Suggestions are not meant to be simplistic. Instead, they aim to give deep feedback and raise questions, ideas and thoughts to the user, who can then use their judgment, experience, and understanding of the code base.
    • It is recommended to use the 'extra_instructions' field to guide the model to suggestions that are more relevant to the specific needs of the project, or to use the custom suggestions 💎 tool
    • With large PRs, best quality will be obtained by using 'improve --extended' mode.
    More PR-Agent commands

    To invoke the PR-Agent, add a comment using one of the following commands:

    • /review: Request a review of your Pull Request.
    • /describe: Update the PR title and description based on the contents of the PR.
    • /improve [--extended]: Suggest code improvements. Extended mode provides a higher quality feedback.
    • /ask <QUESTION>: Ask a question about the PR.
    • /update_changelog: Update the changelog based on the PR's contents.
    • /add_docs 💎: Generate docstring for new components introduced in the PR.
    • /generate_labels 💎: Generate labels for the PR based on the PR's contents.
    • /analyze 💎: Automatically analyzes the PR, and presents a changes walkthrough for each component.

    See the tools guide for more details.
    To list the possible configuration parameters, add a /config comment.

    See the improve usage page for a more comprehensive guide on using this tool.

    @DedyKredo (Author)

    /help


    codiumai-pr-agent-pro bot commented Feb 22, 2024

    PR Agent Walkthrough

    🤖 Welcome to the PR Agent, an AI-powered tool for automated pull request analysis, feedback, suggestions and more.

    Here is a list of tools you can use to interact with the PR Agent:

    Tool | Description | Invoke Interactively 💎

    DESCRIBE

    Generates PR description - title, type, summary, code walkthrough and labels
    • Run

    REVIEW

    Adjustable feedback about the PR, possible issues, security concerns, review effort and more
    • Run

    IMPROVE

    Code suggestions for improving the PR.
    • Run

    ANALYZE 💎

    Identifies code components that changed in the PR, and enables you to interactively generate tests, docs, and code suggestions for each component.
    • Run

    UPDATE CHANGELOG

    Automatically updates the changelog.
    • Run

    ADD DOCUMENTATION 💎

    Generates documentation to methods/functions/classes that changed in the PR.
    • Run

    ASK

    Answers free-text questions about the PR.

    [*]

    GENERATE CUSTOM LABELS

    Generates custom labels for the PR, based on specific guidelines defined by the user

    [*]

    TEST 💎

    Generates unit tests for a specific component, based on the PR code change.

    [*]

    CI FEEDBACK 💎

    Generates feedback and analysis for a failed CI job.

    [*]

    CUSTOM SUGGESTIONS 💎

    Generates custom suggestions for improving the PR code, based on specific guidelines defined by the user.

    [*]

    SIMILAR ISSUE

    Automatically retrieves and presents similar issues.

    [*]

    (1) Note that each tool can be triggered automatically when a new PR is opened, or called manually by commenting on a PR.

    (2) Tools marked with [*] require additional parameters to be passed. For example, to invoke the /ask tool, you need to comment on a PR: /ask "<question content>". See the relevant documentation for each tool for more details.


    codiumai-pr-agent-pro bot commented Feb 22, 2024

    PR Analysis

    • This screen contains a list of code components that were changed in this PR.
    • You can initiate specific actions for each component, by checking the relevant boxes.
    • After you check a box, the action will be performed automatically by PR-Agent.
    • Results will appear as a comment on the PR, typically after 30-60 seconds.
    File | Changed components
    token_handler.py
    • Test
    • Docs
    • Improve
     
    _get_system_user_tokens
    (method of TokenHandler)
     
    +9/-6
     

    ✨ Usage guide:

    Using static code analysis capabilities, the analyze tool scans the PR code changes and finds the code components (methods, functions, classes) that changed in the PR.
    The tool can be triggered automatically every time a new PR is opened, or can be invoked manually by commenting on any PR:

    /analyze
    

    Languages currently supported: Python, Java, C++, JavaScript, TypeScript.
    See more information about the tool in the docs.


    codiumai-pr-agent-pro bot commented Feb 22, 2024

    PR Documentation

    Here is a list of the files that were modified in the PR, with docstring for each altered code component:

    token_handler.py                                                                                                                               

      _get_system_user_tokens (method) [+9/-6]                                                                                                       
      Component signature:
      def _get_system_user_tokens(self, pr, encoder, vars: dict, system, user):

      Docstring:

      """
      Calculates the number of tokens in the system and user strings.
      
      Args:
      - pr: The pull request object.
      - encoder: An object of the encoding_for_model class from the tiktoken module.
      - vars: A dictionary of variables.
      - system: The system string.
      - user: The user string.
      
      Returns:
      The sum of the number of tokens in the system and user strings.
      """


    Changelog updates:

    2024-02-22

    Enhanced

    • Improved error handling in token calculation by returning -1 when an exception occurs, enhancing robustness.
    • Adjusted formatting in count_tokens method without altering functionality.

    To commit the new content to the CHANGELOG.md file, type:
    '/update_changelog --pr_update_changelog.push_changelog_changes=true'


    codiumai-pr-agent-pro bot commented Feb 22, 2024

    Generated tests for '_get_system_user_tokens'

      _get_system_user_tokens (method) [+9/-6]

      Component signature:

      def _get_system_user_tokens(self, pr, encoder, vars: dict, system, user):


      Tests for code changes in _get_system_user_tokens method:

      [happy path]
      _get_system_user_tokens should correctly calculate and return the sum of tokens in system and user strings under normal conditions

      test_code:

      from pr_agent.algo.token_handler import TokenHandler

      class MockEncoder:
          # Minimal stand-in encoder (tiktoken does not ship a mocks module):
          # encode() returns a token list whose length matches the count
          # configured for the exact rendered string.
          def __init__(self, token_counts):
              self.token_counts = token_counts

          def encode(self, text, **kwargs):
              return [0] * self.token_counts.get(text, 0)

      def test_get_system_user_tokens_happy_path():
          # Given
          token_handler = TokenHandler()
          mock_encoder = MockEncoder({"Hello, PR Tester!": 5, "Welcome, PR Tester.": 3})
          vars = {"name": "PR Tester"}
          system = "Hello, {{ name }}!"
          user = "Welcome, {{ name }}."

          # When
          result = token_handler._get_system_user_tokens(None, mock_encoder, vars, system, user)

          # Then
          assert result == 8, "Expected the sum of system and user tokens to be 8"
      [happy path]
      _get_system_user_tokens should handle empty strings for system and user without errors

      test_code:

      from pr_agent.algo.token_handler import TokenHandler

      class MockEncoder:
          # Minimal stand-in encoder (tiktoken does not ship a mocks module).
          def __init__(self, token_counts):
              self.token_counts = token_counts

          def encode(self, text, **kwargs):
              return [0] * self.token_counts.get(text, 0)

      def test_get_system_user_tokens_with_empty_strings():
          # Given
          token_handler = TokenHandler()
          mock_encoder = MockEncoder({"": 0})  # 0 tokens for the empty string
          vars = {}

          # When
          result = token_handler._get_system_user_tokens(None, mock_encoder, vars, "", "")

          # Then
          assert result == 0, "Expected the sum of tokens for empty system and user strings to be 0"
      [edge case]
      _get_system_user_tokens should return -1 when an exception occurs during token calculation

      test_code:

      from pr_agent.algo.token_handler import TokenHandler

      class MockEncoder:
          # Minimal stand-in encoder (tiktoken does not ship a mocks module)
          # that raises the configured exception on every encode() call.
          def __init__(self, exception):
              self.exception = exception

          def encode(self, text, **kwargs):
              raise self.exception

      def test_get_system_user_tokens_with_exception():
          # Given
          token_handler = TokenHandler()
          mock_encoder = MockEncoder(RuntimeError("Encoding failed"))
          vars = {"name": "PR Tester"}
          system = "Hello, {{ name }}!"
          user = "Welcome, {{ name }}."

          # When
          result = token_handler._get_system_user_tokens(None, mock_encoder, vars, system, user)

          # Then
          assert result == -1, "Expected the method to return -1 on exception"

      ✨ Usage guide:

      The test tool generates tests for a selected component, based on the PR code changes.
      It can be invoked manually by commenting on any PR:

      /test component_name
      

      where 'component_name' is the name of a specific component in the PR. To get a list of the components that changed in the PR, use the analyze tool.
      Languages currently supported: Python, Java, C++, JavaScript, TypeScript.

      Configuration options:

      • num_tests: number of tests to generate. Default is 3.
      • testing_framework: the testing framework to use. If not set, for Python it will use pytest, for Java it will use JUnit, for C++ it will use Catch2, and for JavaScript and TypeScript it will use jest.
      • avoid_mocks: if set to true, the tool will try to avoid using mocks in the generated tests. Note that even if this option is set to true, the tool might still use mocks if it cannot generate a test without them. Default is true.
      • extra_instructions: Optional extra instructions to the tool. For example: "use the following mock injection scheme: ...".
      • file: in case there are several components with the same name, you can specify the relevant file.
      • class_name: in case there are several components with the same name in the same file, you can specify the relevant class name.

      See more information about the test tool in the docs.

    @DedyKredo DedyKredo closed this Feb 22, 2024
    @mrT23 mrT23 deleted the DedyKredo-patch-1 branch May 18, 2024 09:28