Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limiting Description and Commit Messages Length #187

Merged
merged 2 commits into from
Aug 9, 2023
Merged

Conversation

okotek
Copy link
Contributor

@okotek okotek commented Aug 9, 2023

PR Type:

Enhancement


PR Description:

This PR introduces a method to limit the length of the description and commit messages to a maximum number of tokens. This is achieved by adding a new function 'clip_tokens' that clips the number of tokens in a string to a maximum number. The function is then used to limit the length of 'description' and 'commit_messages_str' in the PR reviewer.


PR Main Files Walkthrough:

  • pr_agent/algo/pr_processing.py: A new function 'clip_tokens' is added. This function takes a string and a maximum number of tokens as input and returns the string clipped to the maximum number of tokens.
  • pr_agent/tools/pr_reviewer.py: The 'clip_tokens' function is imported and used to limit the length of 'description' and 'commit_messages_str'.

@okotek
Copy link
Contributor Author

okotek commented Aug 9, 2023

/describe

@github-actions
Copy link
Contributor

github-actions bot commented Aug 9, 2023

PR Analysis

  • 🎯 Main theme: Adding a function to clip the number of tokens in a string to a maximum limit and applying it to PR descriptions and commit messages
  • 📌 Type of PR: Enhancement
  • 🧪 Relevant tests added: No
  • Focused PR: Yes, the PR is focused as all changes are related to the implementation of the token clipping functionality
  • 🔒 Security concerns: No, the changes made in this PR do not introduce any apparent security concerns.

PR Feedback

  • General suggestions: The implementation of the token clipping function seems to be efficient and well-documented. However, it would be beneficial to add unit tests to ensure the function behaves as expected under different scenarios.

  • 🤖 Code feedback:

    • relevant file: pr_agent/algo/pr_processing.py
      suggestion: Consider using a more efficient method to find the end position. For example, you could use a generator expression with islice from itertools to avoid enumerating all words. This would be more efficient for large texts. [important]
      relevant line: for i, token in enumerate(words):

    • relevant file: pr_agent/algo/pr_processing.py
      suggestion: The heuristic of 2.5 tokens per word might not be accurate for all languages or types of text. Consider making this a configurable parameter, or better yet, consider a more robust method of estimating tokens. [medium]
      relevant line: max_words = max_tokens // 2.5

    • relevant file: pr_agent/tools/pr_reviewer.py
      suggestion: The magic number 500 is used here for the maximum number of tokens. It would be better to define this as a constant or a configurable setting, to improve maintainability and flexibility. [important]
      relevant line: self.vars["description"] = clip_tokens(self.vars["description"], 500)

How to use

To invoke the PR-Agent, add a comment using one of the following commands:
/review [-i]: Request a review of your Pull Request. For an incremental review, which only considers changes since the last review, include the '-i' option.
/describe: Modify the PR title and description based on the contents of the PR.
/improve: Suggest improvements to the code in the PR.
/ask <QUESTION>: Pose a question about the PR.

To edit any configuration parameter from 'configuration.toml', add --config_path=new_value
For example: /review --pr_reviewer.extra_instructions="focus on the file: ..."
To list the possible configuration parameters, use the /config command.

@github-actions github-actions bot changed the title A (still) hacky way to clip description and commit messages Limiting Description and Commit Messages Length Aug 9, 2023
@github-actions github-actions bot added the enhancement New feature or request label Aug 9, 2023
"""
# We'll estimate the number of tokens by hueristically assuming 2.5 tokens per word
words = re.finditer(r'\S+', text)
max_words = max_tokens // 2.5
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i really don't like the 2.5 factor

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

far better is to encode the original, and estimate the tokens-to-chars ratio

@@ -62,6 +62,8 @@ def __init__(self, pr_url: str, is_answer: bool = False, args: list = None):
"extra_instructions": get_settings().pr_reviewer.extra_instructions,
"commit_messages_str": self.git_provider.get_commit_messages(),
}
self.vars["description"] = clip_tokens(self.vars["description"], 500)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why only for reviewer ? what about other tools

better to edit 'self.git_provider.get_pr_description(),'
'self.git_provider.get_commit_messages()'

@mrT23 mrT23 merged commit d38c523 into main Aug 9, 2023
1 check passed
@mrT23 mrT23 deleted the ok/limit_description branch August 9, 2023 11:14
@okotek okotek restored the ok/limit_description branch January 7, 2024 11:50
yochail pushed a commit to yochail/pr-agent that referenced this pull request Feb 11, 2024
Limiting Description and Commit Messages Length
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants