Limiting Description and Commit Messages Length #187

Merged
merged 2 commits into from
Aug 9, 2023
22 changes: 22 additions & 0 deletions pr_agent/algo/pr_processing.py
@@ -284,3 +284,25 @@ def find_line_number_of_relevant_line_in_file(diff_files: List[FilePatchInfo],
absolute_position = start2 + delta - 1
break
return position, absolute_position


def clip_tokens(text: str, max_tokens: int) -> str:
    """
    Clip the number of tokens in a string to a maximum number of tokens.

    Args:
        text (str): The string to clip.
        max_tokens (int): The maximum number of tokens allowed in the string.

    Returns:
        str: The clipped string.
    """
    # We'll estimate the number of tokens by heuristically assuming 2.5 tokens per word
    words = re.finditer(r'\S+', text)
    max_words = max_tokens // 2.5
Collaborator

I really don't like the 2.5 factor.

Collaborator

Far better is to encode the original text and estimate the tokens-to-chars ratio.

    end_pos = None
    for i, token in enumerate(words):
        if i == max_words:
            end_pos = token.start()
            break
    return text if end_pos is None else text[:end_pos]
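
The review comment about estimating the tokens-to-chars ratio could be sketched roughly as follows. This is only an illustration, not the code that was merged: `clip_tokens_by_ratio` and `toy_encode` are hypothetical names, and `encode` is assumed to be any callable that returns one item per token (for example, the `encode` method of a tokenizer object). The toy encoder exists only so the sketch runs without extra dependencies.

```python
from typing import Callable, Sequence


def clip_tokens_by_ratio(text: str, max_tokens: int,
                         encode: Callable[[str], Sequence]) -> str:
    """Clip text to roughly max_tokens using a measured chars-per-token ratio."""
    if max_tokens <= 0:
        return ""
    if not text:
        return text
    # Encode once to measure how many tokens the text really has.
    num_tokens = len(encode(text))
    if num_tokens <= max_tokens:
        return text
    # Average characters per token for this particular text.
    chars_per_token = len(text) / num_tokens
    # Convert the token budget into a character budget and cut there.
    max_chars = int(chars_per_token * max_tokens)
    return text[:max_chars]


def toy_encode(s: str) -> list:
    # Stand-in tokenizer for the demo only: one token per 4 characters.
    return [s[i:i + 4] for i in range(0, len(s), 4)]


long_text = "abcd" * 100  # 400 characters, 100 toy tokens
clipped = clip_tokens_by_ratio(long_text, 10, toy_encode)
untouched = clip_tokens_by_ratio(long_text, 200, toy_encode)
```

Because the ratio is measured on the actual input, the cut adapts to the text (code, prose, non-English) instead of relying on a fixed per-word constant. The result is still an estimate, since tokens are not evenly sized.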
4 changes: 3 additions & 1 deletion pr_agent/tools/pr_reviewer.py
@@ -8,7 +8,7 @@

from pr_agent.algo.ai_handler import AiHandler
from pr_agent.algo.pr_processing import get_pr_diff, retry_with_fallback_models, \
-    find_line_number_of_relevant_line_in_file
+    find_line_number_of_relevant_line_in_file, clip_tokens
from pr_agent.algo.token_handler import TokenHandler
from pr_agent.algo.utils import convert_to_markdown, try_fix_json
from pr_agent.config_loader import get_settings
@@ -62,6 +62,8 @@ def __init__(self, pr_url: str, is_answer: bool = False, args: list = None):
"extra_instructions": get_settings().pr_reviewer.extra_instructions,
"commit_messages_str": self.git_provider.get_commit_messages(),
}
self.vars["description"] = clip_tokens(self.vars["description"], 500)
Collaborator

Why only for the reviewer? What about the other tools?

Better to edit 'self.git_provider.get_pr_description()' and
'self.git_provider.get_commit_messages()'.

self.vars["commit_messages_str"] = clip_tokens(self.vars["commit_messages_str"], 500)

self.token_handler = TokenHandler(
self.git_provider.pr,
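
One possible reading of the review comment on the reviewer hunk is that clipping should happen where the description and commit messages are fetched, so that every tool gets the limit without repeating the call. A minimal sketch under that assumption follows. `GitProviderSketch` and its constructor arguments are hypothetical stand-ins, not the pr_agent provider classes, and `clip_tokens` here is a compact restatement of the word-count heuristic from the diff.

```python
import re


def clip_tokens(text: str, max_tokens: int) -> str:
    # Same word-count heuristic as in the diff (2.5 tokens per word).
    max_words = int(max_tokens // 2.5)
    for i, match in enumerate(re.finditer(r'\S+', text)):
        if i == max_words:
            # Cut just before the first word that exceeds the budget.
            return text[:match.start()]
    return text


class GitProviderSketch:
    """Hypothetical stand-in for a git provider; not the pr_agent class."""

    def __init__(self, description: str, commit_messages: str,
                 max_tokens: int = 500):
        self._description = description
        self._commit_messages = commit_messages
        self._max_tokens = max_tokens

    def get_pr_description(self) -> str:
        # Clipping here means every caller receives the limited text.
        return clip_tokens(self._description, self._max_tokens)

    def get_commit_messages(self) -> str:
        return clip_tokens(self._commit_messages, self._max_tokens)


provider = GitProviderSketch("word " * 1000, "fix bug", max_tokens=10)
description = provider.get_pr_description()
commits = provider.get_commit_messages()
```

With this shape, a tool's `__init__` would call the provider getters as before and would not need its own `clip_tokens` lines.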