Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integration of LanceDB with PR-agent for managing database #523

Closed
wants to merge 7 commits into from
Closed

Integration of LanceDB with PR-agent for managing database #523

wants to merge 7 commits into from

Conversation

PrashantDixit0
Copy link
Contributor

@PrashantDixit0 PrashantDixit0 commented Dec 12, 2023

Type

Enhancement


Description

This PR integrates LanceDB with PR-agent to manage the database more seamlessly. The main changes include:

  • Addition of LanceDB as a vector database option in the configuration.
  • Modification of the PRSimilarIssue class to support LanceDB operations.
  • Creation of a new method to update the table with issues when LanceDB is used.
  • Modification of the run method to support querying using LanceDB.
  • Addition of LanceDB to the project dependencies.

PR changes walkthrough

Relevant files                                                                                                                                 
Enhancement
1 files
pr_similar_issue.py                                                                                 
    pr_agent/tools/pr_similar_issue.py

    The PRSimilarIssue class has been modified to support
    LanceDB operations. This includes checking if the index
    exists, indexing the entire repo, updating the index if
    needed, and querying using LanceDB. A new method,
    _update_table_with_issues, has been added to update the
    table with issues when LanceDB is used.
+273/-87
Configuration changes
1 files
configuration.toml                                                                                   
    pr_agent/settings/configuration.toml

    The configuration file has been updated to include LanceDB
    as a vector database option. The URI for LanceDB has also
    been added.
+4/-0
Dependencies
1 files
requirements.txt                                                                                       
    requirements.txt

    LanceDB has been added to the project dependencies.
+1/-0
Miscellaneous
1 files
settings.json                                                                                             
    .vscode/settings.json

    The settings file has been modified, but no significant
    changes have been made.
+2/-0

@mrT23
Copy link
Collaborator

mrT23 commented Dec 12, 2023

Thanks @PrashantDixit0
i will review it tomorrow or on Thursday and send feedback

@mrT23
Copy link
Collaborator

mrT23 commented Dec 12, 2023

/review

@mrT23
Copy link
Collaborator

mrT23 commented Dec 12, 2023

/describe

@github-actions github-actions bot changed the title LanceDB Integration Integration of LanceDB with PR-agent for managing database Dec 12, 2023
@github-actions github-actions bot added the enhancement New feature or request label Dec 12, 2023
Copy link
Contributor

PR Description updated to latest commit (476fad8)

@github-actions github-actions bot added Review effort [1-5]: 4 and removed enhancement New feature or request labels Dec 12, 2023
Copy link
Contributor

PR Analysis

  • 🎯 Main theme: Integration of LanceDB with PR-agent for managing database.
  • 📝 PR summary: This PR introduces LanceDB as a new vector database option for the PR-agent. It includes the necessary configuration changes, updates to the issue processing logic to support both Pinecone and LanceDB, and the addition of LanceDB to the project's dependencies.
  • 📌 Type of PR: Enhancement
  • 🧪 Relevant tests added: No
  • ⏱️ Estimated effort to review [1-5]: 4, because the PR introduces a significant change in the database management system, which requires careful review to ensure that it doesn't break existing functionality or introduce new bugs.
  • 🔒 Security concerns: No security concerns found

PR Feedback

  • 💡 General suggestions: The PR is well structured and the changes are logically grouped. However, it would be beneficial to include tests to verify the new functionality. Also, it is recommended to remove debug print statements from the production code.

  • 🤖 Code feedback:
    relevant filepr_agent/tools/pr_similar_issue.py
    suggestion      Consider refactoring the code to avoid duplication. The code for Pinecone and LanceDB is very similar, and it might be beneficial to extract the common parts into separate methods or classes. [important]
    relevant lineelif get_settings().config.vectordb == "lancedb":

    relevant filepr_agent/tools/pr_similar_issue.py
    suggestion      It is recommended to handle exceptions more specifically rather than using a broad 'except' clause. This will help in identifying and handling different error scenarios appropriately. [important]
    relevant lineexcept:

    relevant filepr_agent/tools/pr_similar_issue.py
    suggestion      Avoid using print statements for debugging in production code. Consider using a logger for this purpose. [medium]
    relevant lineprint("result: ", res)

    relevant filepr_agent/tools/pr_similar_issue.py
    suggestion      Avoid using magic numbers in your code. Consider defining a constant for the number 15 to give it a meaningful name. [medium]
    relevant linetime.sleep(15) # wait for pinecone to finalize indexing before querying

How to use

Instructions

To invoke the PR-Agent, add a comment using one of the following commands:
/review: Request a review of your Pull Request.
/describe: Update the PR title and description based on the contents of the PR.
/improve [--extended]: Suggest code improvements. Extended mode provides a higher quality feedback.
/ask <QUESTION>: Ask a question about the PR.
/update_changelog: Update the changelog based on the PR's contents.
/add_docs: Generate docstring for new components introduced in the PR.
/generate_labels: Generate labels for the PR based on the PR's contents.
see the tools guide for more details.

To edit any configuration parameter from the configuration.toml, add --config_path=new_value.
For example: /review --pr_reviewer.extra_instructions="focus on the file: ..."
To list the possible configuration parameters, add a /config comment.

@PrashantDixit0
Copy link
Contributor Author

ok sure @mrT23

@@ -14,6 +14,7 @@ max_model_tokens = 32000 # Limits the maximum number of tokens that can be used
patch_extra_lines = 3
secret_provider="google_cloud_storage"
cli_mode=false
vectordb = "lancedb"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this to
[pr_similar_issue]

@@ -0,0 +1,2 @@
{
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this file ? why to you add it ?

@@ -14,6 +14,7 @@ msrest==0.7.1
openai==0.27.8
pinecone-client
pinecone-datasets @ git+https://github.com/mrT23/pinecone-datasets.git@main
lancedb
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lock versrion

lancedb==0.3.4

@mrT23
Copy link
Collaborator

mrT23 commented Dec 14, 2023

Hi @PrashantDixit0 . i gave some code comments, but in addition:

  1. update https://github.com/PrashantDixit0/pr-agent/blob/main/docs/SIMILAR_ISSUE.md with an explanation of the different databases, and how to switch between them

  2. I ran the tool on Integration with other vectorDbs [enhancement] #496, and got no similar issues. With vectordb=pinecone I did get suggestions. How do you know the tool works ? i do expect to get similar issues on this.

If there is a threshold to be set, make it configurable. but you must demonstrate that the tool indeed works, and can retrieve reasonable similar issues

  1. change all print statements to get_logger.info(...)

@PrashantDixit0
Copy link
Contributor Author

Sure @mrT23, I'll work on your comments

@PrashantDixit0 PrashantDixit0 closed this by deleting the head repository Dec 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants