This repo generates documentation and insights from GitHub repositories. It uses a Streamlit interface to interact with the user, allowing them to input a GitHub repository and process specific file types.
The application extracts data from the repository, vectorizes it using AstraDB, and generates relevant documentation with the help of an OpenAI language model.
Key functionalities include displaying repository data, generating an overview, architectural summary, domain model, and enabling a chat interface for code-related questions. The system uses asynchronous programming to manage tasks efficiently.
To prove the point, the following documentation has been auto-generated by the app itself!
The application, named Code Whisperer, is a comprehensive tool designed to automate the creation of documentation and to surface insights into code stored within a GitHub repository.
The architectural overview can be delineated as follows:
Streamlit: The application utilizes Streamlit as its primary interface. Streamlit serves as the web application framework that orchestrates user interactions, organizes the interface into various tabs, and provides immediate feedback to users. The main tabs include "Repository data," "Overview," "Architectural summary," "Domain model," and "Chat with your code."
RepoReader Module: This encapsulates the logic required to fetch data from GitHub repositories. Upon initialization with a GitHub token, it connects to the GitHub API, retrieves repository contents, and filters files based on specified extensions.
AstraDB: Leveraging AstraDB's scalable database capabilities, the application stores vector embeddings of repository data. This is achieved using the DataAPIClient to create and manage collections that store vectorized representations of file contents.
Vectorization and Storage:
Vectorization: The application relies on text-embedding-ada-002 from OpenAI to transform repository contents into vector embeddings, utilizing cosine similarity metrics for efficient storage and retrieval.
AstraDB Integration: Data from the repository is vectorized and inserted into AstraDB collections for persistent storage. This includes metadata about the repository and the contents of its files.
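As a sketch of what that storage step might look like, the snippet below shapes one repository file into an AstraDB document and inserts it with astrapy. The `build_document` helper, the `repo_files` collection name, and the precomputed-embedding parameter are illustrative assumptions, not the app's actual code.

```python
# Illustrative sketch only: build_document, the "repo_files" collection
# name, and the embedding parameter are assumptions, not the app's code.

def build_document(repo_name: str, file_path: str,
                   content: str, embedding: list[float]) -> dict:
    """Shape one repository file as an AstraDB document: file metadata
    plus the precomputed embedding under the reserved "$vector" key."""
    return {
        "_id": f"{repo_name}/{file_path}",  # stable id per file
        "repo": repo_name,
        "path": file_path,
        "content": content,
        "$vector": embedding,
    }

def store_documents(docs: list[dict], endpoint: str, token: str) -> None:
    """Insert documents into an AstraDB collection. The import is kept
    local so build_document stays usable without astrapy installed."""
    from astrapy import DataAPIClient  # pip install astrapy

    db = DataAPIClient(token).get_database_by_api_endpoint(endpoint)
    collection = db.get_collection("repo_files")  # hypothetical collection name
    for doc in docs:
        collection.insert_one(doc)
```

Storing the raw `content` next to the `$vector` field means a similarity search can return the original text directly, without a second lookup.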
OpenAI Integration: OpenAI’s Large Language Models (LLMs), specifically GPT-4, are employed to generate code documentation. The LLMs interact with vectorized content stored within AstraDB to extract and summarize relevant information about the repository and its codebase.
Asyncio and AsyncOpenAI: Asynchronous capabilities are provided by Python’s asyncio, ensuring non-blocking operations for tasks such as fetching repository contents, generating documentation, and querying LLMs.
Session Management:
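The non-blocking pattern described above can be sketched with the standard library alone: several documentation sections are generated concurrently rather than one after another. Here `query_llm` is a stand-in for a real `AsyncOpenAI` call (`client.chat.completions.create`), which would await the API instead of sleeping.

```python
import asyncio

async def query_llm(section: str) -> str:
    # Stand-in for an awaited AsyncOpenAI request; the sleep simulates
    # network latency of the real LLM call.
    await asyncio.sleep(0.01)
    return f"Generated text for: {section}"

async def generate_all(sections: list[str]) -> list[str]:
    # gather() runs all coroutines concurrently and preserves input order
    return await asyncio.gather(*(query_llm(s) for s in sections))

results = asyncio.run(generate_all(["overview", "architecture", "domain model"]))
```

Because the coroutines overlap, total wall-clock time is close to the slowest single request rather than the sum of all of them.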
Streamlit Session State: Streamlit's session state mechanism is utilized to maintain the application's state across various user interactions, such as loading repository data and storing generated documentation.
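The guard pattern session state relies on is worth making explicit: a key is initialized only if it is not already present, so Streamlit's script reruns do not wipe previously loaded data. Since `st.session_state` behaves like a dict, the helper below is exercised against a plain dict; in the app it would be called as `init_state(st.session_state, ...)`. The helper name and default keys are illustrative.

```python
def init_state(state, defaults: dict) -> None:
    """Set default keys only if absent, so reruns keep existing values."""
    for key, value in defaults.items():
        if key not in state:  # only set on the first run
            state[key] = value

state = {}  # stand-in for st.session_state
init_state(state, {"repo_data": None, "overview": ""})
state["overview"] = "generated text"  # a later interaction stores a result
init_state(state, {"repo_data": None, "overview": ""})  # rerun: nothing overwritten
```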
The typical flow through the application is:
- The user inputs GitHub details via the Streamlit interface.
- The application reads and vectorizes repository data using RepoReader.
- Vectorized data is stored in AstraDB.
- The user interacts with the provided tabs to view repository details, architectural summaries, and domain models, and to chat with the code.
- OpenAI LLMs are queried to generate documentation based on context retrieved from AstraDB.
- The results are presented to the user through the Streamlit interface.
This architecture ensures a seamless, interactive experience by integrating various technologies for data retrieval, storage, processing, and presentation.
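The steps above can be condensed into one orchestration sketch. Every function here is a hypothetical stand-in for the real components (RepoReader, AstraDB, the OpenAI LLM); only the order of operations mirrors the app.

```python
# Hypothetical stand-ins for the app's real components; only the
# sequencing (read -> vectorize/store -> generate) mirrors the app.

def read_repository(repo: str) -> dict:
    # in the app: RepoReader fetches files via the GitHub API
    return {"name": repo, "files": {"app.py": "print('hello')"}}

def vectorize_and_store(data: dict) -> int:
    # in the app: embed each file and insert it into AstraDB
    return len(data["files"])

def generate_documentation(data: dict) -> str:
    # in the app: query the LLM with context retrieved from AstraDB
    return f"Documentation for {data['name']} ({len(data['files'])} file(s))"

def run_pipeline(repo: str) -> str:
    data = read_repository(repo)
    vectorize_and_store(data)
    return generate_documentation(data)
```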
- Code Whisperer Application
- Attributes:
- None (Primary entity)
- Methods:
- generateDocumentation(): Asynchronously generate documentation based on the repository.
- load_sidebar(): Load the repository data and settings from the sidebar.
- show_repository_data(): Display the repository data on the first tab.
- show_overview(): Display the overview on the second tab.
- show_architectural_summary(): Display the architectural summary on the third tab.
- show_domain_model(): Display the domain model on the fourth tab.
- show_chat(): Display the chat interface on the fifth tab.
- RepoReader
- Attributes:
- github_token: string
- github_repo_name: string
- github_handle: object
- github_repo: object
- extensions: tuple (file extensions to process)
- Methods:
- connect(token: str): Connect to the GitHub API using a token.
- setRepository(repo: str): Set the current repository to work with.
- getRepositoryContents(): Retrieve the contents of the repository.
- getRepositoryContent(file_path: str): Retrieve the contents of a specific file.
- getName(): Get the repository name.
- getTopics(): Get the topics of the repository.
- getStars(): Get the star count of the repository.
- setExtensions(extensions: str): Set file extensions to process.
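A minimal sketch of the RepoReader interface listed above, covering only a subset of its methods. The GitHub connection would use PyGithub (`pip install PyGithub`); the import is kept inside `connect()` so the extension-filtering logic can be exercised without network access or a token. The `wantsFile` helper and the comma-separated extension format are assumptions.

```python
class RepoReader:
    """Sketch of the RepoReader class; wantsFile and the comma-separated
    extension format are illustrative assumptions, not the app's code."""

    def __init__(self):
        self.github_handle = None
        self.github_repo = None
        self.extensions = ()

    def connect(self, token: str):
        from github import Github  # PyGithub; imported lazily on purpose
        self.github_handle = Github(token)

    def setRepository(self, repo: str):
        self.github_repo = self.github_handle.get_repo(repo)

    def setExtensions(self, extensions: str):
        # "py, md" -> (".py", ".md"); a tuple works with str.endswith
        self.extensions = tuple(
            e if e.startswith(".") else "." + e
            for e in (p.strip() for p in extensions.split(","))
        )

    def wantsFile(self, path: str) -> bool:
        # helper a content walk could use to filter files by extension
        return path.endswith(self.extensions)
```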
+--------------------------------+
| Code Whisperer Application     |
+--------------------------------+
| -                              |
+--------------------------------+
| + generateDocumentation()      |
| + load_sidebar()               |
| + show_repository_data()       |
| + show_overview()              |
| + show_architectural_summary() |
| + show_domain_model()          |
| + show_chat()                  |
+--------------------------------+
              | uses
              v
+----------------------------------------+
| RepoReader                             |
+----------------------------------------+
| - github_token: string                 |
| - github_repo_name: string             |
| - github_handle: object                |
| - github_repo: object                  |
| - extensions: tuple                    |
+----------------------------------------+
| + connect(token: str)                  |
| + setRepository(repo: str)             |
| + getRepositoryContents()              |
| + getRepositoryContent(file_path: str) |
| + getName()                            |
| + getTopics()                          |
| + getStars()                           |
| + setExtensions(extensions: str)       |
+----------------------------------------+
              | uses
              v
+------------------------------------------------+
| DataAPIClient                                  |
+------------------------------------------------+
| -                                              |
+------------------------------------------------+
| + get_database_by_api_endpoint(endpoint: str)  |
+------------------------------------------------+
              | uses
              v
+------------------------------+
| OpenAI                       |
+------------------------------+
| - api_key: string            |
+------------------------------+
| + chat.completions.create()  |
+------------------------------+
              | interacts
              v
+--------------------------------+
| Collection                     |
+--------------------------------+
| -                              |
+--------------------------------+
| + insert_one(context: dict)    |
| + delete_many(criteria: dict)  |
+--------------------------------+
In case you want to run all of the above locally, it's useful to create a virtual environment. Set it up as follows:
python3.10 -m venv myenv
Then activate it as follows:
source myenv/bin/activate # on Linux/Mac
myenv\Scripts\activate.bat # on Windows
Now you can start installing packages:
pip3 install -r requirements.txt
Finally, start the application:
streamlit run app.py