This project provides a static analysis pipeline that detects and addresses undefined behavior (UB) in Rust programs. The tool uses Large Language Models (LLMs) to augment traditional static analysis: MIRI detects undefined behavior, the LLM acts as an additional analyzer and suggests fixes, and Alive2 validates the suggested changes.
The analysis process consists of the following steps:
- Static Analysis Tool: Use MIRI to check for undefined behavior (UB) in the original Rust program.
- LLM Static Analysis: Apply an LLM model to reason about the code and act as an additional static analyzer for discovering undefined behavior.
- Comparison of Results: Compare the results from MIRI and the LLM. Create a comparison table to track whether both methods identified the same UB, and document any discrepancies.
- Generate LLVM-IR: Use rustc to generate the LLVM Intermediate Representation (LLVM-IR) from the original Rust code.
- LLM Suggestion and Application: Ask the LLM to suggest a solution for the identified UB. Apply the suggested solution to a copy of the program and generate a new LLVM-IR using rustc.
- Verification with Alive2: Use Alive2 to check whether both LLVM-IR versions are semantically equivalent. If Alive2 finds the change acceptable, suggest the modified code to the user; otherwise, reject the change.
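The steps above can be sketched in Python as follows. This is a minimal illustration and not the code in analyze_rust.py: the function names and file paths are hypothetical, and it assumes MIRI is invoked through cargo miri and Alive2 through its alive-tv binary, which the steps above do not specify.

```python
from typing import Dict, List


def miri_command() -> List[str]:
    # Assumption: MIRI is run through cargo from inside the crate directory.
    return ["cargo", "miri", "run"]


def llvm_ir_command(source: str, output: str) -> List[str]:
    # rustc's --emit=llvm-ir flag writes textual LLVM-IR to the -o path.
    return ["rustc", "--emit=llvm-ir", "-o", output, source]


def alive2_command(original_ir: str, patched_ir: str) -> List[str]:
    # Assumption: Alive2's alive-tv binary is on PATH and receives the
    # original (source) and patched (target) IR files, in that order.
    return ["alive-tv", original_ir, patched_ir]


def compare_findings(miri_found_ub: bool, llm_found_ub: bool) -> Dict[str, bool]:
    """Build one row of the MIRI-vs-LLM comparison table."""
    return {
        "miri": miri_found_ub,
        "llm": llm_found_ub,
        "agree": miri_found_ub == llm_found_ub,
    }


def decide(alive2_accepts: bool) -> str:
    # Suggest the patched code only when Alive2 accepts the change.
    return "suggest" if alive2_accepts else "reject"
```

Each command list can be passed to subprocess.run(cmd, capture_output=True, text=True), and the resulting output parsed to produce the boolean inputs for compare_findings and decide.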
The benchmark for this project consists of a diverse set of Rust functions designed to test undefined behavior.
Potential consequences of undefined behavior include:
- Unexpected Termination: Programs may crash unexpectedly or enter infinite loops.
- Incorrect Outputs: Programs may produce invalid or nonsensical results.
- Security Vulnerabilities: UB can open applications to security risks and potential exploits.
Through this project, we aim to leverage LLMs to identify and propose fixes for undefined behavior in Rust, thereby enhancing the reliability and security of Rust-based systems.
The cleanup_benchs.sh script is provided to clean up results in a specified directory.
To use the cleanup script, run:
bash cleanup_benchs.sh [directory_to_be_cleaned]
Requirements
- LLVM
- Clang and Clang tools
- Alive2
- MIRI (for Rust UB analysis)
To properly configure the environment for this application, create a .env file in the root directory of your project. This file should contain the following key-value pairs:
API_TYPE="azure"
AZURE_ENDPOINT="your endpoint url"
API_KEY="your azure access token"
API_VERSION="2024-10-21"
SCOPE="api permissions"
GITHUB_TOKEN="your github personal access token"
MODEL="gpt-4o_2024-05-13"
GITHUB_ENDPOINT="https://models.inference.ai.azure.com"
MODEL_NAME="gpt-4o"
Explanation of the Variables
- AZURE_ENDPOINT
  - Purpose: This is the URL of your Azure endpoint where API requests are sent. It specifies the location of the Azure resources your application interacts with.
  - Example: https://myazureapi.cognitiveservices.azure.com
  - Ensure you replace "your endpoint url" with the actual endpoint provided by Azure.
- API_KEY
  - Purpose: This is your Azure API access token, used to authenticate requests to the Azure services.
  - Example: f2h3a8j29... (a long string of characters)
  - Obtain this token from your Azure portal under the resource's "Keys and Endpoint" section.
- GITHUB_TOKEN
  - Purpose: This is a GitHub personal access token, required if your application interacts with GitHub APIs. It allows secure access to repositories and other GitHub features.
  - Example: ghp_ab12cd34ef56gh78ij90klmnopqrstu
  - Create this token via GitHub by navigating to Settings > Developer Settings > Personal Access Tokens. Make sure to grant the required scopes (e.g., repo or read:packages) based on your application's needs.
Other Variables
- API_TYPE: Specifies the type of API used, in this case, "azure".
- API_VERSION: Indicates the version of the API being used, ensuring compatibility.
- SCOPE: Specifies the scope of the API request, often related to permissions.
- MODEL: Defines the specific model and version to be used (e.g., gpt-4o_2024-05-13).
- GITHUB_ENDPOINT: URL of the GitHub Models inference endpoint used when requests go through GitHub rather than Azure.
- MODEL_NAME: A shorthand identifier for the model being utilized (e.g., gpt-4o).
Important Notes
- File Security: The .env file is not committed to version control because it is listed in the .gitignore file.
- Environment Setup: After creating the .env file, your application will automatically load these configurations at runtime, provided you use an environment variable library (e.g., dotenv for Node.js or python-dotenv for Python).
By setting up this file correctly, you'll ensure seamless integration and proper functionality with the analyze_rust.py script.
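As an illustration of how such a file can be read and checked, here is a minimal sketch that uses only the Python standard library. It is not taken from analyze_rust.py, which may rely on a library such as python-dotenv instead; the function names are hypothetical, and treating every key from the example file as required is an assumption.

```python
from typing import Dict, Iterable, List

# Assumption: all keys shown in the example .env file are expected to be set.
EXPECTED_KEYS = (
    "API_TYPE", "AZURE_ENDPOINT", "API_KEY", "API_VERSION", "SCOPE",
    "GITHUB_TOKEN", "MODEL", "GITHUB_ENDPOINT", "MODEL_NAME",
)


def parse_env(lines: Iterable[str]) -> Dict[str, str]:
    """Parse simple KEY="value" lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and any enclosing quote characters.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: Dict[str, str]) -> List[str]:
    """Return the expected keys that are absent or empty."""
    return [key for key in EXPECTED_KEYS if not values.get(key)]
```

A typical use is to open the file with open(".env"), pass the file object to parse_env, and stop with a clear error message if missing_keys returns a non-empty list.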
You can pull the Docker image and run it as follows:
docker pull angelicamoreira/llmubsanitizer:v2
- To mount your home directory (allows access to files):
docker run -itd --name=llmubsanitizer --privileged --ipc=host --net=host --gpus=all -w /root --ulimit memlock=-1:-1 -v $HOME:$HOME angelicamoreira/llmubsanitizer:v2 bash
- For isolation (without mounting your home directory):
docker run -itd --name=llmubsanitizer --privileged --net=host --ipc=host --gpus=all -w /root -v /mnt:/mnt angelicamoreira/llmubsanitizer:v2 bash
Then open a shell in the running container:
docker exec -it llmubsanitizer /bin/bash
To start the analysis, clone this repository and run the following command:
python3 analyze_rust.py