
Static Analyzer Enhanced with LLMs for Undefined Behavior Sanitization

Overview

This project provides a static analysis pipeline that detects and addresses undefined behavior (UB) in Rust programs. It uses Large Language Models (LLMs) to augment traditional static analysis: MIRI detects undefined behavior, the LLM reasons about the code and suggests fixes, and Alive2 validates that the applied fixes preserve the program's semantics.

Process Flow

The analysis process consists of the following steps:

  1. Static Analysis Tool: Use MIRI to check for undefined behavior (UB) in the original Rust program.
  2. LLM Static Analysis: Apply an LLM to reason about the code, acting as an additional static analyzer that looks for undefined behavior.
  3. Comparison of Results: Compare the results from MIRI and the LLM. Create a comparison table to track whether both methods identified the same UB, and document any discrepancies.
  4. Generate LLVM-IR: Use rustc to generate the LLVM Intermediate Representation (LLVM-IR) from the original Rust code.
  5. LLM Suggestion and Application: Ask the LLM to suggest a solution for the identified UB. Apply the suggested solution to a copy of the program and generate a new LLVM-IR using rustc.
  6. Verification with Alive2: Use Alive2 to check whether the two LLVM-IR versions are semantically equivalent. If Alive2 finds the change acceptable, present the modified code to the user; otherwise, reject the change. (A sketch of this pipeline follows below.)
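
For concreteness, here is a minimal sketch of how these steps could be chained from a driver script. This is an illustration under stated assumptions, not the actual implementation in analyze_rust.py: the exact command lines (cargo +nightly miri run, rustc --emit=llvm-ir, alive-tv) and the string matched in Alive2's summary output are typical defaults and may differ from this project's setup; the LLM queries of steps 2 and 5 are elided.

import subprocess
from pathlib import Path

def miri_report(project_dir: Path) -> str:
    # Step 1: run the program under MIRI; UB diagnostics go to stderr.
    result = subprocess.run(
        ["cargo", "+nightly", "miri", "run"],
        cwd=project_dir, capture_output=True, text=True,
    )
    return result.stderr

def emit_llvm_ir(rs_file: Path, out_ll: Path) -> None:
    # Steps 4-5: lower a Rust source file to textual LLVM-IR with rustc.
    subprocess.run(
        ["rustc", "--emit=llvm-ir", "-o", str(out_ll), str(rs_file)],
        check=True,
    )

def alive2_accepts(before_ll: Path, after_ll: Path) -> bool:
    # Step 6: alive-tv compares the two IR files; treat a summary containing
    # "0 incorrect transformations" as acceptance (an assumed convention).
    result = subprocess.run(
        ["alive-tv", str(before_ll), str(after_ll)],
        capture_output=True, text=True,
    )
    return "0 incorrect transformations" in result.stdout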

Flowchart

[Figure: process flowchart of the six steps above]

Benchmark Description

The benchmark for this project consists of a diverse set of Rust functions deliberately written to trigger undefined behavior; an illustrative example follows the list below.

Potential consequences of undefined behavior include:

  • Unexpected Termination: Programs may crash unexpectedly or enter infinite loops.
  • Incorrect Outputs: Programs may produce invalid or nonsensical results.
  • Security Vulnerabilities: UB can open applications to security risks and potential exploits.
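
For illustration, here is the kind of function the benchmark contains (this particular program is hypothetical, not taken from the benchmark). Run normally it may print an arbitrary value or crash; under MIRI the out-of-bounds read is reported as UB. It is shown as a Python string, since a driver script could write it to disk before invoking MIRI:

# Hypothetical benchmark-style program: an out-of-bounds raw-pointer read.
# `cargo run` may print garbage; `cargo miri run` reports the dereference as UB.
UB_EXAMPLE = """
fn main() {
    let v = vec![1, 2, 3];
    let p = v.as_ptr();
    // UB: reads one element past the end of the allocation.
    let x = unsafe { *p.add(3) };
    println!("{x}");
}
"""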

Through this project, we aim to leverage LLMs to identify and propose fixes for undefined behavior in Rust, thereby enhancing the reliability and security of Rust-based systems.

Cleanup Script

The cleanup_benchs.sh script is provided to clean up results in a specified directory.

To use the cleanup script, run:

bash cleanup_benchs.sh [directory_to_be_cleaned]

Required Dependencies for Running LLM-UBSanitizer

Setting Up the .env File

To properly configure the environment for this application, you need to create a .env file in the root directory of your project. This file should contain the following key-value pairs:

API_TYPE="azure"
AZURE_ENDPOINT="your endpoint url"
API_KEY="your azure access token"
API_VERSION="2024-10-21"
SCOPE="api permissions"
GITHUB_TOKEN="your github personal access token"
MODEL="gpt-4o_2024-05-13"
GITHUB_ENDPOINT="https://models.inference.ai.azure.com"
MODEL_NAME="gpt-4o"

Explanation of the Variables

  1. AZURE_ENDPOINT
  • Purpose: This is the URL of your Azure endpoint where API requests are sent. It specifies the location of the Azure resources your application interacts with.
  • Example: https://myazureapi.cognitiveservices.azure.com
  • Ensure you replace "your endpoint url" with the actual endpoint provided by Azure.
  2. API_KEY
  • Purpose: This is your Azure API access token, used to authenticate requests to the Azure services.
  • Example: f2h3a8j29... (a long string of characters)
  • Obtain this token from your Azure portal under the resource's "Keys and Endpoint" section.
  3. GITHUB_TOKEN
  • Purpose: This is a GitHub personal access token, required if your application interacts with GitHub APIs. It allows secure access to repositories and other GitHub features.
  • Example: ghp_ab12cd34ef56gh78ij90klmnopqrstu
  • Create this token via GitHub by navigating to Settings > Developer Settings > Personal Access Tokens. Make sure to grant the required scopes (e.g., repo or read:packages) based on your application's needs.

Other Variables

  • API_TYPE: Specifies the type of API used, in this case, "azure".
  • API_VERSION: Indicates the version of the API being used, ensuring compatibility.
  • SCOPE: Specifies the scope of the API request, often related to permissions.
  • MODEL: Defines the specific model and version to be used (e.g., gpt-4o_2024-05-13).
  • GITHUB_ENDPOINT: URL of the GitHub API endpoint being accessed.
  • MODEL_NAME: A shorthand identifier for the model being utilized (e.g., gpt-4o).

Important Notes

  • File Security: The .env file is not committed to version control, since it is listed in the .gitignore file.
  • Environment Setup: After creating the .env file, your application will automatically load these configurations at runtime, provided you use an environment variable library (e.g., dotenv for Node.js or Python).

By setting up this file correctly, you'll ensure seamless integration and proper functionality with the analyze_rust.py script.
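
For reference, here is a minimal sketch of how a Python script such as analyze_rust.py could load these values with the python-dotenv package. The variable names match the .env above; the loading code itself is an assumption, not necessarily how analyze_rust.py does it.

import os
from dotenv import load_dotenv  # pip install python-dotenv

# Read .env from the current directory and export its key-value pairs
# into the process environment.
load_dotenv()

api_key = os.getenv("API_KEY")
endpoint = os.getenv("AZURE_ENDPOINT")
model = os.getenv("MODEL", "gpt-4o_2024-05-13")  # documented default as fallback

if not api_key or not endpoint:
    raise RuntimeError("API_KEY and AZURE_ENDPOINT must be set in the .env file")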

Docker Instructions

You can pull the Docker image and run it as follows:

Pull the Docker Image

docker pull angelicamoreira/llmubsanitizer:v2

Running the Docker Container

  • To mount your home directory (gives the container access to your files):

    docker run -itd --name=llmubsanitizer --privileged --ipc=host --net=host --gpus=all -w /root --ulimit memlock=-1:-1 -v $HOME:$HOME angelicamoreira/llmubsanitizer:v2 bash
  • For isolation (without mounting your home directory):

    docker run -itd --name=llmubsanitizer --privileged --net=host --ipc=host --gpus=all -w /root -v /mnt:/mnt angelicamoreira/llmubsanitizer:v2 bash

Open a Shell in the Running Container

docker exec -it llmubsanitizer /bin/bash

Execution

To start the analysis, clone this repository and run the following command:

python3 analyze_rust.py
