
Static Analyzer Enhanced with LLMs for Undefined Behavior Sanitization

Overview

This project provides a static analysis pipeline that detects and addresses undefined behavior (UB) in Rust programs. It uses Large Language Models (LLMs) to augment traditional static analysis: MIRI detects undefined behavior, the LLM reasons about the code and suggests fixes, and Alive2 validates that the applied fixes preserve the program's semantics.

Process Flow

The analysis process consists of the following steps:

  1. Static Analysis Tool: Use MIRI to check for undefined behavior (UB) in the original Rust program.
  2. LLM Static Analysis: Apply an LLM to reason about the code, acting as an additional static analyzer that looks for undefined behavior.
  3. Comparison of Results: Compare the results from MIRI and the LLM. Create a comparison table to track whether both methods identified the same UB, and document any discrepancies.
  4. Generate LLVM-IR: Use rustc to generate the LLVM Intermediate Representation (LLVM-IR) from the original Rust code.
  5. LLM Suggestion and Application: Ask the LLM to suggest a solution for the identified UB. Apply the suggested solution to a copy of the program and generate a new LLVM-IR using rustc.
  6. Verification with Alive2: Use Alive2 to check whether the two LLVM-IR versions are semantically equivalent. If Alive2 finds the change acceptable, present the modified code to the user; otherwise, reject the change. (A sketch of this pipeline follows below.)
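
For concreteness, here is a minimal sketch of how these steps could be chained from a driver script. This is an illustration under stated assumptions, not the actual implementation in analyze_rust.py: the exact command lines (cargo +nightly miri run, rustc --emit=llvm-ir, alive-tv) and the string matched in Alive2's summary output are typical defaults and may differ from this project's setup; the LLM queries of steps 2 and 5 are elided.

import subprocess
from pathlib import Path

def miri_report(project_dir: Path) -> str:
    # Step 1: run the program under MIRI; UB diagnostics go to stderr.
    result = subprocess.run(
        ["cargo", "+nightly", "miri", "run"],
        cwd=project_dir, capture_output=True, text=True,
    )
    return result.stderr

def emit_llvm_ir(rs_file: Path, out_ll: Path) -> None:
    # Steps 4-5: lower a Rust source file to textual LLVM-IR with rustc.
    subprocess.run(
        ["rustc", "--emit=llvm-ir", "-o", str(out_ll), str(rs_file)],
        check=True,
    )

def alive2_accepts(before_ll: Path, after_ll: Path) -> bool:
    # Step 6: alive-tv compares the two IR files; treat a summary containing
    # "0 incorrect transformations" as acceptance (an assumed convention).
    result = subprocess.run(
        ["alive-tv", str(before_ll), str(after_ll)],
        capture_output=True, text=True,
    )
    return "0 incorrect transformations" in result.stdout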

Flowchart

[Figure: process flowchart of the six steps above]

Benchmark Description

The benchmark for this project consists of a diverse set of Rust functions deliberately written to trigger undefined behavior; an illustrative example follows the list below.

Potential consequences of undefined behavior include:

  • Unexpected Termination: Programs may crash unexpectedly or enter infinite loops.
  • Incorrect Outputs: Programs may produce invalid or nonsensical results.
  • Security Vulnerabilities: UB can open applications to security risks and potential exploits.
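
For illustration, here is the kind of function the benchmark contains (this particular program is hypothetical, not taken from the benchmark). Run normally it may print an arbitrary value or crash; under MIRI the out-of-bounds read is reported as UB. It is shown as a Python string, since a driver script could write it to disk before invoking MIRI:

# Hypothetical benchmark-style program: an out-of-bounds raw-pointer read.
# `cargo run` may print garbage; `cargo miri run` reports the dereference as UB.
UB_EXAMPLE = """
fn main() {
    let v = vec![1, 2, 3];
    let p = v.as_ptr();
    // UB: reads one element past the end of the allocation.
    let x = unsafe { *p.add(3) };
    println!("{x}");
}
"""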

Through this project, we aim to leverage LLMs to identify and propose fixes for undefined behavior in Rust, thereby enhancing the reliability and security of Rust-based systems.

Cleanup Script

The cleanup_benchs.sh script is provided to clean up results in a specified directory.

To use the cleanup script, run:

bash cleanup_benchs.sh [directory_to_be_cleaned]

Required Dependencies for Running LLM-UBSanitizer

Setting Up the .env File

To properly configure the environment for this application, you need to create a .env file in the root directory of your project. This file should contain the following key-value pairs:

API_TYPE="azure"
AZURE_ENDPOINT="your endpoint url"
API_KEY="your azure access token"
API_VERSION="2024-10-21"
SCOPE="api permissions"
GITHUB_TOKEN="your github personal access token"
MODEL="gpt-4o_2024-05-13"
GITHUB_ENDPOINT="https://models.inference.ai.azure.com"
MODEL_NAME="gpt-4o"

Explanation of the Variables

  1. AZURE_ENDPOINT
  • Purpose: This is the URL of your Azure endpoint where API requests are sent. It specifies the location of the Azure resources your application interacts with.
  • Example: https://myazureapi.cognitiveservices.azure.com
  • Ensure you replace "your endpoint url" with the actual endpoint provided by Azure.
  2. API_KEY
  • Purpose: This is your Azure API access token, used to authenticate requests to the Azure services.
  • Example: f2h3a8j29... (a long string of characters)
  • Obtain this token from your Azure portal under the resource's "Keys and Endpoint" section.
  3. GITHUB_TOKEN
  • Purpose: This is a GitHub personal access token, required if your application interacts with GitHub APIs. It allows secure access to repositories and other GitHub features.
  • Example: ghp_ab12cd34ef56gh78ij90klmnopqrstu
  • Create this token via GitHub by navigating to Settings > Developer Settings > Personal Access Tokens. Make sure to grant the required scopes (e.g., repo or read:packages) based on your application's needs.

Other Variables

  • API_TYPE: Specifies the type of API used, in this case, "azure".
  • API_VERSION: Indicates the version of the API being used, ensuring compatibility.
  • SCOPE: Specifies the scope of the API request, often related to permissions.
  • MODEL: Defines the specific model and version to be used (e.g., gpt-4o_2024-05-13).
  • GITHUB_ENDPOINT: URL of the GitHub API endpoint being accessed.
  • MODEL_NAME: A shorthand identifier for the model being utilized (e.g., gpt-4o).

Important Notes

  • File Security: The .env file is not committed to version control, since it is listed in the .gitignore file.
  • Environment Setup: After creating the .env file, your application will automatically load these configurations at runtime, provided you use an environment variable library (e.g., dotenv for Node.js or Python).

By setting up this file correctly, you'll ensure seamless integration and proper functionality with the analyze_rust.py script.
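
For reference, here is a minimal sketch of how a Python script such as analyze_rust.py could load these values with the python-dotenv package. The variable names match the .env above; the loading code itself is an assumption, not necessarily how analyze_rust.py does it.

import os
from dotenv import load_dotenv  # pip install python-dotenv

# Read .env from the current directory and export its key-value pairs
# into the process environment.
load_dotenv()

api_key = os.getenv("API_KEY")
endpoint = os.getenv("AZURE_ENDPOINT")
model = os.getenv("MODEL", "gpt-4o_2024-05-13")  # documented default as fallback

if not api_key or not endpoint:
    raise RuntimeError("API_KEY and AZURE_ENDPOINT must be set in the .env file")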

Docker Instructions

You can pull the Docker image and run it as follows:

Pull the Docker Image

docker pull angelicamoreira/llmubsanitizer:v2

Running the Docker Container

  • To mount your home directory (gives the container access to your files):

    docker run -itd --name=llmubsanitizer --privileged --ipc=host --net=host --gpus=all -w /root --ulimit memlock=-1:-1 -v $HOME:$HOME angelicamoreira/llmubsanitizer:v2 bash
  • For isolation (without mounting your home directory):

    docker run -itd --name=llmubsanitizer --privileged --net=host --ipc=host --gpus=all -w /root -v /mnt:/mnt angelicamoreira/llmubsanitizer:v2 bash

Open a Shell in the Running Container

docker exec -it llmubsanitizer /bin/bash

Execution

To start the analysis, clone this repository and run the following command:

python3 analyze_rust.py
