Semgrep-AI is an AI-powered code analysis tool that enhances Semgrep’s static analysis by providing contextual validation of findings using a local LLM (Large Language Model). It helps you gain deeper insights into vulnerabilities, offering explanations, exploitability assessments, confidence ratings, and mitigation tips. 📊🛡️
- Validates findings with surrounding code context (rather than just grepping and displaying matches) for more accurate results 📂
- Outputs detailed reports with confidence scores and exploitability insights ✅
- Supports custom formatting for results in CSV or other file formats 📝 (To-Do)
First, ensure you have Ollama installed, along with a local LLM model of your choice. Ollama makes it easy to run AI models locally. 💻
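For example, `ollama pull <model-name>` downloads a model and `ollama serve` starts the local server if it is not already running; by default it listens on `http://localhost:11434`.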
Before running Semgrep-AI, modify the following fields in the `semgrep-ai.py` file (a sketch of these fields is shown after the list):
- Local LLM endpoint: Point it to the endpoint where your LLM is running.
- Source code path: The directory where your project's source code is located.
- Semgrep output path: The path where Semgrep will store its analysis output in `.sarif` format.
Execute Semgrep on your codebase and generate the output in SARIF format. You can do this by adding the `--sarif` flag when running Semgrep:

```
semgrep --config <your-semgrep-config> --sarif > output.sarif
```
After Semgrep produces the SARIF output, run `semgrep-ai.py`:

```
python3 semgrep-ai.py
```
- The script extracts the vulnerable code snippet, file path, and line number from the `output.sarif` file.
- It uses the file path to locate the vulnerable file in the source code, gathering additional context around the vulnerable snippet.
- The vulnerable code snippet, file path, and line number are sent to the local LLM for further analysis.
- The model analyzes the context and provides detailed insights based on a structured prompt (a simplified sketch of this pipeline follows the list).
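Below is a minimal sketch of this pipeline, assuming an Ollama-style `/api/generate` endpoint. The function names (`extract_findings`, `gather_context`, `analyze`) and configuration values are illustrative, not the actual internals of `semgrep-ai.py`:

```python
# Illustrative sketch only -- names and structure are assumptions, not the
# actual internals of semgrep-ai.py. Requires the `requests` package.
import json
import requests

LLM_ENDPOINT = "http://localhost:11434/api/generate"  # assumed Ollama endpoint
SOURCE_CODE_PATH = "/path/to/your/project"            # root of the analyzed source tree
CONTEXT_LINES = 10                                    # lines of context around each finding

def extract_findings(sarif_path):
    """Yield (file_path, line_number, snippet) tuples from a SARIF report."""
    with open(sarif_path) as f:
        sarif = json.load(f)
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            loc = result["locations"][0]["physicalLocation"]
            yield (
                loc["artifactLocation"]["uri"],
                loc["region"]["startLine"],
                loc["region"].get("snippet", {}).get("text", ""),
            )

def gather_context(file_path, line_number, around=CONTEXT_LINES):
    """Read the lines surrounding the vulnerable snippet for extra context."""
    with open(f"{SOURCE_CODE_PATH}/{file_path}") as f:
        lines = f.readlines()
    start = max(0, line_number - 1 - around)
    end = min(len(lines), line_number + around)
    return "".join(lines[start:end])

def analyze(snippet, context, file_path, line_number):
    """Send the finding plus its context to the local LLM and return the analysis text."""
    prompt = (
        f"Finding in {file_path} at line {line_number}:\n{snippet}\n\n"
        f"Surrounding code:\n{context}\n\n"
        "Explain the vulnerability, how it can be exploited, and rate your confidence."
    )
    resp = requests.post(
        LLM_ENDPOINT,
        json={"model": "llama3", "prompt": prompt, "stream": False},
        timeout=300,
    )
    return resp.json()["response"]
```

Depending on how Semgrep was invoked, the file paths in the SARIF report may already be relative to your working directory, in which case joining them with a separate source root is unnecessary.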
The LLM provides insights in the following format (a sample prompt template that requests this structure is sketched after the list):
- Vulnerability Name: The type of vulnerability found (e.g., XSS, SQL Injection).
- Vulnerable Code: The code snippet that is identified as vulnerable.
- How it can be Exploited: A description of how an attacker could exploit the vulnerability.
- Rate My Confidence: A confidence score indicating how sure the AI is about the exploitability of the vulnerability.
- Other Comments: Suggestions for mitigation or other important comments.
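One way to obtain this structure is to ask for it explicitly in the prompt. The template below is an assumption about how such a prompt could be worded, not the script's exact text:

```python
# Hypothetical prompt template asking the LLM to answer in the report format above.
PROMPT_TEMPLATE = """You are a security code reviewer.
Analyze the following finding and respond using exactly these headings:

Vulnerability Name:
Vulnerable Code:
How it can be Exploited:
Rate My Confidence:
Other Comments:

File: {file_path} (line {line_number})
Vulnerable snippet:
{snippet}

Surrounding context:
{context}
"""

# Example values for illustration only.
prompt = PROMPT_TEMPLATE.format(
    file_path="app/views.py",
    line_number=42,
    snippet="query = 'SELECT * FROM users WHERE id=' + user_id",
    context="...",
)
```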
- The results from the LLM analysis are saved in a CSV file, making it easy to integrate with other tools or workflows; a minimal writer sketch is shown below.
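A minimal sketch of how such a CSV might be written with Python's standard library. The column names mirror the fields above; the `analyses` list is a hypothetical example of collected results:

```python
import csv

# Hypothetical per-finding results; in practice this list would be built
# from the LLM response for each SARIF finding.
analyses = [
    {
        "Vulnerability Name": "SQL Injection",
        "Vulnerable Code": "query = 'SELECT * FROM users WHERE id=' + user_id",
        "How it can be Exploited": "Attacker-controlled id alters the SQL statement.",
        "Rate My Confidence": "High",
        "Other Comments": "Use parameterized queries.",
    },
]

with open("semgrep_ai_results.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(analyses[0].keys()))
    writer.writeheader()
    writer.writerows(analyses)
```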
Feel free to adapt the result format as needed! The output can be converted into any file format and schema that suits your project.
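For instance, the CSV report could be converted to JSON with a few lines of standard-library Python (the file names here are just examples):

```python
import csv
import json

# Convert the CSV report into a JSON array of findings.
with open("semgrep_ai_results.csv", newline="") as f:
    rows = list(csv.DictReader(f))

with open("semgrep_ai_results.json", "w") as f:
    json.dump(rows, f, indent=2)
```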
You can find the project repository at ai-secure-code-review by @247arjun