Skip to content

Commit

Permalink
* attacks
Browse files Browse the repository at this point in the history
  • Loading branch information
asofter committed Feb 4, 2024
1 parent 6208385 commit 76fb2e4
Show file tree
Hide file tree
Showing 2 changed files with 48 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/changelog.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added
- `Anonymize`: language support with `zh` ([#79](https://github.com/laiyer-ai/llm-guard/pull/79), thanks to [@Oscaner](https://github.com/Oscaner)).
- `Anonymize`: more regex patterns, such as `PO_BOX_RE`, `PRICE_RE`, `HEX_COLOR`, `TIME_RE`, `DATE_RE`, `URL_RE`, `PHONE_NUMBER_WITH_EXT`, `BTC_ADDRESS`
- Add [NIST Taxonomy](./get_started/attacks.md) to the documentation.

### Fixed
- Incorrect results when using `Deanonymize` multiple times ([#82](https://github.com/laiyer-ai/llm-guard/pull/82), thanks to [@andreaponti5](https://github.com/andreaponti5))
Expand Down
47 changes: 47 additions & 0 deletions docs/get_started/attacks.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,47 @@
# Attacks

This section outlines the range of attacks that can be launched against Large Language Models (LLMs) and demonstrates how LLM Guard offers robust protection against these threats.

## NIST Trustworthy and Responsible AI

Following the [NIST Trustworthy and Responsible AI framework](https://doi.org/10.6028/NIST.AI.100-2e2023), attacks on Generative AI systems, including LLMs, can be broadly categorized into four types.
LLM Guard is designed to counteract each category effectively:

### 1. Availability Breakdowns

Attacks targeting the availability of LLMs aim to disrupt their normal operations. Methods such as Denial of Service (DoS) attacks are common. LLM Guard combats these through:

- [TokenLimit Input](../input_scanners/token_limit.md)
- ...

### 2. Integrity Violations

These attacks attempt to undermine the integrity of LLMs, often by injecting malicious prompts. LLM Guard safeguards integrity through various scanners, including:

- [Prompt Injection](../input_scanners/prompt_injection.md)
- Language [Input](../input_scanners/language.md) & [Output](../output_scanners/language.md)
- [Language Same](../output_scanners/language_same.md)
- [Relevance Output](../output_scanners/relevance.md)
- [Factual Consistency Output](../output_scanners/factual_consistency.md)
- Ban Topics [Input](../input_scanners/ban_topics.md) & [Output](../output_scanners/ban_topics.md)
- ...

### 3. Privacy Compromise

These attacks seek to compromise privacy by extracting sensitive information from LLMs. LLM Guard protects privacy through:

- [Anonymize Input](../input_scanners/anonymize.md)
- [Sensitive Output](../output_scanners/sensitive.md)
- [Secrets Input](../input_scanners/secrets.md)
- ...

### 4. Abuse

Abuse attacks involve the generation of harmful content using LLMs. LLM Guard mitigates these risks through:

- [Bias Output](../output_scanners/bias.md)
- Toxicity [Input](../input_scanners/toxicity.md) & [Output](../output_scanners/toxicity.md)
- Ban Competitors [Input](../input_scanners/ban_competitors.md) & [Output](../output_scanners/ban_competitors.md)
- ...

LLM Guard's suite of scanners comprehensively addresses each category of attack, providing a multi-layered defense mechanism to ensure the safe and responsible use of LLMs.

0 comments on commit 76fb2e4

Please sign in to comment.