Adversarial Attacks On Language Models

This repository contains code for training a GPT-2-based transformer model for sentiment classification and for generating adversarial examples against it. It uses FastAPI to expose a web API for testing.

Files Overview

app.py

This file defines a FastAPI application with endpoints for text classification inference and generating adversarial examples.
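
The endpoints themselves are not enumerated here, so the following is only a hypothetical sketch of how such an app could be wired up; the route names, request schema, and placeholder return values are assumptions, not the repository's actual API.

    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class TextRequest(BaseModel):
        text: str

    @app.post("/classify")  # hypothetical route name
    def classify(req: TextRequest):
        # Would run the sentiment classifier on req.text.
        return {"sentiment_probability": 0.5}  # placeholder value

    @app.post("/attack")  # hypothetical route name
    def attack(req: TextRequest):
        # Would run the attacker model to produce an adversarial rewrite.
        return {"adversarial_text": req.text}  # placeholder value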

attacker_model_inference.py

load_target_model()

Loads the target sequence classification model trained on the IMDb dataset.

load_attacker_model()

Loads the GPT-2 language model used as the attacker model.

process_dataset(tokenizer, text, max_length)

Preprocesses the input text with the provided tokenizer and returns the tokenized output; a sketch follows the parameter list.

  • tokenizer: The tokenizer object used for tokenization.
  • text: The input text to be preprocessed.
  • max_length: The maximum length of the input text.
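
A minimal sketch of how this preprocessing typically looks with a Hugging Face tokenizer; the exact padding and truncation settings are assumptions.

    from transformers import GPT2Tokenizer

    def process_dataset(tokenizer, text, max_length):
        # Tokenize, truncate/pad to max_length, and return PyTorch tensors.
        return tokenizer(
            text,
            truncation=True,
            padding="max_length",
            max_length=max_length,
            return_tensors="pt",
        )

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
    batch = process_dataset(tokenizer, "This movie was great!", max_length=256)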

log_perplexity(logits, coeffs)

Calculates the log perplexity of the logits under the given token coefficients (see the sketch below).

  • logits: The logits produced by the model.
  • coeffs: The coefficients for calculating log perplexity.
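
A sketch of one common way to compute a differentiable log perplexity, assuming coeffs is a (batch, seq_len, vocab) tensor of relaxed token distributions and logits is the attacker LM's output over the same sequence:

    import torch.nn.functional as F

    def log_perplexity(logits, coeffs):
        # Next-token prediction: position t's logits score the token at t+1.
        shift_logits = logits[:, :-1, :].contiguous()
        shift_coeffs = coeffs[:, 1:, :].contiguous()
        # Expected negative log-likelihood under the coefficient distribution.
        return -(shift_coeffs * F.log_softmax(shift_logits, dim=-1)).sum(-1).mean()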

calculate_word_error_rate(reference, hypothesis)

Calculates the Word Error Rate (WER) between a reference and a hypothesis sequence; a sketch follows the parameter list.

  • reference: The reference sequence.
  • hypothesis: The hypothesis sequence.
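
WER is the word-level Levenshtein distance normalized by the reference length. A self-contained sketch, assuming both sequences are whitespace-separated strings:

    def calculate_word_error_rate(reference, hypothesis):
        ref, hyp = reference.split(), hypothesis.split()
        # d[i][j] = edit distance between ref[:i] and hyp[:j].
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution
        return d[len(ref)][len(hyp)] / max(len(ref), 1)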

load_embeddings(tokenizer, device, target_model, attacker_model)

Loads the input embeddings for both the target and attacker models (sketched below).

  • tokenizer: The tokenizer object used for tokenization.
  • device: The device (e.g., 'cuda' or 'cpu') on which the embeddings will be loaded.
  • target_model: The pre-trained target model.
  • attacker_model: The pre-trained attacker model.
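
A plausible sketch, assuming both models expose a Hugging Face-style get_input_embeddings(); a custom target model might instead expose its embedding layer directly.

    def load_embeddings(tokenizer, device, target_model, attacker_model):
        # Pull each model's input-embedding matrix onto the chosen device.
        tm_embeddings = target_model.get_input_embeddings().weight.to(device)
        am_embeddings = attacker_model.get_input_embeddings().weight.to(device)
        return am_embeddings, tm_embeddings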

train(target_model, attacker_model, input_ids, label, am_embeddings, tm_embeddings, device, tokenizer)

Trains the attacker model to generate adversarial examples; see the sketch after the parameter list.

  • target_model: The pre-trained target model.
  • attacker_model: The pre-trained attacker model.
  • input_ids: The input IDs of the text sequence.
  • label: The label of the input text sequence.
  • am_embeddings: The embeddings of the attacker model.
  • tm_embeddings: The embeddings of the target model.
  • device: The device (e.g., 'cuda' or 'cpu') on which the training will be performed.
  • tokenizer: The tokenizer object used for tokenization.
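
The log_perplexity/coeffs pairing above suggests a distribution-based attack in the style of GBDA (Guo et al., 2021): learn a matrix of per-position token logits, sample relaxed one-hot vectors with Gumbel-softmax, and optimize so the target model flips its prediction while the attacker LM still finds the text fluent. The condensed sketch below reuses the log_perplexity sketched earlier; the hyperparameters, loss formulation, and model call signatures are all assumptions.

    import torch
    import torch.nn.functional as F

    def train(target_model, attacker_model, input_ids, label, am_embeddings,
              tm_embeddings, device, tokenizer, steps=100, lr=0.3):
        vocab_size = am_embeddings.shape[0]
        # One learnable logit per (position, vocab entry), peaked at the input tokens.
        log_coeffs = torch.zeros(input_ids.shape[1], vocab_size, device=device)
        log_coeffs.scatter_(1, input_ids[0].unsqueeze(1), 12.0)
        log_coeffs.requires_grad_(True)
        optimizer = torch.optim.Adam([log_coeffs], lr=lr)

        for _ in range(steps):
            optimizer.zero_grad()
            # Differentiable "soft tokens" via Gumbel-softmax.
            coeffs = F.gumbel_softmax(log_coeffs.unsqueeze(0), hard=False)
            # Adversarial loss: push the target model away from the true label.
            adv_loss = -F.cross_entropy(
                target_model(inputs_embeds=coeffs @ tm_embeddings).logits,
                torch.tensor([label], device=device))
            # Fluency loss: keep the sequence likely under the attacker LM.
            fluency = log_perplexity(
                attacker_model(inputs_embeds=coeffs @ am_embeddings).logits,
                coeffs)
            (adv_loss + fluency).backward()
            optimizer.step()

        # Decode the argmax tokens as the adversarial example.
        return tokenizer.decode(log_coeffs.argmax(dim=1))

A real implementation would likely replace the unbounded negative cross-entropy with a margin loss and weight the fluency term against the adversarial term.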

attacker_model.py - Training/experimentation script

load_target_model()

Loads the target sequence classification model trained on the IMDb dataset.

load_attacker_model()

Loads the GPT-2 language model used as the attacker model.

process_dataset(dataset, max_length=256)

Processes the dataset for model input using the GPT-2 tokenizer.

  • dataset: The dataset to be processed.
  • max_length: The maximum length of the input sequences (default is 256).

log_perplexity(logits, coeffs)

Calculates the log perplexity of the logits.

  • logits: The logits produced by the model.
  • coeffs: The coefficients for calculating log perplexity.

calculate_word_error_rate(reference, hypothesis)

Calculates the Word Error Rate (WER) between reference and hypothesis sequences.

  • reference: The reference sequence.
  • hypothesis: The hypothesis sequence.

load_embeddings()

Loads embeddings for both the target and attacker models.

train(target_model, attacker_model, inputs, labels, am_embeddings, tm_embeddings)

Trains the attacker model to generate adversarial examples.

  • target_model: The pre-trained target model.
  • attacker_model: The pre-trained attacker model.
  • inputs: The input sequences.
  • labels: The corresponding labels for the input sequences.
  • am_embeddings: The embeddings of the attacker model.
  • tm_embeddings: The embeddings of the target model.

classification_inference.py

PositionalEncoding

Implements positional encoding for transformer models (a sketch follows the method list).

Parameters

  • d_model: The dimensionality of the model.
  • max_len: The maximum length of the input sequence (default is 5000).

Methods

  • __init__(self, d_model, max_len=5000): Initializes the PositionalEncoding module.
  • forward(self, x): Adds positional encoding to the input tensor.
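
For reference, the standard sinusoidal form this class most likely implements, assuming batch-first input of shape (batch, seq_len, d_model):

    import math
    import torch
    import torch.nn as nn

    class PositionalEncoding(nn.Module):
        def __init__(self, d_model, max_len=5000):
            super().__init__()
            pe = torch.zeros(max_len, d_model)
            position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
            div_term = torch.exp(torch.arange(0, d_model, 2).float()
                                 * (-math.log(10000.0) / d_model))
            pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
            pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
            self.register_buffer("pe", pe.unsqueeze(0))   # (1, max_len, d_model)

        def forward(self, x):
            # Add the encoding for each position in the input sequence.
            return x + self.pe[:, : x.size(1)]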

TransformerDecoder

Implements the decoder part of a transformer model; a rough sketch follows the method list.

Parameters

  • vocab_size: The size of the vocabulary.
  • embed_dim: The dimensionality of the embeddings (default is 32).
  • num_heads: The number of attention heads (default is 2).
  • hidden_dim: The dimensionality of the hidden layer (default is 128).
  • num_layers: The number of decoder layers (default is 1).
  • dropout: The dropout probability (default is 0.1).

Methods

  • __init__(self, vocab_size, embed_dim=32, num_heads=2, hidden_dim=128, num_layers=1, dropout=0.1): Initializes the TransformerDecoder module.
  • init_weights(self): Initializes the weights of the model.
  • forward(self, input_ids): Performs forward pass through the decoder.
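
A rough sketch of one way a module with these defaults could be assembled, reusing the PositionalEncoding sketch above; the use of encoder layers in a decoder-only role, the mean pooling, and the sigmoid head are assumptions, not the repository's actual architecture.

    import torch
    import torch.nn as nn

    class TransformerDecoder(nn.Module):
        def __init__(self, vocab_size, embed_dim=32, num_heads=2,
                     hidden_dim=128, num_layers=1, dropout=0.1):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, embed_dim)
            self.pos_encoding = PositionalEncoding(embed_dim)
            layer = nn.TransformerEncoderLayer(
                d_model=embed_dim, nhead=num_heads, dim_feedforward=hidden_dim,
                dropout=dropout, batch_first=True)
            self.layers = nn.TransformerEncoder(layer, num_layers=num_layers)
            self.classifier = nn.Linear(embed_dim, 1)  # binary sentiment head
            self.init_weights()

        def init_weights(self):
            nn.init.uniform_(self.embedding.weight, -0.1, 0.1)

        def forward(self, input_ids):
            x = self.pos_encoding(self.embedding(input_ids))
            x = self.layers(x)
            # Pool over the sequence and squash to a probability.
            return torch.sigmoid(self.classifier(x.mean(dim=1)))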

init_model

Initializes the transformer decoder model.

Parameters

  • tokenizer: The tokenizer object.
  • device: The device to use for computation.

Returns

  • model: The initialized transformer decoder model.

inference

Performs inference using the transformer decoder model (sketched below).

Parameters

  • model: The transformer decoder model.
  • tokenizer: The tokenizer object.
  • device: The device to use for computation.
  • text: The input text for inference.

Returns

  • output: The output probability.
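
A minimal sketch under the same assumptions: tokenize the text, run the model, and read off the sentiment probability.

    import torch

    def inference(model, tokenizer, device, text):
        model.eval()
        input_ids = tokenizer(text, return_tensors="pt",
                              truncation=True).input_ids.to(device)
        with torch.no_grad():
            output = model(input_ids)  # probability from the sigmoid head
        return output.item()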

classification_model.py - Training/experimentation script

IMDbDataset

Custom dataset class for loading the IMDb dataset; a sketch follows the method list.

Parameters

  • dataset: The dataset object loaded using load_dataset.
  • tokenizer: The GPT-2 tokenizer object.
  • max_length: The maximum length of input sequences (default is 384).

Methods

  • __init__(self, dataset, tokenizer, max_length=384): Initializes the IMDbDataset class.
  • __len__(self): Returns the length of the dataset.
  • __getitem__(self, idx): Returns a single sample from the dataset.
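
A plausible shape for the wrapper, assuming the Hugging Face imdb dataset with text and label fields; the exact keys and return format are assumptions.

    import torch
    from torch.utils.data import Dataset

    class IMDbDataset(Dataset):
        def __init__(self, dataset, tokenizer, max_length=384):
            # Assumes tokenizer.pad_token has been set (e.g. to eos_token).
            self.dataset = dataset
            self.tokenizer = tokenizer
            self.max_length = max_length

        def __len__(self):
            return len(self.dataset)

        def __getitem__(self, idx):
            item = self.dataset[idx]
            enc = self.tokenizer(item["text"], truncation=True,
                                 padding="max_length", max_length=self.max_length,
                                 return_tensors="pt")
            label = torch.tensor(item["label"], dtype=torch.float)
            return enc.input_ids.squeeze(0), label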

PositionalEncoding

Implements the positional encoding for transformer models.

Parameters

  • d_model: The dimensionality of the model.
  • max_len: The maximum length of the input sequence (default is 5000).

Methods

  • __init__(self, d_model, max_len=5000): Initializes the PositionalEncoding module.
  • forward(self, x): Adds positional encoding to the input tensor.

TransformerDecoder

Implements the decoder part of a transformer model.

Parameters

  • vocab_size: The size of the vocabulary.
  • embed_dim: The dimensionality of the embeddings (default is 768).
  • num_heads: The number of attention heads (default is 2).
  • hidden_dim: The dimensionality of the hidden layer (default is 2048).
  • num_layers: The number of decoder layers (default is 2).
  • dropout: The dropout probability (default is 0.1).

Methods

  • __init__(self, vocab_size, embed_dim=768, num_heads=2, hidden_dim=2048, num_layers=2, dropout=0.1): Initializes the TransformerDecoder module.
  • init_weights(self): Initializes the weights of the model.
  • forward(self, input_ids, input_embeds=None): Performs forward pass through the decoder.

train

Trains the TransformerDecoder model; an illustrative loop follows the parameter list.

Parameters

  • model: The TransformerDecoder model.
  • data_loader: The DataLoader object for loading data.
  • epochs: The number of training epochs (default is 40).
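
An illustrative loop for a binary classifier with a sigmoid head, trained with binary cross-entropy; the optimizer, learning rate, and logging are assumptions.

    import torch
    import torch.nn as nn

    def train(model, data_loader, epochs=40, device="cuda"):
        model.to(device).train()
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
        criterion = nn.BCELoss()
        for epoch in range(epochs):
            total = 0.0
            for input_ids, labels in data_loader:
                input_ids, labels = input_ids.to(device), labels.to(device)
                optimizer.zero_grad()
                probs = model(input_ids).squeeze(-1)  # (batch,) probabilities
                loss = criterion(probs, labels)
                loss.backward()
                optimizer.step()
                total += loss.item()
            print(f"epoch {epoch + 1}: mean loss {total / len(data_loader):.4f}")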

main

Entry point of the script.

Calls

  • train(model, data_loader): Trains the TransformerDecoder model using the provided DataLoader.

Usage

  1. Clone the Repository:

    git clone https://github.com/prajwalvathreya/LM_adversarial_attacks.git
    
  2. Install Dependencies:

    pip install -r requirements.txt
    
  3. Start FastAPI Server:

    uvicorn app:app --reload
    
  4. Open the interactive /docs page to test the API:

    http://YOUR_PORT_URL/docs
    
