This project focuses on understanding the internal workings of GPT-2, providing a detailed exploration of its embedding space, layers, and processing mechanisms. The repository includes comprehensive code for dissecting these components, along with an analysis report summarizing insights into the geometric and functional behavior of GPT-2.
GPT-2, developed by [OpenAI](https://github.com/openai/gpt-2), marked a significant advancement in natural language processing at the time of its release. This project dives deep into its inner workings, particularly focusing on:
- The embedding space of the model
- The structure and function of attention heads
- The role of multi-layer perceptrons (MLPs) in storing knowledge
- The effects of layer normalization and its geometric interpretation
For a detailed explanation of the research and methods used, see the accompanying GPT-2 Analysis Report.
To set up the environment, it is recommended to use a virtual environment with Miniconda or virtualenv. Follow these steps to install the necessary dependencies:
- Create and activate a virtual environment:

  ```bash
  conda create -n gpt2 python=3.12 -y
  conda activate gpt2
  ```

- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```
The repository contains three scripts for analyzing different aspects of GPT-2. To run the analysis:
- Explore GPT-2 embeddings:

  ```bash
  python 1_explore_embeddings.py
  ```

- Understand the model layers:

  ```bash
  python 2_layers_understanding.py
  ```

- Track token transformations:

  ```bash
  python 3_tokens_journey.py
  ```
Each script will output plots and analyses to help you better understand how GPT-2 processes input data.
In the first script, the analysis focuses on understanding how GPT-2's token embeddings and positional embeddings are structured in high-dimensional space. Key findings include:
- Systematic offsets in certain dimensions of the embedding space
- The discovery of dimensions where position embeddings are active
- The dimensionality reduction of token and position embeddings using PCA
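As a rough sketch of the PCA step described above, the snippet below projects an embedding matrix onto its top principal components via SVD. The matrix here is a random stand-in; the actual script would presumably use the model's learned token-embedding weights (shape `(50257, 768)` for GPT-2 small).

```python
import numpy as np

# Random stand-in for a token-embedding matrix (vocab=1000, d_model=64).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 64)).astype(np.float32)

def pca_project(X, n_components=2):
    """Project the rows of X onto their top principal components."""
    X_centered = X - X.mean(axis=0)
    # Right singular vectors of the centered matrix are the principal axes,
    # sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

projected = pca_project(embeddings)
print(projected.shape)  # (1000, 2)
```

The first projected coordinate captures the most variance, the second the next most, which is what makes a 2-D scatter plot of embeddings informative.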
The second script dives into how the residual stream and attention heads operate within the model. Visualizations are generated for:
- Self-attention matrices
- Multi-layer perceptron (MLP) behavior
- Layer normalization's effect on the high-dimensional space
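To make the self-attention matrices concrete, here is a minimal single-head causal self-attention in NumPy. All weights are random stand-ins, and biases and the output projection are omitted; this is a simplified sketch of the computation whose weights the script visualizes, not the script's own code.

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product attention with a causal mask."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Causal mask: each token may attend only to itself and earlier tokens.
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    # Softmax over the key axis: each row of the attention matrix sums to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 5, 16, 8
x = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out, attn = causal_self_attention(x, Wq, Wk, Wv)
print(attn.round(2))  # lower-triangular matrix; each row sums to 1
```

Plotting `attn` as a heatmap gives exactly the kind of self-attention visualization produced by the script.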
The third script tracks the evolution of tokens as they pass through the model layers, showing how token representations change layer by layer until they predict the next token.
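This layer-by-layer tracking can be sketched in the style of the "logit lens": after each layer, decode the intermediate residual stream with the unembedding matrix to see which token the model currently favors. Everything below is a random stand-in for the real GPT-2 weights and layers, intended only to show the shape of the computation.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d_model, n_layers = 100, 32, 4

W_U = rng.normal(size=(d_model, vocab))   # stand-in unembedding matrix
residual = rng.normal(size=d_model)       # stand-in embedding of the last token

def layer_norm(x, eps=1e-5):
    """Final layer norm applied before unembedding."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

for layer in range(n_layers):
    # Stand-in for one transformer block's additive write to the residual stream.
    residual = residual + 0.1 * rng.normal(size=d_model)
    logits = layer_norm(residual) @ W_U   # decode the intermediate state
    print(f"after layer {layer}: top token id = {logits.argmax()}")
```

With the real model, tracking how the top decoded token changes across layers shows the prediction gradually converging on the next token.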
This project is licensed under the MIT License and is free to reuse - see the LICENSE file for details.