This tool helps analyze interpretable features in word embeddings using sparse autoencoders. It provides an interactive interface to explore how different words activate specific features and visualize semantic relationships.
The tool works by:
- Taking pre-trained word embeddings (BERT or GloVe)
- Training a sparse autoencoder to find interpretable features
- Analyzing which features respond to specific semantic concepts
- Visualizing the relationships between words and features in 3D space
- Python 3.8+
- PyTorch
- Streamlit
- Transformers (for BERT)
- Gensim
- UMAP-learn
- Plotly
- NumPy
- Pandas
Install dependencies using:

`pip install -r requirements.txt`
- Run the application: `streamlit run app.py`
- Configure your analysis in the sidebar:
  - Select an embedding model (BERT or GloVe)
  - Enter words to analyze
  - Define concept groups
  - Adjust autoencoder settings
  - Configure visualization parameters
- Click "Run Analysis" to:
  - Load the selected embedding model
  - Train the sparse autoencoder
  - Select interpretable features
  - Visualize the results
- Explore the results:
  - View the 3D visualization of word relationships
  - Select features to analyze their activation patterns
  - Examine which words strongly activate each feature
- BERT (768 dimensions): Higher quality, slower
- GloVe (100 dimensions): Faster, less detailed
- Sparse autoencoder for finding interpretable features
- Monosemanticity analysis
- Interactive 3D visualization
- Feature activation analysis
- UMAP dimensionality reduction
- Color-coding by concept groups or feature activation
- Interactive 3D plots with zoom and rotation
- `app.py`: Main application and UI
- `models.py`: Sparse autoencoder implementation
- `visualization.py`: Visualization functions
- `data.py`: Data loading and processing
- `requirements.txt`: Project dependencies
- Word Embeddings: The tool starts with pre-trained word embeddings that capture semantic meaning.
- Sparse Autoencoder:
  - Compresses the embeddings into a smaller number of features
  - Uses sparsity to encourage interpretable features
  - Each feature potentially captures a specific semantic concept
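The autoencoder step above can be sketched as follows. The actual implementation lives in `models.py` and presumably uses PyTorch; this is a minimal NumPy illustration of the architecture, with hypothetical names (`W_enc`, `l1_coeff`) and an L1 penalty standing in for whatever sparsity term the tool actually uses:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

class SparseAutoencoder:
    """Minimal sparse autoencoder: encode, decode, L1-penalized loss."""

    def __init__(self, embed_dim, n_features, l1_coeff=1e-3):
        self.W_enc = rng.normal(0, 0.1, (embed_dim, n_features))
        self.b_enc = np.zeros(n_features)
        self.W_dec = rng.normal(0, 0.1, (n_features, embed_dim))
        self.b_dec = np.zeros(embed_dim)
        self.l1_coeff = l1_coeff

    def encode(self, x):
        # ReLU keeps activations non-negative; the L1 term pushes most to zero
        return relu(x @ self.W_enc + self.b_enc)

    def decode(self, f):
        return f @ self.W_dec + self.b_dec

    def loss(self, x):
        f = self.encode(x)
        x_hat = self.decode(f)
        recon = np.mean((x - x_hat) ** 2)              # reconstruction error
        sparsity = self.l1_coeff * np.mean(np.abs(f))  # sparsity penalty
        return recon + sparsity

# Example: 100-d embeddings (GloVe-sized) compressed to 64 features
sae = SparseAutoencoder(embed_dim=100, n_features=64)
x = rng.normal(size=(8, 100))   # a batch of 8 word embeddings
features = sae.encode(x)
print(features.shape)  # (8, 64)
```

A real training loop would then minimize `loss` with gradient descent over the word embeddings.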
- Monosemanticity Analysis:
  - Identifies features that respond strongly to specific concept groups
  - Higher scores indicate more selective features
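One simple way to compute such a selectivity score (an assumption for illustration; the tool's actual scoring may differ) is the ratio of a feature's mean activation inside a concept group to its mean activation outside it:

```python
import numpy as np

def monosemanticity_score(activations, group_mask, eps=1e-8):
    """Selectivity of one feature for a concept group: mean activation
    inside the group divided by mean activation outside it.
    activations: (n_words,) feature activations; group_mask: bool (n_words,)."""
    inside = activations[group_mask].mean()
    outside = activations[~group_mask].mean()
    return inside / (outside + eps)

# Toy example: a feature that fires mostly on "animal" words
acts = np.array([0.9, 0.8, 0.85, 0.05, 0.1, 0.0])
is_animal = np.array([True, True, True, False, False, False])
score = monosemanticity_score(acts, is_animal)
print(score > 1.0)  # selective features score well above 1
```

Features whose score exceeds a chosen threshold would then be kept for inspection.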
- Visualization:
  - Uses UMAP to create a 3D visualization
  - Preserves both local and global semantic relationships
  - Colors indicate either concept groups or feature activation strength
- Word Selection:
  - Choose related words that you want to analyze
  - Include enough examples of each concept
  - Mix different semantic categories
- Concept Groups:
  - Create meaningful semantic categories
  - Include enough words in each group
  - Choose groups likely to correspond to distinct semantic features
- Parameters:
  - Adjust the feature size based on complexity
  - Increase epochs for better training
  - Tune the monosemanticity threshold to find meaningful features
- Limited to words in the embedding model's vocabulary
- Quality depends on the input embeddings
- May require parameter tuning for best results
- Computationally intensive for large vocabularies
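Because out-of-vocabulary words cannot be analyzed, it can help to filter the input list up front. A hypothetical helper (`filter_in_vocab` is not part of the tool) that only assumes the model supports membership tests, as gensim's `KeyedVectors` does; a plain dict stands in for the model here:

```python
def filter_in_vocab(words, model):
    """Split words into those the embedding model knows and those it doesn't.
    `model` only needs to support `word in model` (gensim KeyedVectors does)."""
    known = [w for w in words if w in model]
    missing = [w for w in words if w not in model]
    return known, missing

# Stand-in vocabulary; with GloVe this would be a gensim KeyedVectors object
vocab = {"cat": [0.1], "dog": [0.2]}
known, missing = filter_in_vocab(["cat", "dog", "axolotl"], vocab)
print(known, missing)  # ['cat', 'dog'] ['axolotl']
```

Reporting `missing` back to the user avoids silent gaps in the analysis.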
- Audit the training loop for correctness and stability
- Add more embedding models for comparison
- Improve UMAP inference and interpretability
- Map and visualize features in 3D space across layers