Sparse Autoencoder Word Embedding Analysis

This tool helps analyze interpretable features in word embeddings using sparse autoencoders. It provides an interactive interface to explore how different words activate specific features and visualize semantic relationships.

Overview

The tool works by:

  1. Taking pre-trained word embeddings (BERT or GloVe)
  2. Training a sparse autoencoder to find interpretable features
  3. Analyzing which features respond to specific semantic concepts
  4. Visualizing the relationships between words and features in 3D space

Requirements

  • Python 3.8+
  • PyTorch
  • Streamlit
  • Transformers (for BERT)
  • Gensim
  • UMAP-learn
  • Plotly
  • NumPy
  • Pandas

Install dependencies using:

pip install -r requirements.txt
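
For reference, a requirements.txt consistent with the dependency list above would contain entries like these (unpinned here; the repository's actual file may pin versions):

```
torch
streamlit
transformers
gensim
umap-learn
plotly
numpy
pandas
```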

Usage

  1. Run the application:

streamlit run app.py

  2. Configure your analysis in the sidebar:

    • Select an embedding model (BERT or GloVe)
    • Enter words to analyze
    • Define concept groups
    • Adjust autoencoder settings
    • Configure visualization parameters
  3. Click "Run Analysis" to:

    • Load the selected embedding model
    • Train the sparse autoencoder
    • Select features
    • Visualize the results
  4. Explore the results:

    • View the 3D visualization of word relationships
    • Select features to analyze their activation patterns
    • Examine which words strongly activate each feature

Features

Embedding Models

  • BERT (768 dimensions): Higher quality, slower
  • GloVe (100 dimensions): Faster, less detailed
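
For illustration, a static per-word vector could be pulled from each backend roughly as follows (the model names and the mean-pooling choice are assumptions for this sketch; the actual loading code lives in data.py):

```python
import torch
import gensim.downloader
from transformers import AutoModel, AutoTokenizer

# BERT is contextual, so encode the word alone and mean-pool its
# subword token vectors into a single 768-d embedding.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def bert_embedding(word):
    inputs = tokenizer(word, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden[0, 1:-1].mean(dim=0)  # drop [CLS]/[SEP], pool the rest

# GloVe is a static 100-d lookup table; out-of-vocabulary words raise KeyError.
glove = gensim.downloader.load("glove-wiki-gigaword-100")

def glove_embedding(word):
    return torch.tensor(glove[word])
```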

Analysis Tools

  • Sparse autoencoder for finding interpretable features
  • Monosemanticity analysis
  • Interactive 3D visualization
  • Feature activation analysis
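
A minimal version of such a sparse autoencoder, assuming an L1 sparsity penalty (a sketch; the actual architecture and loss in models.py may differ):

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Encodes embeddings into sparse features and reconstructs them."""
    def __init__(self, embed_dim, num_features):
        super().__init__()
        self.encoder = nn.Linear(embed_dim, num_features)
        self.decoder = nn.Linear(num_features, embed_dim)

    def forward(self, x):
        features = torch.relu(self.encoder(x))  # non-negative, mostly zero
        return self.decoder(features), features

def sae_loss(x, reconstruction, features, l1_coeff=1e-3):
    # Reconstruction error keeps the features faithful to the embedding;
    # the L1 term drives most activations to zero, encouraging each
    # feature to fire only for a narrow, interpretable set of inputs.
    return torch.mean((reconstruction - x) ** 2) + l1_coeff * features.abs().mean()
```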

Visualization

  • UMAP dimensionality reduction
  • Color-coding by concept groups or feature activation
  • Interactive 3D plots with zoom and rotation
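
Roughly, the projection and plotting step might look like this (a sketch using umap-learn and Plotly Express; the helper name and column names are invented for the example, and the code in visualization.py may differ):

```python
import numpy as np
import pandas as pd
import umap
import plotly.express as px

def plot_words_3d(embeddings, words, groups):
    # UMAP with 3 output components; n_neighbors must stay below the
    # number of words, so keep it small for short word lists.
    reducer = umap.UMAP(n_components=3, n_neighbors=5, random_state=42)
    coords = reducer.fit_transform(np.asarray(embeddings))
    df = pd.DataFrame(coords, columns=["x", "y", "z"])
    df["word"] = words
    df["group"] = groups  # could instead hold feature activation values
    fig = px.scatter_3d(df, x="x", y="y", z="z", color="group", text="word")
    fig.show()
```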

File Structure

  • app.py: Main application and UI
  • models.py: Sparse autoencoder implementation
  • visualization.py: Visualization functions
  • data.py: Data loading and processing
  • requirements.txt: Project dependencies

How It Works

  1. Word Embeddings: The tool starts with pre-trained word embeddings that capture semantic meaning.

  2. Sparse Autoencoder:

    • Compresses the embeddings into a smaller number of features
    • Uses sparsity to encourage interpretable features
    • Each feature potentially captures a specific semantic concept
  3. Monosemanticity Analysis:

    • Identifies features that respond strongly to specific concept groups
    • Higher scores indicate more selective features (one way to compute such a score is sketched after this list)
  4. Visualization:

    • Uses UMAP to create a 3D visualization
    • Preserves both local and global semantic relationships
    • Colors indicate either concept groups or feature activation strength
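
One simple selectivity score, sketched here as an assumption (the metric implemented in models.py may differ), compares a feature's mean activation on words inside a concept group to its mean activation across all words:

```python
import numpy as np

def monosemanticity_score(feature_acts, group_mask):
    """feature_acts: (num_words,) activations of one feature across all words.
    group_mask: boolean array marking which words belong to the concept group."""
    in_group = feature_acts[group_mask].mean()
    overall = feature_acts.mean() + 1e-8  # guard against division by zero
    return float(in_group / overall)
```

A score well above 1 for exactly one group suggests the feature is close to monosemantic for that concept.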

Tips for Best Results

  1. Word Selection:

    • Choose words relevant to the concepts you want to analyze
    • Include enough examples of each concept
    • Mix different semantic categories
  2. Concept Groups:

    • Create meaningful semantic categories
    • Include enough words in each group
    • Choose groups that plausibly correspond to distinct semantic features
  3. Parameters:

    • Adjust the number of features to match the complexity of your word set
    • Increase training epochs for better convergence
    • Tune the monosemanticity threshold to surface meaningful features

Limitations

  • Limited to words in the embedding model's vocabulary
  • Quality depends on the input embeddings
  • May require parameter tuning for best results
  • Computationally intensive for large vocabularies

Future Work

  • Scrutinize the training loop
  • Add more embedding models for comparison
  • Improve UMAP inference and interpretability
  • Map and visualize features in 3D space across layers