This tool helps analyze interpretable features in word embeddings using sparse autoencoders. It provides an interactive interface to explore how different words activate specific features and visualize semantic relationships.
The tool works by:
- Taking pre-trained word embeddings (BERT or GloVe)
- Training a sparse autoencoder to find interpretable features
- Analyzing which features respond to specific semantic concepts
- Visualizing the relationships between words and features in 3D space
- Python 3.8+
- PyTorch
- Streamlit
- Transformers (for BERT)
- Gensim
- UMAP-learn
- Plotly
- NumPy
- Pandas
Install dependencies using:

`pip install -r requirements.txt`
- Run the application: `streamlit run app.py`
- Configure your analysis in the sidebar:
  - Select an embedding model (BERT or GloVe)
  - Enter words to analyze
  - Define concept groups
  - Adjust autoencoder settings
  - Configure visualization parameters
- Click "Run Analysis" to:
  - Load the selected embedding model
  - Train the sparse autoencoder
  - Select interpretable features
  - Visualize the results
- Explore the results:
  - View the 3D visualization of word relationships
  - Select features to analyze their activation patterns
  - Examine which words strongly activate each feature
- BERT (768 dimensions): Higher quality, slower
- GloVe (100 dimensions): Faster, less detailed
- Sparse autoencoder for finding interpretable features
- Monosemanticity analysis
- Interactive 3D visualization
- Feature activation analysis
- UMAP dimensionality reduction
- Color-coding by concept groups or feature activation
- Interactive 3D plots with zoom and rotation
- `app.py`: Main application and UI
- `models.py`: Sparse autoencoder implementation
- `visualization.py`: Visualization functions
- `data.py`: Data loading and processing
- `requirements.txt`: Project dependencies
- Word Embeddings: The tool starts with pre-trained word embeddings that capture semantic meaning.
- Sparse Autoencoder:
  - Compresses the embeddings into a smaller number of features
  - Uses sparsity to encourage interpretable features
  - Each feature potentially captures a specific semantic concept
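The autoencoder step above can be sketched as follows. The actual implementation lives in `models.py` and presumably uses PyTorch; this is a minimal NumPy illustration of the architecture, with hypothetical names (`W_enc`, `l1_coeff`) and an L1 penalty standing in for whatever sparsity term the tool actually uses:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

class SparseAutoencoder:
    """Minimal sparse autoencoder: encode, decode, L1-penalized loss."""

    def __init__(self, embed_dim, n_features, l1_coeff=1e-3):
        self.W_enc = rng.normal(0, 0.1, (embed_dim, n_features))
        self.b_enc = np.zeros(n_features)
        self.W_dec = rng.normal(0, 0.1, (n_features, embed_dim))
        self.b_dec = np.zeros(embed_dim)
        self.l1_coeff = l1_coeff

    def encode(self, x):
        # ReLU keeps activations non-negative; the L1 term pushes most to zero
        return relu(x @ self.W_enc + self.b_enc)

    def decode(self, f):
        return f @ self.W_dec + self.b_dec

    def loss(self, x):
        f = self.encode(x)
        x_hat = self.decode(f)
        recon = np.mean((x - x_hat) ** 2)              # reconstruction error
        sparsity = self.l1_coeff * np.mean(np.abs(f))  # sparsity penalty
        return recon + sparsity

# Example: 100-d embeddings (GloVe-sized) compressed to 64 features
sae = SparseAutoencoder(embed_dim=100, n_features=64)
x = rng.normal(size=(8, 100))   # a batch of 8 word embeddings
features = sae.encode(x)
print(features.shape)  # (8, 64)
```

A real training loop would then minimize `loss` with gradient descent over the word embeddings.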
- Monosemanticity Analysis:
  - Identifies features that respond strongly to specific concept groups
  - Higher scores indicate more selective features
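One simple way to compute such a selectivity score (an assumption for illustration; the tool's actual scoring may differ) is the ratio of a feature's mean activation inside a concept group to its mean activation outside it:

```python
import numpy as np

def monosemanticity_score(activations, group_mask, eps=1e-8):
    """Selectivity of one feature for a concept group: mean activation
    inside the group divided by mean activation outside it.
    activations: (n_words,) feature activations; group_mask: bool (n_words,)."""
    inside = activations[group_mask].mean()
    outside = activations[~group_mask].mean()
    return inside / (outside + eps)

# Toy example: a feature that fires mostly on "animal" words
acts = np.array([0.9, 0.8, 0.85, 0.05, 0.1, 0.0])
is_animal = np.array([True, True, True, False, False, False])
score = monosemanticity_score(acts, is_animal)
print(score > 1.0)  # selective features score well above 1
```

Features whose score exceeds a chosen threshold would then be kept for inspection.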
- Visualization:
  - Uses UMAP to create a 3D visualization
  - Preserves both local and global semantic relationships
  - Colors indicate either concept groups or feature activation strength
- Word Selection:
  - Choose related words that you want to analyze
  - Include enough examples of each concept
  - Mix different semantic categories
- Concept Groups:
  - Create meaningful semantic categories
  - Include enough words in each group
  - Choose groups likely to correspond to distinct semantic features
- Parameters:
  - Adjust the feature size based on complexity
  - Increase epochs for better training
  - Tune the monosemanticity threshold to find meaningful features
- Limited to words in the embedding model's vocabulary
- Quality depends on the input embeddings
- May require parameter tuning for best results
- Computationally intensive for large vocabularies
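Because out-of-vocabulary words cannot be analyzed, it can help to filter the input list up front. A hypothetical helper (`filter_in_vocab` is not part of the tool) that only assumes the model supports membership tests, as gensim's `KeyedVectors` does; a plain dict stands in for the model here:

```python
def filter_in_vocab(words, model):
    """Split words into those the embedding model knows and those it doesn't.
    `model` only needs to support `word in model` (gensim KeyedVectors does)."""
    known = [w for w in words if w in model]
    missing = [w for w in words if w not in model]
    return known, missing

# Stand-in vocabulary; with GloVe this would be a gensim KeyedVectors object
vocab = {"cat": [0.1], "dog": [0.2]}
known, missing = filter_in_vocab(["cat", "dog", "axolotl"], vocab)
print(known, missing)  # ['cat', 'dog'] ['axolotl']
```

Reporting `missing` back to the user avoids silent gaps in the analysis.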
- Audit the training loop for correctness and stability
- Add more embedding models for comparison
- Improve UMAP inference and interpretability
- Map and visualize features in 3D space across layers