This codebase is dedicated to exploring different self-supervised pretraining methods. We integrate multimodal data from supernovae light curves with images of their host galaxies. Our goal is to leverage diverse data types to improve the prediction and understanding of astronomical phenomena.
An overview of the CLIP method and loss: link
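For reference, here is a minimal sketch of the symmetric CLIP-style contrastive loss as it would apply here, pairing light-curve and host-galaxy-image embeddings; the tensor names and temperature value are illustrative, not the repository's API:

```python
import torch
import torch.nn.functional as F

def clip_loss(lc_emb: torch.Tensor, img_emb: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric contrastive loss: matched (light curve, image) pairs should
    score higher than all mismatched pairs within the batch."""
    lc_emb = F.normalize(lc_emb, dim=-1)    # (batch, dim), unit-norm embeddings
    img_emb = F.normalize(img_emb, dim=-1)
    logits = lc_emb @ img_emb.t() / temperature  # pairwise cosine similarities
    targets = torch.arange(logits.size(0), device=logits.device)  # diagonal = matched pairs
    # cross-entropy in both directions (light curve -> image and image -> light curve)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```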
All data used in this work is available here: link
Paper associated with this code: link
Our transformer-based model Maven is pretrained on simulated data and finetuned on observations. We compare it with Maven-lite, which is trained directly on observations, and with transformer-based supervised classification and regression models.
Before installing, ensure you have the following prerequisites:
- Python 3.8 or higher
- pip package manager
Clone the repository to your local machine and navigate into the directory:
```bash
git clone git@github.com:ThomasHelfer/Multimodal-hackathon-2024.git
cd Multimodal-hackathon-2024
```
Unpack the dataset containing supernovae spectra, light curves and host galaxy images:
```bash
git clone https://huggingface.co/datasets/thelfer/multimodal_supernovae
mv multimodal_supernovae/ZTFBTS* .
mkdir sim_data && cd sim_data
wget https://huggingface.co/datasets/thelfer/multimodal_supernovae/resolve/main/sim_data/ZTF_Pretrain_5Class.hdf5
```
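To sanity-check the download, you can list the contents of the simulated-data file with h5py (run from the repository root); this simply prints whatever groups and datasets the file contains:

```python
import h5py

# print every group/dataset path inside the pretraining file
with h5py.File("sim_data/ZTF_Pretrain_5Class.hdf5", "r") as f:
    f.visit(print)
```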
We recommend setting up a virtual environment:
```bash
virtualenv dev
source dev/bin/activate
```
Install all dependencies listed in the requirements.txt file:
```bash
pip install -r requirements.txt
```
Run the pretraining script:
```bash
python pretraining_clip_wandb.py pretrain_config/maven_pretrain_config.yaml
```
CLIP-finetune the pretrained model:
```bash
python finetune_clip.py configs/maven_finetune.yaml
```
The config file uses the path of our pretrained model; to apply this to your own model, change the path.
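For example, the relevant entry might look like the following; the key name mirrors the `pretrain_lc_path` convention shown later in this README, so verify it against `configs/maven_finetune.yaml`:

```yaml
extra_args:
  pretrain_lc_path: 'path_to_checkpoint/checkpoint.ckpt'  # point this at your own checkpoint
```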
Run the training script for Maven-lite:
```bash
python script_wandb.py configs/maven-lite.yaml
```
- Sign up for an account at Weights & Biases if you haven't already.
- Edit the configuration file to specify your project name. Ensure the name matches the project you create on wandb.ai. You can define sweep parameters within the config file (see the illustrative snippet below).
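For illustration, sweep parameters use the grid format with `values` lists, as in the fold example further below; the parameter names here are hypothetical, so substitute the keys your config actually exposes:

```yaml
# hypothetical sweep parameters; replace with keys from your own config
lr:
  values: [0.0001, 0.0003, 0.001]
batchsize:
  values: [32, 64]
```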
In the config file you can choose between different training objectives:

```yaml
extra_args:
  regression: True
```

If true, `script_wandb.py` performs a regression for redshift. Similarly,

```yaml
extra_args:
  classification: True
```

makes `script_wandb.py` perform a classification. If neither is true, it performs a normal CLIP pretraining. Lastly,

```yaml
extra_args:
  pretrain_lc_path: 'path_to_checkpoint/checkpoint.ckpt'
  freeze_backbone_lc: True
```

preloads a pretrained model in `script_wandb.py`, or allows restarting a run from a checkpoint for `retraining_wandb.py`.
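In other words, the three modes are mutually exclusive. A minimal sketch of the selection logic (illustrative only, not the repository's actual code):

```python
# illustrative dispatch on the extra_args flags described above
def select_objective(extra_args: dict) -> str:
    if extra_args.get("regression", False):
        return "redshift regression"
    if extra_args.get("classification", False):
        return "classification"
    return "CLIP pretraining"  # default when neither flag is set
```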
Start the hyperparameter sweep with the following command:

```bash
python script_wandb.py configs/config_grid.yaml
```

Resume a sweep with the following command:

```bash
python script_wandb.py [sweep_id]
```
The first execution will prompt you for your Weights & Biases API key, which can be found here.
Alternatively, you can set your API key as an environment variable, especially if running on a compute node:
```bash
export WANDB_API_KEY=...
```
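Alternatively, the Weights & Biases CLI provides an interactive login that caches the key for later runs:

```bash
wandb login
```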
- Monitor and analyze your experiment results on your Weights & Biases project page at wandb.ai.
We can run a k-fold cross-validation by defining the variable

```yaml
extra_args:
  kfolds: 5 # for stratified cross-validation
```

As this can take very long when run serially, you can split the folds across separate submissions by choosing specific folds for each one:

```yaml
foldnumber:
  values: [1,2,3]
```
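For intuition, the stratified splitting implied by `kfolds` behaves like scikit-learn's `StratifiedKFold`, which keeps class proportions roughly equal across folds; the arrays below are toy placeholders:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.arange(20).reshape(-1, 1)  # toy features
y = np.array([0, 1] * 10)         # toy class labels

# five stratified folds, mirroring kfolds: 5 above
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(skf.split(X, y), start=1):
    print(f"fold {fold}: {len(train_idx)} train / {len(val_idx)} val samples")
```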
To calculate the performance of model checkpoint files, change the folder path and the corresponding name in evaluate_models.py. Then calculate the metrics by running:
```bash
python evaluate_models.py
```
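As a rough sketch, such an evaluation typically loops over checkpoint files in a folder (the folder path, file pattern, and metric step below are placeholders, not the actual contents of `evaluate_models.py`):

```python
import glob
import os

# placeholder folder; evaluate_models.py expects you to set your own path
folderpath = "path_to_checkpoints"

# iterate over every checkpoint file and compute metrics for each
for ckpt in sorted(glob.glob(os.path.join(folderpath, "*.ckpt"))):
    print(f"evaluating {ckpt}")  # load the model and compute metrics here
```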