This repository provides the experiments of the paper Unsupervised Domain Adaptation for Constraining Star Formation Histories.
This study has been conducted with the help of the ADAPT library.
The prevalent paradigm of machine learning today is to use past observations to predict future ones. What if, however, we are interested in knowing the past given the present? This situation is indeed one that astronomers must contend with often. To understand the formation of our universe, we must derive the time evolution of the visible mass content of galaxies based on the Spectral Energy Distribution (SED) observed on earth.
As galaxies evolve over billions of years, it is impossible to completely capture any single galaxy’s mass evolution as a function of time. Astrophysicists then generate cosmological simulations, which allow building artificial star formation histories with the corresponding radiation. However, a significant challenge arises due to the domain shift between simulated and real galaxies (see Figure 1).
Figure 1: Radiation spectrums with corresponding star formation histories can be recorded from cosmological simulation to form a dataset (X, y). This dataset is used to learn a machine learning model. Because of the domain shift between real radiation spectrums and simulated ones, domain adaptation is used to adapt the learned machine learning model to the real observations.
The experiments can be run with the notebooks/run_experiments.ipynb
notebook. Figure 2 can be reproduced with the notebooks/kliep_weights_visualization.ipynb
notebook.
Our training and test data sets consist of SEDs from three state of the art cosmological galaxy formation simulations SIMBA [1], EAGLE [2] and IllustrisTNG [3] (The three datasets can be found in the datasets
folder). Input consists of flux densities in 20 filters, and output consists of SFH (time series) in 29 bins from t=0 to t=13.8 billion years ago. We train on any two simulations and test on the third one. This is a necessary first step in developing a technique that can ultimately be applied to observational data.
- We apply KLIEP [4], an instance based method that reweights the sources in order to minimize the KL divergence between any two domains.
- We normalize each SFH time series vector by its sum, and apply kernelPCA to reduce the dimensionality of the normalized SFH time series vectors from 29 to 3.
- We fit two NNs on the source data using the KLIEP derived importance weights: ne network to predict the 3 kPCA components, the other to predict SFH_sum.
Figure 2: KLIEP brings different domains closer by reweighing source samples. Top: Scatter plot of the first two kPCA components. Bottom: KDE plots of log of the first two features.
The results can be computed with the notebooks/results.ipynb
notebook.
Figure 3: Global SFH predictions for the IllustrisTNG experiments. The curves correspond to the sums over all predicted SFH for different domain adaptation methods
Figure 3: Global SFH predictions for the EAGLE experiments. The curves correspond to the sums over all predicted SFH for different domain adaptation methods
Figure 3: Global SFH predictions for the SIMBA experiments. The curves correspond to the sums over all predicted SFH for different domain adaptation methods
- [1] Davé R. et al. 2019. SIMBA: Cosmological simulations with black hole growth and feedback. Monthly Notices of the Royal Astronomical Society. 486 2 2827 2849. [paper]
- [2] Schaye J. et al. 2015. The EAGLE project simulating the evolution and assembly of galaxies and their environments. Monthly Notices of the Royal Astronomical Society. 446 1 521 554. [paper]
- [3] Vogelsberger M. et al. 2014. Introducing the Illustris Project simulating the coevolution of dark and visible matter in the Universe. Monthly Notices of the Royal Astronomical Society. 444 2 1518 1547 [paper]
- [4] Sugiyama, M. et al. 2007. Direct Importance Estimation with Model Selection and Its Application to Covariate Shift Adaptation. In Proceedings of the 20 th International Conference on Neural Information Processing Systems, NIPS’ 07. 1433 1440 Red Hook, NY, USA Curran Associates Inc. [paper]