| Name | Fully Bayesian Forecast Example |
| --- | --- |
| Author | Thomas Gessey-Jones |
| Version | 1.0.1 |
| Homepage | https://github.com/ThomasGesseyJones/FullyBayesianForecastsExample |
| Letter | https://ui.adsabs.harvard.edu/abs/2024PhRvD.109l3541G/abstract |
Example of a fully Bayesian forecast performed using an Evidence Network. This code also replicates the analysis of Gessey-Jones et al. (2024). The repository thus serves a dual purpose: it provides an example code base that others can modify to perform their own fully Bayesian forecasts, and it provides a reproducible analysis pipeline for the letter.
The overall goal of the code is to produce a fully Bayesian forecast of the chance of a REACH-like experiment making a significant detection of the 21-cm global signal from within foregrounds and noise. It also produces figures showing how this conclusion changes with different astrophysical parameter values and validates the forecast through blind coverage tests and comparison to PolyChord.
The repository is intended to be installed locally by cloning it directly from GitHub. To do this, run the following command in the terminal:
git clone git@github.com:ThomasGesseyJones/FullyBayesianForecastsExample.git
This will create a local copy of the repository. The pipeline can then be run from the terminal (see below).
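Before running the pipeline you will likely also want to install the packages it depends on (listed at the end of this README), for example:

cd FullyBayesianForecastsExample
pip install -r requirements.txt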
The code is split into two main parts. The first is the modules, which provide the general functionality of evidence networks, data simulators, and prior samplers. The second is the scripts, which run the fully Bayesian forecast.
There are three modules included in the repository:
- evidence_networks: This module contains the code for the evidence network class. This class is used to build the evidence network used in the forecasts. The module also provides an implementation of the l-POP exponential loss function. See the class docstring for more details of its capabilities and usage.
- priors: This module contains the code to generate functions that sample standard prior distributions. These include uniform, log-uniform, and Gaussian priors.
- simulators: This module defines simulators. In our code, these are functions that take a requested number of data simulations to run and return that many mock data simulations alongside the values of any parameters used in them. Submodules of this module define functions to generate specific simulators for noise, foregrounds, and the 21-cm signal (an illustrative sketch of this pattern is given after this list).
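The priors and simulators follow a simple functional pattern: a factory function returns a sampler or simulator closure, and a simulator maps a requested number of simulations to mock data plus the parameter values that generated them. Below is a minimal sketch of that pattern; the function names, signatures, toy signal model, and channel count are illustrative assumptions rather than the repository's actual API.

```python
import numpy as np

rng = np.random.default_rng(0)


def uniform_prior_sampler(low, high):
    """Return a function that draws n samples from a Uniform(low, high) prior."""
    return lambda n: rng.uniform(low, high, size=n)


def make_simulator(amplitude_prior, noise_sigma, n_channels):
    """Return a simulator: given a number of simulations, it draws an amplitude
    for each from the prior and returns mock spectra plus those parameter values."""
    def simulator(n_sims):
        amplitudes = amplitude_prior(n_sims)
        signal = amplitudes[:, None] * np.ones(n_channels)  # toy flat 'signal'
        noise = rng.normal(0.0, noise_sigma, size=(n_sims, n_channels))
        return signal + noise, amplitudes
    return simulator


# Example: 100 mock data sets with 15 mK noise over 51 (arbitrary) channels
simulator = make_simulator(uniform_prior_sampler(-0.2, 0.0),
                           noise_sigma=0.015, n_channels=51)
mock_data, mock_params = simulator(100)
```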
These three modules are used in the three analysis scripts:
- verification_with_polychord.py: This script generates a range of mock data sets from both the no-signal model and the with-signal model, then performs a Bayesian analysis on each of them, evaluating the Bayes ratio between the two models of the data using PolyChord. These results are stored in the verification_data directory for later comparison with the results from the evidence network to verify its accuracy. It should be run first, ideally with many instances running in parallel, as it is very computationally expensive but splits naturally into one task per data set.
- train_evidence_network.py: This script builds the evidence network object and the data simulator functions, then trains the evidence network. Once trained, it stores the evidence network in the models directory, runs a blind coverage test on the network, and validates its performance against the PolyChord Bayes ratio evaluations from the previous script. It should be run second.
- visualize_forecasts.py: This script loads the evidence network from the models directory and uses it to forecast the chance of a REACH-like experiment detecting the 21-cm global signal by applying it to many data sets generated from the noisy-signal model. It then plots this result for fixed astrophysical parameters as in Figure 1 of the letter. This is done for detection significance thresholds of 2, 3 and 5 sigma. Selected numerical values are also output to a .txt file. It should be run last.
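Conceptually, the forecast in visualize_forecasts.py reduces to evaluating log Bayes factors on many mock noisy-signal data sets and reporting the fraction that exceed a detection threshold. A minimal sketch of that post-processing step is shown below; the `noisy_signal_simulator` and `network.evaluate_log_bayes_ratio` names in the usage comment are hypothetical placeholders, and the mapping from a sigma significance level to a log Bayes factor threshold is left as an input (it follows the convention described in the letter and is not reproduced here).

```python
import numpy as np


def detection_probability(log_bayes_ratios, log_k_threshold):
    """Fraction of mock data sets whose log Bayes factor exceeds the detection
    threshold, i.e. the forecast probability of a significant detection."""
    return float(np.mean(np.asarray(log_bayes_ratios) > log_k_threshold))


# Illustrative usage with hypothetical objects:
# mock_data, _ = noisy_signal_simulator(10_000)           # many noisy-signal mocks
# log_k = network.evaluate_log_bayes_ratio(mock_data)     # one log K per mock data set
# print(detection_probability(log_k, log_k_threshold=3.0))
```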
All three scripts have docstrings describing their role in more detail, as well as giving advice on how to run them most efficiently. The scripts can be run from the terminal using the following commands:
python verification_with_polychord.py 0
python train_evidence_network.py
python visualize_forecasts.py
to run with the default noise level of 15 mK and replicate the analysis from Gessey-Jones et al. (2024). Alternatively, you can pass the scripts a command-line argument to specify the experiment's noise level in K. For example, to run with a noise level of 100 mK you would run the following commands:
python verification_with_polychord.py 0 0.1
python train_evidence_network.py 0.1
python visualize_forecasts.py 0.1
Two other files of interest are:
- fbf_utilities.py: which defines the IO functions needed by the three scripts, utility functions to assemble the data simulators for the noise-only and noisy-signal models, and standard whitening transforms (a minimal sketch of such a transform is given after this list).
- configuration.yaml: which defines several parameters used in the code, including the experimental frequency resolution, the priors on the astrophysical and foreground parameters, and the astrophysical parameters that are plotted in the forecast figures. If you change the priors or the resolution, the entire pipeline needs to be rerun to get accurate results.
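For context on the whitening transforms mentioned above: a standard whitening step rescales each data channel to zero mean and unit variance using statistics estimated from training simulations, so the network sees inputs of comparable magnitude. The sketch below is illustrative only and not the exact implementation in fbf_utilities.py.

```python
import numpy as np


def fit_whitening(training_data):
    """Estimate per-channel mean and standard deviation from training simulations."""
    mean = training_data.mean(axis=0)
    std = training_data.std(axis=0)
    std[std == 0] = 1.0  # guard against constant channels
    return mean, std


def whiten(data, mean, std):
    """Apply the whitening transform: zero mean, unit variance per channel."""
    return (data - mean) / std
```

In practice a tool such as scikit-learn's StandardScaler (scikit-learn is among the listed dependencies) performs the same job.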
The various figures produced in the analysis are stored in the figures_and_results directory, alongside the timing_data used to assess the performance of the methodology and some summary statistics of the evidence network's performance. The figures and data generated in the analysis for Gessey-Jones et al. (2024) are provided in this repository for reference, alongside the figures generated for an earlier version of the letter that did not model foregrounds.
The software is free to use under the MIT open-source license. If you use the software for academic purposes, we request that you cite the letter:
Gessey-Jones, T. and Handley, W. J., “Fully Bayesian forecasts with evidence networks”, Physical Review D, 109, 123541 (June 2024).
If you are using BibTeX, you can use the following to cite the letter:
@ARTICLE{2024PhRvD.109l3541G,
author = {{Gessey-Jones}, T. and {Handley}, W.~J.},
title = "{Fully Bayesian forecasts with evidence networks}",
journal = {\prd},
year = 2024,
month = jun,
volume = {109},
number = {12},
eid = {123541},
pages = {123541},
doi = {10.1103/PhysRevD.109.123541},
adsurl = {https://ui.adsabs.harvard.edu/abs/2024PhRvD.109l3541G},
adsnote = {Provided by the SAO/NASA Astrophysics Data System}}
Note that some of the packages used in this code (see below) have their own licenses that require citation when used for academic purposes (e.g., globalemu and pypolychord). Please check the licenses of these packages for more details.
To run the code you will need the following additional packages:
- globalemu
- tensorflow
- numpy
- keras
- matplotlib
- nvidia-cudnn-cu11
- pandas
- PyYAML
- pypolychord
- scipy
- mpi4py
- scikit-learn
- anesthetic
The code was developed using Python 3.8. It has not been tested on other versions of Python. Exact versions of the packages used in our analysis can be found in the requirements.txt file for reproducibility.
Additional packages that were used for linting, versioning, and pre-commit hooks are also listed in the requirements.txt file.
If you have any issues or questions about the code, please raise an issue on the GitHub page.
Alternatively, you can contact the author directly at tg400@cam.ac.uk.