Update workflow readability and data storage paths (#8)
* Change data path workflow to match vcstools (#3)

* changed base data directories to be USER based, not in shared /astro/mwavcs/vcs

* replaced hard-coded paths with those from config, replaced rm with munlink command

* removed Galaxy config

* removed duplicate parameter assignment, config overrides anyway

* updated data path in docs

* moved default mwa_search and vcstools versions to config file

* updated default software versions for OzStar too

* fix fitsdir in search flow

* fixed remaining /group reference

* remove explicit assignment of global config params

* added software version defaults for Shanghai server

* Added a link to the ReadTheDocs documentation to the README

* Added first pass workflow diagrams to docs folder

* Added links to docs README

* Attempt to add image to sphinx documentation

* Update README.md

* Update README.md

* Cleanup Nextflow scripts to implement best practices (#7)

* Moved all of the params definitions into the nextflow.config and commented them

* Rearranged config to load things in the right order

* Replaced file inputs with path

* Rewrote the beamforming so it is simpler and easier to understand

* Made the config more readable and fixed a few bugs

* Replaced basedir with vcsdir

* Cleaned up the pulsar search module

* Got the classifier working, but I did have to install LOTAAS_wrapper.py in PulsarFeatureLab

* Fixed up the mwa_search_pipeline and calculated time and memory using channels correctly

* Fixed the ipfb mode

* Updated the --help to be more accurate

* Fixed up the single pulse only search

* Started making some very simple testing documentation (may be replaced with unit tests later)

* Made the ddplan scripts also calculate an approximate work function

* Made the pipeline split the dispersion plan into groups of equal work function size

* GroupTuple the search outputs by the number of DMs so there are no stopping points

* Made same changes to single pulse

* Updated software layout to prevent installation script bugs

* Started creating better documentation of our dependencies

* Began updating the data_processing_pipeline.nf INCOMPLETE

* Made some of the config calcs a function and made the presto version a param

* Collates the prepfold jobs so they run more efficiently on HPC

* Added a few more options so you can add pulsars of different colours and shapes (#10)

* Find cand. position bug fix (#9)

* Fixed a bug in the splice formatting for single beams

* Updated find_candidate_position.nf so that it works with the new format

---------

Co-authored-by: Sam McSweeney <sammy.mcsweeney@gmail.com>
Co-authored-by: Sam McSweeney <robotopia@users.noreply.github.com>
Co-authored-by: Nick Swainston <nickaswainston@gmail.com>
4 people authored Apr 21, 2023
1 parent b3274cf commit 9c86924
Showing 61 changed files with 1,436 additions and 1,367 deletions.
52 changes: 38 additions & 14 deletions README.md
@@ -4,13 +4,28 @@

This repository was written by Nick Swainston to automate pulsar searching using the PRESTO software suite. An explanation of the search procedure can be found on the wiki of the GitHub page. The pipeline uses Nextflow to manage all the required jobs for both beamforming and searching.

## Documentation

Documentation for `mwa_search` is hosted at [this ReadTheDocs link](https://mwa-search-cira.readthedocs.io/en/latest/).
Source code for this documentation is in the [docs](docs) folder.

## Prerequisites

Running the pipelines contained in the `nextflow` directory requires [Nextflow](https://www.nextflow.io/).
The dependencies of the pipeline are containerised, so you will need container software such as Docker installed. If you would like an explanation of the dependencies, see the Dependencies section below.
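For example, with Docker installed you can pull the containerised dependencies from Docker Hub (a sketch; the image names are those listed in the Dependencies section below, and the tag you need may differ):
```
docker pull nickswainston/presto
docker pull cirapulsarsandtransients/vcstools
```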

## Installing

The repository's scripts can be installed using either:
```
pip install .
```
or
```
python setup.py install
```

On Swinburne's Ozstar supercomputer, the pipeline is already installed, so you can load the module using
```
module use /fred/oz125/software/modulefiles
module load vcstools/master
@@ -24,21 +39,30 @@ module load vcstools
module load mwa_search
```

If you want to install this pipeline on your supercomputer, you will need to edit the `nextflow.config` based on your cluster.
To do this, copy one of the `if ( hostname.startsWith("<cluster>") ) {` sections of the config
and edit it to describe your cluster's directory structure and dependency installation.
I will likely need to assist with the installation and the writing of your config file, so feel free to open a GitHub issue to ask for assistance.
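As a rough sketch of what such a section might look like (the directory paths and the exact parameter set here are illustrative assumptions; mirror an existing cluster block in `nextflow.config` for the authoritative list):
```
// Hypothetical cluster block for nextflow.config -- values are placeholders
if ( hostname.startsWith("mycluster") ) {
    // Where the VCS data and search products live on this cluster
    params.vcsdir = "/scratch/myproject/vcs"
    // Default software module versions to load on this cluster
    params.vcstools_version   = "master"
    params.mwa_search_version = "master"
    // Submit jobs through the local scheduler
    process.executor   = "slurm"
    executor.queueSize = 100
}
```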


## Dependencies
The following is all the software we use in `mwa_search_pipeline.nf`, including the version of the software we use, the location of Docker images, and any changes we have made to the software.

### PRESTO
[PRESTO](https://github.com/scottransom/presto) pulsar search software suite.
We use [this fork](https://github.com/NickSwainston/presto), which includes our custom `ACCEL_sift.py` script.
We use version [v4.0_7ec3c83](https://hub.docker.com/repository/docker/nickswainston/presto/general).


### vcstools
[vcstools](https://github.com/CIRA-Pulsars-and-Transients-Group/vcstools).

You will also need to edit _config.py_ in _vcstools_ to comply with the modules and directory structure of your supercomputer.
I recommend installing the _vcstools_ beamformer [docker image](https://hub.docker.com/repository/docker/cirapulsarsandtransients/vcstools) and then installing the Python scripts with setup.py.

## Developing
If you create a new branch of the git repo, then when you use the _build.sh_ script it will make a directory based on your branch name, which can be used to test changes to the code without disrupting currently running versions.
_mwa\_search\_pipeline.nf_ has an option --mwa_search_version which can use a different module version (which you will have to create) to test your changes.
You can then submit a pull request on GitHub.

## Common Use Cases
All Nextflow scripts have a --help option to explain all the available arguments.
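For example:
```
beamform.nf --help
```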
11 changes: 11 additions & 0 deletions docs/README.md
@@ -0,0 +1,11 @@
# Documentation

Full documentation for `mwa_search` is hosted at [this ReadTheDocs link](https://mwa-search-cira.readthedocs.io/en/latest/).

## First pass workflow diagram

Workflow diagram for the first pass survey.
This figure exists as `workflow.png` in the [Overleaf document for the SMART survey description paper](https://www.overleaf.com/5344792699hjhfpkddstxg).
It is available here in both PNG and PPTX formats ([first_pass_workflow.png](first_pass_workflow.png), [first_pass_workflow.pptx](first_pass_workflow.pptx)).

![first_pass_workflow.png](first_pass_workflow.png)
Binary file added docs/first_pass_workflow.png
Binary file added docs/first_pass_workflow.pptx
2 changes: 2 additions & 0 deletions docs/index.rst
@@ -17,5 +17,7 @@ Welcome to mwa_search's documentation!
mwa_search_scripts
plotting_scripts

test_commands

dpp_modules
mwa_search_modules
10 changes: 7 additions & 3 deletions docs/smart_processing.rst
@@ -1,10 +1,14 @@
.. _smart_processing:

SMART Pulsar Search Processing
===============================

The following guide will teach you how to process SMART data for the shallow first pass search.

Overview
--------

.. image:: first_pass_workflow.png

Choosing an observation
-----------------------
@@ -59,7 +63,7 @@ You also need to transfer the calibration solutions to OZStar. This should be do

    cd /fred/oz125/vcs
    mkdir -p <obsid>/cal/<calid>/rts
    rsync garrawarla:/astro/mwavcs/${USER}/<obsid>/cal/<calid>/rts/*{dat,txt} <obsid>/cal/<calid>/rts

This will download all the calibration solutions and flagged tiles/channels files we need. Once both downloads are complete, update the google sheet so that this observation is marked as "processing" and continue to the next step.

@@ -77,4 +81,4 @@ At the same time, you should make another screen and run the following command::
    cd /fred/oz125/pulsar_search
    rsync_rm_loop.sh <obsid> <calid> <obsname> <start time> <end time>

This used to transfer the candidates to Prometheus, but since we ran out of room, it deletes all the temporary files after each mwa_search_pipeline.nf batch is done to ensure we don't go over our storage limit.
35 changes: 35 additions & 0 deletions docs/test_commands.rst
@@ -0,0 +1,35 @@
.. _test_commands:

Test Commands
=============

A bunch of commands to run for testing, using the first 600 seconds of observation 1301674968


beamform.nf
-----------

normal::

    beamform.nf --obsid 1301674968 --calid 1301739904 --begin 1301674969 --end 1301675568 --pointings 13:11:52.64_-12:28:01.63,14:18:50.28_-39:21:18.51 -w test_work --out_dir test_cands --vcstools_version devel --publish_fits

summed::

    beamform.nf --obsid 1301674968 --calid 1301739904 --begin 1301674969 --end 1301675568 --pointings 13:11:52.64_-12:28:01.63,14:18:50.28_-39:21:18.51 -w test_work --out_dir test_cands --vcstools_version devel --publish_fits --summed

ipfb::

    beamform.nf --obsid 1301674968 --calid 1301739904 --begin 1301674969 --end 1301675568 --pointings 13:11:52.64_-12:28:01.63,14:18:50.28_-39:21:18.51 -w test_work --out_dir test_cands --vcstools_version devel --publish_fits --ipfb


pulsar_search.nf
----------------
Run with the outputs of beamform.nf

simple periodic::

    pulsar_search.nf --obsid 1301674968 --calid 1301739904 --fits_file /fred/oz125/vcs/1301674968/pointings/13:11:52.64_-12:28:01.63/*fits -w test_work --out_dir test_cands --vcstools_version devel --dm_min 36 --dm_max 37

Just single pulse search::

    pulsar_search.nf --obsid 1301674968 --calid 1301739904 --fits_file /fred/oz125/vcs/1301674968/pointings/13:11:52.64_-12:28:01.63/*fits -w test_work --out_dir test_cands --vcstools_version devel --dm_min 36 --dm_max 37 --sp
21 files renamed without changes.
@@ -70,8 +70,17 @@ def calc_nsub(centrefreq, dm):
    return int(nsub)


def dd_plan(
    centrefreq,
    bandwidth,
    nfreqchan,
    timeres,
    lowDM,
    highDM,
    min_DM_step=0.02,
    max_DM_step=500.0,
    max_dms_per_job=5000,
):
"""
Work out the dedisperion plan
Expand Down Expand Up @@ -146,12 +155,13 @@ def dd_plan(centrefreq, bandwidth, nfreqchan, timeres, lowDM, highDM,
        # range from last to new
        D_DM = round(D_DM, 2)
        nDM_step = int((D_DM - previous_DM) / DM_step)
        total_work_factor = nDM_step / downsample
        if D_DM > lowDM:
            nsub = calc_nsub(centrefreq, D_DM)
            if downsample > 16:
                DD_plan_array.append([ previous_DM, D_DM, DM_step, nDM_step, timeres, 16, nsub, total_work_factor ])
            else:
                DD_plan_array.append([ previous_DM, D_DM, DM_step, nDM_step, timeres, downsample, nsub, total_work_factor ])

        previous_DM = D_DM
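For context, a hedged sketch of how this function might be called (the numbers are hypothetical MWA-like values and the units are assumptions; each returned row carries the eight fields appended above):
```
# Hypothetical usage sketch -- centre frequency and bandwidth in MHz,
# time resolution in seconds, and the DM range are all made-up values.
plan = dd_plan(154.24, 30.72, 3072, 0.0001, 1.0, 250.0)
for dm_min, dm_max, dm_step, ndm, timeres, downsamp, nsub, work in plan:
    print(f"DM {dm_min}-{dm_max}: step {dm_step}, {ndm} steps, work factor {work:.1f}")
```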

@@ -163,14 +173,29 @@
    new_DD_plan_array = []
    for dd_line in DD_plan_array:
        new_dd_lines = []
        dm_min, dm_max, dm_step, ndm, timeres, downsamp, nsub, total_work_factor = dd_line
        while ndm > max_dms_per_job:
            # dm_min, dm_max, dm_step, ndm, timeres, downsamp, nsub, total_work_factor
            new_dd_lines.append([
                dm_min,
                dm_min + dm_step * max_dms_per_job,
                dm_step,
                max_dms_per_job,
                timeres,
                downsamp,
                nsub,
                total_work_factor
            ])
            dd_line = [
                dm_min + dm_step * max_dms_per_job,
                dm_max,
                dm_step,
                ndm - max_dms_per_job,
                timeres,
                downsamp,
                nsub,
                total_work_factor
            ]
            # Re-unpack so the loop condition and the next iteration see the updated values
            dm_min, dm_max, dm_step, ndm, timeres, downsamp, nsub, total_work_factor = dd_line
        new_dd_lines.append(dd_line)
        for n_line in new_dd_lines:
            new_DD_plan_array.append(n_line)
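To make the splitting concrete, here is a minimal standalone sketch of the same chunking logic with hypothetical numbers (12000 trial DMs, at most 5000 per job):
```
# Standalone sketch of the job-splitting loop above (hypothetical numbers).
dm_min, dm_max, dm_step = 100.0, 220.0, 0.01  # 12000 trial DMs
ndm = int(round((dm_max - dm_min) / dm_step))
max_dms_per_job = 5000

chunks = []
while ndm > max_dms_per_job:
    chunks.append((dm_min, dm_min + dm_step * max_dms_per_job, max_dms_per_job))
    dm_min += dm_step * max_dms_per_job
    ndm -= max_dms_per_job
chunks.append((dm_min, dm_max, ndm))

print(chunks)
# [(100.0, 150.0, 5000), (150.0, 200.0, 5000), (200.0, 220.0, 2000)]
```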
2 files renamed without changes.
83 changes: 28 additions & 55 deletions nextflow/beamform.nf
@@ -1,61 +1,38 @@
#!/usr/bin/env nextflow

nextflow.enable.dsl=2

params.help = false
if ( params.help ) {
help = """beamform.nf: A pipeline that will beamform and splice on all input pointings.
|Required arguments:
| --obsid Observation ID you want to process [no default]
| --calid Observation ID of calibrator you want to process [no default]
|
|Pointing arguments (one is required):
| --pointings A comma-separated list of pointings with the RA and Dec separated
| by _ in the format HH:MM:SS_+DD:MM:SS, e.g.
| "19:23:48.53_-20:31:52.95,19:23:40.00_-20:31:50.00" [default: None]
| --pointing_file
| A file containing pointings with the RA and Dec separated by _
| in the format HH:MM:SS_+DD:MM:SS on each line, e.g.
| "19:23:48.53_-20:31:52.95\\n19:23:40.00_-20:31:50.00" [default: None]
| --begin First GPS time to process [no default]
| --end Last GPS time to process [no default]
| --all Use entire observation span. Use instead of -b & -e. [default: false]
| --publish_fits
| Publish to the fits directory (/group on Galaxy). Include this
| option.
|
|Beamforming types arguments (optional):
| --summed Sum the Stokes parameters so the output is Stokes I only,
| reducing the data size by a factor of 4 [default: ${params.summed}]
| --incoh Also produce an incoherent beam [default: ${params.incoh}]
| --ipfb Also produce a high time resolution Inverse Polyphase Filter Bank
| (vdif) beam [default: ${params.ipfb}]
| --offringa Use offringa calibration solution instead of RTS [default: ${params.offringa}]
|
|Optional arguments:
| --publish_fits_scratch
| Publish to the scratch fits directory (/astro on Galaxy). Use this
| instead of --publish_fits
| --vcstools_version
| The vcstools module version to use [default: ${params.vcstools_version}]
| --mwa_search_version
| The mwa_search module version to use [default: master]
| --no_combined_check
| Don't check if all the combined files are available [default: false]
| -w The Nextflow work directory. Delete the directory once the process
@@ -68,16 +45,10 @@ if ( params.pointing_file ) {
    pointings = Channel
        .fromPath(params.pointing_file)
        .splitCsv()
}
else if ( params.pointings ) {
    pointings = Channel
        .from(params.pointings.split(","))
}
else {
println "No pointings given. Either use --pointing_file or --pointings. Exiting"
@@ -89,15 +60,17 @@ include { pre_beamform; beamform; beamform_ipfb } from './beamform_module'
workflow {
    pre_beamform()
    if ( params.ipfb ) {
        beamform_ipfb(
            pre_beamform.out.utc_beg_end_dur,
            pre_beamform.out.channels,
            pointings
        )
    }
    else {
        beamform(
            pre_beamform.out.utc_beg_end_dur,
            pre_beamform.out.channels,
            pointings
        )
    }
}