Update workflow readability and data storage paths (#8)
* Change data path workflow to match vcstools (#3)

* changed base data directories to be USER based, not in shared /astro/mwavcs/vcs

* replaced hard-coded paths with those from config, replaced rm with munlink command

* removed Galaxy config

* removed duplicate parameter assignment, config overrides anyway

* updated data path in docs

* moved default mwa_search and vcstools versions to config file

* updated default software versions for OzStar too

* fix fitsdir in search flow

* fixed remaining /group reference

* remove explicit assignment of global config params

* added software version defaults for Shanghai server

* Added a link to the ReadTheDocs documentation to the README

* Added first pass workflow diagrams to docs folder

* Added links to docs README

* Attempt to add image to sphinx documentation

* Update README.md

* Update README.md

* Cleanup Nextflow scripts to implement best practices (#7)

* Moved all of the params definitions into the nextflow.config and commented them

* Rearranged config to load things in the right order

* Replaced file inputs with path

* Rewrote the beamforming so it is simpler and easier to understand

* Made the config more readable and fixed a few bugs

* Replaced basedir with vcsdir

* Cleaned up the pulsar search module

* Got the classifier working, but I did have to install LOTAAS_wrapper.py in PulsarFeatureLab

* Fixed up the mwa_search_pipeline and calculated time and memory using channels correctly

* Fixed the ipfb mode

* Updated the --help to be more accurate

* Fixed up the single pulse only search

* Started making some very simple testing documentation (may be replaced with unit tests later)

* Made the ddplan scripts also calculate an approximate work function

* Made the pipeline split the dispersion plan into groups of equal work function size

* GroupTuple the search outputs by the number of DMs so there are no stopping points

* Made same changes to single pulse

* Updated software layout to prevent installation script bugs

* Started creating better documentation of our dependencies

* Began updating the data_processing_pipeline.nf INCOMPLETE

* Made some of the config calcs a function and made the presto version a param

* Collates the prepfold jobs so they run more efficiently on HPC

* Added a few more options so you can add pulsars of different colours and shapes (#10)

* Find cand. position bug fix (#9)

* Fixed a bug in the splice formatting for single beams

* Updated find_candidate_position.nf so that it works with the new format

---------

Co-authored-by: Sam McSweeney <sammy.mcsweeney@gmail.com>
Co-authored-by: Sam McSweeney <robotopia@users.noreply.github.com>
Co-authored-by: Nick Swainston <nickaswainston@gmail.com>
4 people authored Apr 21, 2023
1 parent b3274cf commit 9c86924
Showing 61 changed files with 1,436 additions and 1,367 deletions.
52 changes: 38 additions & 14 deletions README.md
@@ -4,13 +4,28 @@

This repository was written by Nick Swainston to automate pulsar searching using the PRESTO software suite. An explanation of the search procedure can be found on the wiki of the GitHub page. The pipeline uses Nextflow to manage all the required jobs for both beamforming and searching.

## Documentation

Documentation for `mwa_search` is hosted at [this ReadTheDocs link](https://mwa-search-cira.readthedocs.io/en/latest/).
Source code for this documentation is in the [docs](docs) folder.

## Prerequisites

Running the pipelines contained in the `nextflow` directory requires [Nextflow](https://www.nextflow.io/).
The dependencies of the pipeline are containerised, so you will need container software such as Docker installed. If you would like an explanation of the dependencies, see the Dependencies section below.
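For example, with Docker installed you can pull the containerised dependencies from Docker Hub (a sketch; the image names are those listed in the Dependencies section below, and the tag you need may differ):
```
docker pull nickswainston/presto
docker pull cirapulsarsandtransients/vcstools
```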

## Installing

The repository's scripts can be installed using either:
```
pip install .
```
or
```
python setup.py install
```

On Swinburne's Ozstar supercomputer, the pipeline is already installed, so you can load the module using
```
module use /fred/oz125/software/modulefiles
module load vcstools/master
@@ -24,21 +39,30 @@ module load vcstools
module load mwa_search
```

If you want to install this pipeline on your supercomputer, you will need to edit the `nextflow.config` based on your cluster.
To do this, copy one of the `if ( hostname.startsWith("<cluster>") ) {` sections of the config
and edit it to describe your cluster's directory structure and dependency installation.
I will likely need to assist with the installation and the writing of your config file, so feel free to open a GitHub issue to ask for assistance.
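As a rough sketch of what such a section might look like (the directory paths and the exact parameter set here are illustrative assumptions; mirror an existing cluster block in `nextflow.config` for the authoritative list):
```
// Hypothetical cluster block for nextflow.config -- values are placeholders
if ( hostname.startsWith("mycluster") ) {
    // Where the VCS data and search products live on this cluster
    params.vcsdir = "/scratch/myproject/vcs"
    // Default software module versions to load on this cluster
    params.vcstools_version   = "master"
    params.mwa_search_version = "master"
    // Submit jobs through the local scheduler
    process.executor   = "slurm"
    executor.queueSize = 100
}
```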


## Dependencies
The following is all the software we use in `mwa_search_pipeline.nf`, including the version of the software we use, the location of Docker images, and any changes we have made to the software.

### PRESTO
[PRESTO](https://github.com/scottransom/presto) pulsar search software suite.
We use [this fork](https://github.com/NickSwainston/presto), which includes our custom `ACCEL_sift.py` script.
We use version [v4.0_7ec3c83](https://hub.docker.com/repository/docker/nickswainston/presto/general).


### vcstools
[vcstools](https://github.com/CIRA-Pulsars-and-Transients-Group/vcstools).

You will also need to edit _config.py_ in _vcstools_ to comply with the modules and directory structure of your supercomputer.
I recommend installing the _vcstools_ beamformer [docker image](https://hub.docker.com/repository/docker/cirapulsarsandtransients/vcstools) and then installing the Python scripts with setup.py.

## Developing
If you create a new branch of the git repo, then when you use the _build.sh_ script it will make a directory based on your branch name, which can be used to test changes to the code without disrupting currently running versions.
_mwa\_search\_pipeline.nf_ has an option --mwa_search_version which can use a different module version (which you will have to create) to test your changes.
You can then submit a pull request on GitHub.

## Common Use Cases
All Nextflow scripts have a --help option to explain all the available arguments.
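For example:
```
beamform.nf --help
```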
11 changes: 11 additions & 0 deletions docs/README.md
@@ -0,0 +1,11 @@
# Documentation

Full documentation for `mwa_search` is hosted at [this ReadTheDocs link](https://mwa-search-cira.readthedocs.io/en/latest/).

## First pass workflow diagram

Workflow diagram for the first pass survey.
This figure exists as `workflow.png` in the [Overleaf document for the SMART survey description paper](https://www.overleaf.com/5344792699hjhfpkddstxg).
It is available here in both PNG and PPTX formats ([first_pass_workflow.png](first_pass_workflow.png), [first_pass_workflow.pptx](first_pass_workflow.pptx)).

![first_pass_workflow.png](first_pass_workflow.png)
Binary file added docs/first_pass_workflow.png
Binary file added docs/first_pass_workflow.pptx
2 changes: 2 additions & 0 deletions docs/index.rst
@@ -17,5 +17,7 @@ Welcome to mwa_search's documentation!
mwa_search_scripts
plotting_scripts

test_commands

dpp_modules
mwa_search_modules
10 changes: 7 additions & 3 deletions docs/smart_processing.rst
@@ -1,10 +1,14 @@
.. _smart_processing:

SMART Pulsar Search Processing
===============================

The following guide will teach you how to process SMART data for the shallow first pass search.

Overview
--------

.. image:: first_pass_workflow.png

Choosing an observation
-----------------------
@@ -59,7 +63,7 @@ You also need to transfer the calibration solutions to OZStar. This should be do

    cd /fred/oz125/vcs
    mkdir -p <obsid>/cal/<calid>/rts
    rsync garrawarla:/astro/mwavcs/${USER}/<obsid>/cal/<calid>/rts/*{dat,txt} <obsid>/cal/<calid>/rts

This will download all the calibration solutions and flagged tiles/channels files we need. Once both downloads are complete, update the google sheet so that this observation is marked as "processing" and continue to the next step.

@@ -77,4 +81,4 @@ At the same time, you should make another screen and run the following command::
    cd /fred/oz125/pulsar_search
    rsync_rm_loop.sh <obsid> <calid> <obsname> <start time> <end time>

This used to transfer the candidates to Prometheus, but since we ran out of room, it deletes all the temporary files after each mwa_search_pipeline.nf batch is done to ensure we don't go over our storage limit.
35 changes: 35 additions & 0 deletions docs/test_commands.rst
@@ -0,0 +1,35 @@
.. _test_commands:

Test Commands
=============

A bunch of commands to run for testing, using the first 600 seconds of observation 1301674968


beamform.nf
-----------

normal::

    beamform.nf --obsid 1301674968 --calid 1301739904 --begin 1301674969 --end 1301675568 --pointings 13:11:52.64_-12:28:01.63,14:18:50.28_-39:21:18.51 -w test_work --out_dir test_cands --vcstools_version devel --publish_fits

summed::

    beamform.nf --obsid 1301674968 --calid 1301739904 --begin 1301674969 --end 1301675568 --pointings 13:11:52.64_-12:28:01.63,14:18:50.28_-39:21:18.51 -w test_work --out_dir test_cands --vcstools_version devel --publish_fits --summed

ipfb::

    beamform.nf --obsid 1301674968 --calid 1301739904 --begin 1301674969 --end 1301675568 --pointings 13:11:52.64_-12:28:01.63,14:18:50.28_-39:21:18.51 -w test_work --out_dir test_cands --vcstools_version devel --publish_fits --ipfb


pulsar_search.nf
----------------
Run with the outputs of beamform.nf

simple periodic::

    pulsar_search.nf --obsid 1301674968 --calid 1301739904 --fits_file /fred/oz125/vcs/1301674968/pointings/13:11:52.64_-12:28:01.63/*fits -w test_work --out_dir test_cands --vcstools_version devel --dm_min 36 --dm_max 37

Just single pulse search::

    pulsar_search.nf --obsid 1301674968 --calid 1301739904 --fits_file /fred/oz125/vcs/1301674968/pointings/13:11:52.64_-12:28:01.63/*fits -w test_work --out_dir test_cands --vcstools_version devel --dm_min 36 --dm_max 37 --sp
21 files renamed without changes.
@@ -70,8 +70,17 @@ def calc_nsub(centrefreq, dm):
    return int(nsub)


def dd_plan(
    centrefreq,
    bandwidth,
    nfreqchan,
    timeres,
    lowDM,
    highDM,
    min_DM_step=0.02,
    max_DM_step=500.0,
    max_dms_per_job=5000,
):
"""
Work out the dedisperion plan
Expand Down Expand Up @@ -146,12 +155,13 @@ def dd_plan(centrefreq, bandwidth, nfreqchan, timeres, lowDM, highDM,
        # range from last to new
        D_DM = round(D_DM, 2)
        nDM_step = int((D_DM - previous_DM) / DM_step)
        total_work_factor = nDM_step / downsample
        if D_DM > lowDM:
            nsub = calc_nsub(centrefreq, D_DM)
            if downsample > 16:
                DD_plan_array.append([ previous_DM, D_DM, DM_step, nDM_step, timeres, 16, nsub, total_work_factor ])
            else:
                DD_plan_array.append([ previous_DM, D_DM, DM_step, nDM_step, timeres, downsample, nsub, total_work_factor ])

        previous_DM = D_DM
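For context, a hedged sketch of how this function might be called (the numbers are hypothetical MWA-like values and the units are assumptions; each returned row carries the eight fields appended above):
```
# Hypothetical usage sketch -- centre frequency and bandwidth in MHz,
# time resolution in seconds, and the DM range are all made-up values.
plan = dd_plan(154.24, 30.72, 3072, 0.0001, 1.0, 250.0)
for dm_min, dm_max, dm_step, ndm, timeres, downsamp, nsub, work in plan:
    print(f"DM {dm_min}-{dm_max}: step {dm_step}, {ndm} steps, work factor {work:.1f}")
```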

@@ -163,14 +173,29 @@
    new_DD_plan_array = []
    for dd_line in DD_plan_array:
        new_dd_lines = []
        dm_min, dm_max, dm_step, ndm, timeres, downsamp, nsub, total_work_factor = dd_line
        while ndm > max_dms_per_job:
            # dm_min, dm_max, dm_step, ndm, timeres, downsamp, nsub, total_work_factor
            new_dd_lines.append([
                dm_min,
                dm_min + dm_step * max_dms_per_job,
                dm_step,
                max_dms_per_job,
                timeres,
                downsamp,
                nsub,
                total_work_factor
            ])
            dd_line = [
                dm_min + dm_step * max_dms_per_job,
                dm_max,
                dm_step,
                ndm - max_dms_per_job,
                timeres,
                downsamp,
                nsub,
                total_work_factor
            ]
            # Re-unpack so the loop condition and the next iteration see the updated values
            dm_min, dm_max, dm_step, ndm, timeres, downsamp, nsub, total_work_factor = dd_line
        new_dd_lines.append(dd_line)
        for n_line in new_dd_lines:
            new_DD_plan_array.append(n_line)
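To make the splitting concrete, here is a minimal standalone sketch of the same chunking logic with hypothetical numbers (12000 trial DMs, at most 5000 per job):
```
# Standalone sketch of the job-splitting loop above (hypothetical numbers).
dm_min, dm_max, dm_step = 100.0, 220.0, 0.01  # 12000 trial DMs
ndm = int(round((dm_max - dm_min) / dm_step))
max_dms_per_job = 5000

chunks = []
while ndm > max_dms_per_job:
    chunks.append((dm_min, dm_min + dm_step * max_dms_per_job, max_dms_per_job))
    dm_min += dm_step * max_dms_per_job
    ndm -= max_dms_per_job
chunks.append((dm_min, dm_max, ndm))

print(chunks)
# [(100.0, 150.0, 5000), (150.0, 200.0, 5000), (200.0, 220.0, 2000)]
```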
2 files renamed without changes.
83 changes: 28 additions & 55 deletions nextflow/beamform.nf
@@ -1,61 +1,38 @@
#!/usr/bin/env nextflow

nextflow.enable.dsl=2

params.help = false
if ( params.help ) {
help = """beamform.nf: A pipeline that will beamform and splice on all input pointings.
|Required arguments:
| --obsid Observation ID you want to process [no default]
| --calid Observation ID of calibrator you want to process [no default]
|
|Pointing arguments (one is required):
| --pointings A comma-separated list of pointings with the RA and Dec separated
| by _ in the format HH:MM:SS_+DD:MM:SS, e.g.
| "19:23:48.53_-20:31:52.95,19:23:40.00_-20:31:50.00" [default: None]
| --pointing_file
| A file containing pointings with the RA and Dec separated by _
| in the format HH:MM:SS_+DD:MM:SS on each line, e.g.
| "19:23:48.53_-20:31:52.95\\n19:23:40.00_-20:31:50.00" [default: None]
| --begin First GPS time to process [no default]
| --end Last GPS time to process [no default]
| --all Use entire observation span. Use instead of -b & -e. [default: false]
| --publish_fits
| Publish to the fits directory (/group on Galaxy). Include this
| option.
|
|Beamforming types arguments (optional):
| --summed Sum the Stokes parameters so the output is Stokes I only,
| reducing the data size by a factor of 4 [default: ${params.summed}]
| --incoh Also produce an incoherent beam [default: ${params.incoh}]
| --ipfb Also produce a high time resolution Inverse Polyphase Filter Bank
| (vdif) beam [default: ${params.ipfb}]
| --offringa Use offringa calibration solution instead of RTS [default: ${params.offringa}]
|
|Optional arguments:
| --publish_fits_scratch
| Publish to the scratch fits directory (/astro on Galaxy). Use this
| instead of --publish_fits
| --vcstools_version
| The vcstools module version to use [default: ${params.vcstools_version}]
| --mwa_search_version
| The mwa_search module version to use [default: master]
| --no_combined_check
| Don't check if all the combined files are available [default: false]
| -w The Nextflow work directory. Delete the directory once the process
@@ -68,16 +45,10 @@ if ( params.pointing_file ) {
    pointings = Channel
        .fromPath(params.pointing_file)
        .splitCsv()
}
else if ( params.pointings ) {
    pointings = Channel
        .from(params.pointings.split(","))
}
else {
println "No pointings given. Either use --pointing_file or --pointings. Exiting"
@@ -89,15 +60,17 @@ include { pre_beamform; beamform; beamform_ipfb } from './beamform_module'
workflow {
    pre_beamform()
    if ( params.ipfb ) {
        beamform_ipfb(
            pre_beamform.out.utc_beg_end_dur,
            pre_beamform.out.channels,
            pointings
        )
    }
    else {
        beamform(
            pre_beamform.out.utc_beg_end_dur,
            pre_beamform.out.channels,
            pointings
        )
    }
}