Staging (phenoimager2mc), max-projections (#557)

* added staging functionality - still have TODOs
* phenoimager2mc works
* made illumination togglable
* Added nuclear and membrane channel specification and max projection
* fixed membrane channels
* added Documentation for staging
* added documentation
* illumination true by default
* added validation, excluded extra parameter
labsyspharm · Aug 28, 2024 · f25fcbf
1 parent 5883dda
commit f25fcbf
Showing 14 changed files with 408 additions and 56 deletions.
diff --git a/CHANGES.md b/CHANGES.md
@@ -1,3 +1,35 @@
+
+### 2024-08-26
+
+* Add staging step before illumination - `start-at: staging`
+* `phenoimager2mc` added as a staging module - `staging-method: phenoimager2mc`
+* illumination is run by default if `start-at: staging` is given and can be toggled off with `illumination: false`
+* Add a max-projection option to recyze for cases where multiple nuclear and/or membrane channels are provided; the output has the nuclear projection in channel 0 and the membrane projection in channel 1
+* `segmentation-nuclear-channel` and `segmentation-membrane-channel` should only be used if `segmentation-max-projection` is enabled; otherwise the output of recyze contains only the channels provided in `segmentation-channel`
+  * `segmentation-max-projection: true`
+  * `segmentation-channel: 1 3 8 9`
+  * `segmentation-nuclear-channel: 1 8`
+  * `segmentation-membrane-channel: 3 9`
+
+Example parameter file for running staging, registration, segmentation with cellpose (using the above options), and quantification:
+```
+workflow:
+  start-at: staging
+  stop-at: quantification
+  illumination: false
+  staging-method: phenoimager2mc
+  segmentation-recyze: true
+  segmentation-max-projection: true
+  segmentation-channel: 1 5 7 8 11
+  segmentation-nuclear-channel: 5 11
+  segmentation-membrane-channel: 1 7 8
+  segmentation: cellpose
+options:
+  phenoimager2mc: -m 6 --normalization max
+  ashlar: --align-channel 4 --flip-y
+  cellpose: --pretrained_model cyto --chan 1 --chan2 0 --no_npy
+```
+
 ### 2024-03-10
 
 * Allow for dynamic sample name specification in the fileseries/filepattern expressions.

diff --git a/config/defaults.yml b/config/defaults.yml
@@ -4,12 +4,15 @@ workflow:
   qc-files: inherit
   tma: false
   viz: false
+  illumination: true
   background: false
   background-method: backsub
+  staging-method: phenoimager2mc
   multi-formats: '{.xdce,.nd,.scan,.htd}'
   single-formats: '{.ome.tiff,.ome.tif,.rcpnl,.btf,.nd2,.tif,.czi}'
   segmentation: unmicst
   segmentation-recyze: false
+  segmentation-max-projection: false
   downstream: scimap
 options:
   ashlar: -m 30
@@ -33,6 +36,15 @@ modules:
     version: 2.2.9
     cmd: python /app/UNetCoreograph.py --outputPath .
     input: --imagePath
+  staging:
+    -
+      name: phenoimager2mc
+      container: ghcr.io/schapirolabor/phenoimager2mc
+      version: v0.1.1
+      cmd: |-
+        python /phenoimager2mc/scripts/phenoimager2mc.py \
+          --indir ${indir} \
+          -o ${cycle}.ome.tif
   background:
     -
       name: backsub

diff --git a/config/schema.yml b/config/schema.yml
@@ -7,13 +7,18 @@ workflow:
   - viz
   - multi-formats
   - single-formats
+  - illumination
   - segmentation
   - segmentation-channel
   - segmentation-recyze
+  - segmentation-nuclear-channel
+  - segmentation-membrane-channel
+  - segmentation-max-projection
   - downstream
   - ilastik-model
   - mesmer-model
   - background
+  - staging-method
   - cellpose-model
   - background-method
   - flowsom-model

diff --git a/docs/parameters/core.md b/docs/parameters/core.md
@@ -111,6 +111,7 @@ By default, MCMICRO skips this step as it requires manual inspection of the outp
 ``` yaml
 workflow:
   start-at: illumination
+  illumination: true
 ```
 
 * Running outside of MCMICRO: [Instructions](https://github.com/labsyspharm/basic-illumination#running-as-a-docker-container){:target="_blank"}.

diff --git a/docs/parameters/other.md b/docs/parameters/other.md
@@ -7,6 +7,8 @@ nav_exclude: true
 ---
 
 # Other Modules
+Staging
+1. [phenoimager2mc](./other.html#phenoimager2mc)
 
 Segmentation
 1. [Ilastik](./other.html#ilastik)
@@ -27,6 +29,39 @@ Background subtraction
 
 ---
 
+## Staging - Phenoimager2mc
+{: .fw-500}
+
+### Description
+The [phenoimager2mc](https://github.com/schapirolabor/phenoimager2mc){:target="_blank"} module adds a `staging` step to the pipeline. It takes individual unmixed component data tiles produced by the [InForm software by Akoya](https://www.akoyabio.com/phenoimager/inform-tissue-finder/) and produces one `.ome.tif` file per cycle that is compatible with ASHLAR.
+
+### Usage
+By default, `staging` is not performed; it must be requested with `start-at: staging` as shown below. The `staging-method` parameter selects which staging module to use (default: `phenoimager2mc`). Note that `illumination` runs by default after `staging` and must be turned off explicitly with `illumination: false` if it is not needed. It is highly recommended that the input tiles already overlap; otherwise, gaps will be introduced.
+
+* Example `params.yml`:
+
+``` yaml
+workflow:
+  start-at: staging
+  staging: true
+  illumination: false
+  staging-method: phenoimager2mc
+options:
+  phenoimager2mc: -m 6 --normalization max
+```
+* Specify number of channels per cycle: `-m`
+* Specify normalization (float32 -> uint16) method: `--normalization`, options `max`, `99th` for maximum value normalization or 99th percentile, respectively.
+* Running outside of MCMICRO: [Instructions](https://github.com/schapirolabor/phenoimager2mc){:target="_blank"}.
+
+
+### Input
+Inputs should be saved in the `staging` directory as one subdirectory per cycle containing the `tif` or `tiff` component data files. The `marker` file should represent the expected registered image. File paths must not contain whitespace. Cycles are ordered alphabetically by subdirectory name.
+
+### Output
+A normalized `.ome.tif` file compatible with ASHLAR containing information from all tiles per cycle.
+
+[Back to top](./other.html#other-modules){: .btn .btn-purple} 
+
 ## Ilastik
 {: .fw-500}
 
@@ -158,16 +193,31 @@ Cellpose is a DL segmentation algorithm able to segment the nuclear or cytoplasm
 
 To use this segmentation method add the line `segmentation: cellpose` in the workflow section of the `params.yml` file.  Under the options section of `params.yml` specify the input arguments of the cellpose script, such as segmentation model and channel(s) on which the model will be applied.  Notice that the channel(s) argument(s), i.e. --chan and --chan2, expect a zero-based index.  
 
-For large data sets it is recommended to use the parameters `segmentation-recyze: true` along with `segmentation-channel:`.  In the example below we consider an image stack of 10 channels with the nuclear marker in channel 2 and membrane marker in channel 7.  The use of `segmentation-recyze: true` will reduce the image stack to these two channels prior to segmentation, hence reindexing the stack channels such that 2-->0 and 7-->1.
+For large data sets it is recommended to use the parameters `segmentation-recyze: true` along with `segmentation-channel:`.  In the example below we consider an image stack of 12 channels with the nuclear marker in channel 2 and membrane marker in channel 7.  The use of `segmentation-recyze: true` will reduce the image stack to these two channels prior to segmentation, hence reindexing the stack channels such that 2-->0 and 7-->1. 
+If `segmentation-max-projection: true` and multiple channels are provided for nuclear and membrane stains with `segmentation-nuclear-channel:` and `segmentation-membrane-channel:`, the returned image stack will contain the maximum projection of the respective nuclear (recyze output channel 0) and membrane channels (recyze output channel 1).
 
 
 * Example `params.yml`:
 
 ``` yaml
 workflow:
-  segmentation-channel: 2 7 
+  segmentation-channel: 2 7
+  segmentation-recyze: true
+  segmentation: cellpose
+options:
+  cellpose: --pretrained_model cyto --chan 1 --chan2 0 --no_npy
+```
+
+* Example `params.yml` using max projection: the input image is large, so `segmentation-recyze` is enabled. Segmentation should run on the maximum projection of two nuclear markers (channels 5 and 11) and three membrane markers (channels 2, 3, and 7). The output from `Recyze` is a two-channel `tif`: the nuclear max projection (channel 0) is passed to the `cyto` model with `--chan2 0`, and the membrane max projection (channel 1) is passed with `--chan 1`.
+
+``` yaml
+workflow:
+  segmentation-channel: 2 3 5 7 11
   segmentation-recyze: true
   segmentation: cellpose
+  segmentation-max-projection: true
+  segmentation-nuclear-channel: 5 11
+  segmentation-membrane-channel: 2 3 7
 options:
   cellpose: --pretrained_model cyto --chan 1 --chan2 0 --no_npy
 ```
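
For illustration only, the max-projection behaviour described in the documentation change above can be sketched in plain Python. This is not the Recyze implementation: the function names are hypothetical, images are represented as nested lists rather than arrays, and channel indices are assumed to be 1-based, as they appear to be for `segmentation-channel`.

```python
# Hypothetical sketch of a two-channel max projection (not Recyze's code).
# Each image is a list of rows; all images must share the same shape.

def max_project(images):
    """Return the pixel-wise maximum across a list of 2D images."""
    return [[max(pixels) for pixels in zip(*rows)] for rows in zip(*images)]


def project_for_segmentation(stack, nuclear, membrane):
    """Build a two-channel output: nuclear projection first, membrane second.

    stack    -- list of 2D images, one per channel
    nuclear  -- 1-based channel indices to merge into output channel 0 (assumed)
    membrane -- 1-based channel indices to merge into output channel 1 (assumed)
    """
    nuclear_proj = max_project([stack[i - 1] for i in nuclear])
    membrane_proj = max_project([stack[i - 1] for i in membrane])
    return [nuclear_proj, membrane_proj]


# Tiny 3-channel, 2x2 example
stack = [
    [[1, 5], [0, 2]],  # channel 1
    [[3, 1], [4, 4]],  # channel 2
    [[9, 0], [1, 7]],  # channel 3
]
projected = project_for_segmentation(stack, nuclear=[1, 3], membrane=[2])
```

With this layout, a downstream call such as `--chan 1 --chan2 0` would address the membrane and nuclear projections, respectively, matching the documented example.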

diff --git a/lib/mcmicro/Flow.groovy b/lib/mcmicro/Flow.groovy
@@ -26,15 +26,16 @@ static def flowSegment(wfp) {
 
     // Valid start/stop steps in the mcmicro pipeline
     List mcsteps = [
-        "raw",              // Step 0
-        "illumination",     // Step 1
-        "registration",     // Step 2
-        "background",       // Step 3
-        "dearray",          // Step 4
-        "segmentation",     // Step 5
-        "watershed",        // Step 6
-        "quantification",   // Step 7
-        "downstream"        // Step 8
+        "staging",          // Step 0
+        "raw",              // Step 1
+        "illumination",     // Step 2
+        "registration",     // Step 3
+        "background",       // Step 4
+        "dearray",          // Step 5
+        "segmentation",     // Step 6
+        "watershed",        // Step 7
+        "quantification",   // Step 8
+        "downstream"        // Step 9
         ]
 
     // Identify starting and stopping indices
@@ -46,7 +47,7 @@ static def flowSegment(wfp) {
         throw new Exception("Unknown stopping step ${wfp['stop-at']}")
 
     // Advance segmentation -> watershed to ensure no dangling probability maps
-    if( idxStop == 5 ) idxStop = 6
+    if( idxStop == 6 ) idxStop = 7
 
     return [idxStart, idxStop]
 }
@@ -62,14 +63,15 @@ static def precomputed(wfp) {
 
     // Define whether a precomputed intermediate is relevant
     [
-        raw:                idxStart <= 2,
-        illumination:       idxStart == 2, 
-        registration:       idxStart == 3 || (idxStart == 4 && !wfp.background) || (idxStart > 4 && !wfp.tma && !wfp.background), // needed for background (3), tma if no background (4), everything else if both tma and background aren't specified
+        staging:            idxStart == 0,
+        raw:                idxStart <= 3 && idxStart > 0,
+        illumination:       idxStart == 3, 
+        registration:       idxStart == 4 || (idxStart == 5 && !wfp.background) || (idxStart > 5 && !wfp.tma && !wfp.background), // needed for background (4), tma if no background (5), everything else if both tma and background aren't specified
         background:         idxStart > 3 && wfp.background, // if background specified, required
-        dearray:            idxStart > 4 && wfp.tma, // if tma specified, required
-        'probability-maps': idxStart == 6,
-        segmentation:       idxStart == 7,
-        quantification:     idxStart == 8
+        dearray:            idxStart > 5 && wfp.tma, // if tma specified, required
+        'probability-maps': idxStart == 7,
+        segmentation:       idxStart == 8,
+        quantification:     idxStart == 9
     ]
 }
 
@@ -84,22 +86,24 @@ static def doirun(step, wfp) {
     def (idxStart, idxStop) = flowSegment(wfp)
 
     switch(step) {
+        case 'staging':
+            return(idxStart == 0 && idxStop >= 0)
         case 'illumination': 
-            return(idxStart <= 1 && idxStop >= 1)
+            return(idxStart <= 2 && idxStop >= 2 && wfp.illumination)
         case 'registration':
-            return(idxStart <= 2 && idxStop >= 2)
+            return(idxStart <= 3 && idxStop >= 3)
         case 'background':
-            return(idxStart <= 3 && idxStop >= 3 && wfp.background)
+            return(idxStart <= 4 && idxStop >= 4 && wfp.background)
         case 'dearray':
-            return(idxStart <= 4 && idxStop >= 4 && wfp.tma)
+            return(idxStart <= 5 && idxStop >= 5 && wfp.tma)
         case 'segmentation':
-            return(idxStart <= 5 && idxStop >= 5)
-        case 'watershed':
             return(idxStart <= 6 && idxStop >= 6)
-        case 'quantification':
+        case 'watershed':
             return(idxStart <= 7 && idxStop >= 7)
-        case 'downstream':
+        case 'quantification':
             return(idxStart <= 8 && idxStop >= 8)
+        case 'downstream':
+            return(idxStart <= 9 && idxStop >= 9)
         case 'viz':
             return(wfp.viz)
         default:
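
As a reading aid for the renumbering above, the following Python sketch mirrors the step-index logic of `flowSegment` and `doirun` after this change. It is an approximation rather than the pipeline's code: the Python names are hypothetical, the toggle defaults are copied from `config/defaults.yml`, and raising an error for unhandled steps is an assumption, since the `default:` branch is not shown in the hunk.

```python
# Hypothetical Python mirror of the step indexing in Flow.groovy.
MCSTEPS = [
    "staging", "raw", "illumination", "registration", "background",
    "dearray", "segmentation", "watershed", "quantification", "downstream",
]

# Steps that additionally depend on a workflow toggle: step -> parameter name
TOGGLES = {"illumination": "illumination", "background": "background", "dearray": "tma"}

# Toggle defaults as listed in config/defaults.yml
DEFAULTS = {"illumination": True, "background": False, "tma": False, "viz": False}


def flow_segment(wfp):
    """Return (start, stop) indices for the requested workflow segment."""
    if wfp["start-at"] not in MCSTEPS:
        raise ValueError(f"Unknown starting step {wfp['start-at']}")
    if wfp["stop-at"] not in MCSTEPS:
        raise ValueError(f"Unknown stopping step {wfp['stop-at']}")
    idx_start = MCSTEPS.index(wfp["start-at"])
    idx_stop = MCSTEPS.index(wfp["stop-at"])
    # Advance segmentation -> watershed to avoid dangling probability maps
    if idx_stop == MCSTEPS.index("segmentation"):
        idx_stop = MCSTEPS.index("watershed")
    return idx_start, idx_stop


def do_i_run(step, wfp):
    """Decide whether a step runs for the given workflow parameters."""
    if step == "viz":
        return bool(wfp.get("viz", DEFAULTS["viz"]))
    if step == "raw" or step not in MCSTEPS:
        # Assumption: steps without an explicit case are rejected
        raise ValueError(f"Unknown step {step}")
    idx_start, idx_stop = flow_segment(wfp)
    in_range = idx_start <= MCSTEPS.index(step) <= idx_stop
    toggle = TOGGLES.get(step)
    if toggle is not None:
        return in_range and bool(wfp.get(toggle, DEFAULTS[toggle]))
    return in_range
```

For example, `do_i_run("illumination", {"start-at": "staging", "stop-at": "quantification", "illumination": False})` returns `False`, while omitting the `illumination` key falls back to the default and returns `True`.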

diff --git a/lib/mcmicro/Opts.groovy b/lib/mcmicro/Opts.groovy
@@ -138,6 +138,24 @@ static def validateWFParams(wfp, fns) {
             "segmentation-channel provided"
         throw new Exception(msg)
     }
+    if(!wfp['segmentation-max-projection'] && 
+      (wfp.containsKey('segmentation-nuclear-channel') || wfp.containsKey('segmentation-membrane-channel'))) {
+        String msg = "Multiple nuclear or membrane channels were requested " +
+            "but no maximum projection specification is provided. " +
+            "Either add the segmentation-max-projection parameter " +
+            "or only use segmentation-channel for channel selection."
+        throw new Exception(msg)
+    }
+    if(wfp['segmentation-max-projection'] &&
+      !(wfp.containsKey('segmentation-nuclear-channel') || wfp.containsKey('segmentation-membrane-channel'))) {
+        String msg = "Maximum projection specification provided but no " +
+            "nuclear or membrane channels defined. " +
+            "Either specify multiple nuclear (and membrane channels) with " +
+            "segmentation-nuclear-channel (and segmentation-membrane-channel) " +
+            "or exclude segmentation-max-projection and only use segmentation-channel " +
+            "for channel specification."
+        throw new Exception(msg)
+    }
 }
 
 /**
@@ -188,6 +206,18 @@ static def parseParams(gp, fns, fnw) {
     else
         mcp.modules['background'] = mcp.modules['background'][0]
 
+    // Select the staging module based on --staging-method
+    mcp.modules['staging'] = mcp.modules['staging'].findAll{
+        it.name == mcp.workflow['staging-method']
+    }
+    if(mcp.modules['staging'].size() < 1) {
+        String msg = "Unknown staging method " +
+            mcp.workflow['staging-method']
+        throw new Exception(msg)
+    }
+    else
+        mcp.modules['staging'] = mcp.modules['staging'][0]
+
     // Filter segmentation modules based on --segmentation
     mcp.modules['segmentation'] = mcp.modules['segmentation'].findAll{
         mcp.workflow.segmentation.contains(it.name)
@@ -243,10 +273,61 @@ static def moduleOpts(module, mcp) {
         copts = module.channel + ' ' + idx.join(' ')
       }
 
+    String ncopts = ''
+    if(wfp.containsKey('segmentation-nuclear-channel') &&
+        module.containsKey('nuclear-channel')) {
+
+        // Module spec must specify whether indexing is 0-based or 1-based
+        if(!module.containsKey('idxbase'))
+            error module.name + " spec in modules.yml is missing idxbase key"
+
+        // Identify the list of indices
+        List idx = wfp['segmentation-nuclear-channel'].toString().tokenize()
+
+        // Account for recyze, if appropriate
+        if(wfp['segmentation-recyze'])
+            idx = (1..idx.size()).collect{it}
+
+        // Account for 0-based indexing
+        if(module.idxbase == 0)
+            idx = idx.collect{"${(it as int)-1}"}
+
+        // S3segmenter will work with the first index only
+        if(module.name == 's3seg')
+            idx = idx[0..0]
+
+        ncopts = module['nuclear-channel'] + ' ' + idx.join(' ')
+      }
+
+    String mcopts = ''
+    if(wfp.containsKey('segmentation-membrane-channel') &&
+        module.containsKey('membrane-channel')) {
+
+        // Module spec must specify whether indexing is 0-based or 1-based
+        if(!module.containsKey('idxbase'))
+            error module.name + " spec in modules.yml is missing idxbase key"
+
+        // Identify the list of indices
+        List idx = wfp['segmentation-membrane-channel'].toString().tokenize()
+
+        // Account for recyze, if appropriate
+        if(wfp['segmentation-recyze'])
+            idx = (1..idx.size()).collect{it}
+
+        // Account for 0-based indexing
+        if(module.idxbase == 0)
+            idx = idx.collect{"${(it as int)-1}"}
+
+        // S3segmenter will work with the first index only
+        if(module.name == 's3seg')
+            idx = idx[0..0]
+
+        mcopts = module['membrane-channel'] + ' ' + idx.join(' ')
+      }
     // Identify all remaining module options
     String mopts = ''
     if(mcp.options.containsKey(module.name))
         mopts = mcp.options[module.name]
 
-    copts + ' ' + mopts
+    copts + ' ' + ncopts + ' ' + mcopts + ' ' + mopts
 }
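
The two max-projection checks added to `validateWFParams` above amount to requiring that `segmentation-max-projection` and the nuclear/membrane channel lists are given together or not at all. A minimal Python sketch of that rule follows; the function name and the shortened messages are hypothetical, and it returns a message instead of throwing so that the outcome is easy to inspect.

```python
# Hypothetical sketch of the max-projection parameter check (not the pipeline's code).
from typing import Optional


def max_projection_error(wfp: dict) -> Optional[str]:
    """Return an error message if the parameters are inconsistent, else None."""
    has_channels = (
        "segmentation-nuclear-channel" in wfp
        or "segmentation-membrane-channel" in wfp
    )
    max_projection = bool(wfp.get("segmentation-max-projection", False))

    if has_channels and not max_projection:
        return ("Nuclear or membrane channels were requested but "
                "segmentation-max-projection is not enabled.")
    if max_projection and not has_channels:
        return ("segmentation-max-projection is enabled but no nuclear "
                "or membrane channels are defined.")
    return None
```

Under this rule, a workflow that only sets `segmentation-channel` passes, as does one that sets `segmentation-max-projection: true` together with at least one of the two channel lists.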
diff --git a/lib/mcmicro/Util.groovy b/lib/mcmicro/Util.groovy
@@ -6,7 +6,11 @@ static def getSampleName(f, rawdir) {
     rel.contains('/') ? rel.split('/').head() : 
         rawdir.parent.getName()
 }
-
+static def getCycleNameFromDir(f, rawdir) {
+    // Resolve paths relative to the input project directory
+    String rel = rawdir.relativize(f).toString()
+    rel.split('/').head()
+}
 
 /**
  * Extracts a file ID as the first token before delim in the filename