The AstroPath Pipeline was developed to process whole slide multiplex immunofluorescence data from microscope to database
Correspondence to: bgreen42@jhu.edu
The AstroPath Pipeline was designed to automate the processing of whole slide multiplex immunofluorescence histopathology image data, taken by Akoya Biosciences' Vectra imaging platform, from the microscope to the database. The automated process begins after whole slide scans have been captured by the microscope and manually verified complete. Code is divided into three main stages, defined as `hpf`, `slide`, and `sample` level processing. In the `hpf` (or high powered field) processing stage, images are reorganized, corrected for camera/imaging effects, and segmented/phenotyped; here images are mainly processed individually. In the next processing stage, aptly named `slide`, the data is stitched together into a whole slide and the slides are annotated by a pathologist. Finally, slides across a cohort are corrected for batch-to-batch variation and loaded into a database. Here the image, cell, and annotation data of each whole slide image is linked to its clinical information, thus providing a completed `sample`.

Code for each stage is organized into its own folder under `astropath`, with each folder containing a particular set of modules. Each module is organized separately in subfolders and described with linked documentation. An overview of the current pipeline can be seen below.
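As an illustrative sketch of this layout (directory names follow the sections later in this document; the exact subfolders in the repository may differ):

```
astropath/
├── scans/     # slide scanning, intake, and organization
├── hpfs/      # high powered field (image-level) processing
├── slides/    # whole-slide stitching and pathologist annotation
└── samples/   # cohort-level corrections and database loading
```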
Especially on Windows, it is recommended to run python using an Anaconda distribution, which helps with installing dependencies. While most of the dependencies can just be installed with pip, others have C++ requirements that are significantly easier to set up with Anaconda.
Our recommendation is to download a Miniconda distribution. Once you install it, open the Anaconda PowerShell prompt and create an environment:
```
conda create --name astropath python=3.8
conda activate astropath
```
You should activate the `astropath` environment in every new session before installing packages (whether through conda or pip) or before running code.
At least the following dependencies should be installed through Anaconda.
```
conda install -c conda-forge pyopencl gdal cvxpy numba 'ecos!=2.0.8' git jupyter
```
(`pyopencl`, `gdal`, and `cvxpy` have C++ dependencies. `numba` requires a specific numpy version, and installing it here avoids unpleasant interactions between conda and pip. `ecos!=2.0.8` is a workaround for a bug in the ecos distribution on conda. `git` may or may not be needed, depending on whether you have it installed separately on your computer. `jupyter` is needed for deepcell.)
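As a quick sanity check, you can try importing these packages from Python inside the activated environment (a minimal sketch; note that the gdal package is imported through the `osgeo` module):

```python
# Run inside the activated astropath environment.
# If any import fails, revisit the conda install step above.
import pyopencl
import cvxpy
import numba
from osgeo import gdal

print("conda-installed dependencies import cleanly")
```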
Many of the other dependencies can also be installed through Anaconda if you want, but we have found that they work just as well when installed with pip.
Note: GPU computation is supported in some Python modules through PyOpenCL. You will need to have third-party OpenCL drivers installed for any GPU you want to use; any GPU built in 2011 or later supports OpenCL. OpenCL drivers for Intel, AMD, and NVIDIA GPUs can be downloaded from the respective vendors' websites.
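To check that PyOpenCL can see your drivers, a short enumeration script such as the following (a verification sketch, not part of the pipeline itself) can be run; if no platforms are listed, the drivers are not set up correctly:

```python
import pyopencl as cl

# List every OpenCL platform and device the installed drivers expose.
for platform in cl.get_platforms():
    print("Platform:", platform.name)
    for device in platform.get_devices():
        print("  Device:", device.name,
              "| type:", cl.device_type.to_string(device.type))
```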
To install the code, first check out the repository, enter its directory, activate the conda environment if you are using conda, and run
```
pip install .
```
If you want to continue developing the code after installing, run instead
```
pip install --editable .
```
You can also add optional dependencies by specifying them in brackets, as in `pip install (--editable) .[gdal,deepcell]`.
The optional dependencies include:

- `deepcell` - needed to run the DeepCell segmentation algorithm.
- `gdal` - needed for polygon handling, which is used in the `geom`, `geomcell`, `stitchmask`, and `csvscan` steps.
- `nnunet` - needed to run the nnU-Net segmentation algorithm.
- `test` - these packages are not needed for the actual AstroPath workflow but are used in various unit tests.
- `vips` - used in the `zoom` and `deepzoom` steps of the pipeline.

To install all optional dependencies, just specify `[all]`. A quick way to check which extras are importable is sketched below.
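For example (a hedged helper; the module names mapped here — `deepcell`, `osgeo` for gdal, `nnunet`, and `pyvips` for vips — are assumptions about what each extra installs):

```python
import importlib.util

# Map each optional extra to the module it is assumed to provide.
optional = {"deepcell": "deepcell", "gdal": "osgeo",
            "nnunet": "nnunet", "vips": "pyvips"}

for extra, module in optional.items():
    status = "installed" if importlib.util.find_spec(module) else "missing"
    print(f"{extra}: {status}")
```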
Once the code is installed, you can run `import astropath` from any directory.
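To confirm which copy of the package Python is using (helpful when switching between a regular and an `--editable` install), a minimal check:

```python
import astropath

# Prints the package location: site-packages for a regular install,
# or the checked-out repository for an --editable install.
print(astropath.__file__)
```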
Most of the PowerShell code was designed to run automatically as a background process launched by double clicking a batch file. The code monitors all projects defined in the AstroPath processing files and starts new tasks for slides when appropriate triggers take place. The set of batch files for modules launched this way can be found in the `*\astropath\launch` directory. Assuming slides are set up in the AstroPath format and the AstroPath processing directory is set up correctly, double clicking the file with the appropriate module name will initiate it.
To run a module on a particular slide, check out the repository and, in a PowerShell console, enter:
```
import-module *\astropath
```
replacing the `*` with the path to the repository.
Next, use the `LaunchModule` function to start a module as follows:
```
LaunchModule -mpath:<mpath> -module:<module name> -stringin:<module input>
```
- `<mpath>`: the AstroPath processing directory
- `<module name>`: the module to be launched; most modules launched in PowerShell are located in the hpfs or scans directories
- `<stringin>`: dash-separated list of arguments for a particular module

For simplicity (understanding that most users will not have a great deal of comfort in PowerShell), one could launch a module such as vminform by invoking the following from a command line:
```
powershell -noprofile -command import-module *\astropath; LaunchModule -mpath:*\astropath_processing -module:vminform -stringin:<dpath>-<slideid>-<antibody>-<algorithm>-<inform version>
```
Check out/download the GitHub repository. In MATLAB, add the entire AstroPath Pipeline to the MATLAB path. The AstroPath Pipeline commands should then be available in MATLAB.
NOTE: For specific Python, MATLAB, cmd, or PowerShell commands of a particular module check the module or workflow instructions.
- 1. Description
- 2. Getting Started
- 3. Contents
- 4. Scanning Slides (scans)
- 5. HPF Processing (hpfs)
- 6. Slide Processing (slides)
- 7. Sample Processing (samples)
- 8. Powershell
Created by: Benjamin Green1, Jeffrey S. Roskes4, Margaret Eminizer4, Richard Wilton4, Sigfredo Soto-Diaz2, Andrew Jorquera1, Sneha Berry2, Elizabeth Engle2, Nicolas Giraldo3, Peter Nguyen2, Tricia Cottrell3, Janis Taube1,2,3, and Alex Szalay4
Individual Contributions:

- Benjamin Green: Conceptualization, Methodology, Software, Writing – Original Draft, Visualization
- Jeffrey S. Roskes: Conceptualization, Methodology, Software, Writing – Original Draft
- Margaret Eminizer: Conceptualization, Methodology, Software, Writing – Original Draft, Visualization
- Richard Wilton: Methodology, Software
- Sigfredo Soto-Diaz: Methodology, Software, Writing – Original Draft
- Andrew Jorquera: Methodology, Software, Writing – Original Draft
- Sneha Berry: Conceptualization, Validation, Visualization
- Liz Engle: Conceptualization, Resources, Validation
- Nicolas Giraldo-Castillo: Conceptualization
- Peter Nguyen: Conceptualization, Methodology
- Tricia Cottrell: Conceptualization, Validation, Writing – Review & Editing
- Janis Taube: Conceptualization, Resources, Supervision
- Alex Szalay: Conceptualization, Methodology, Validation, Software, Supervision