
AstroPath Pipeline

The AstroPath Pipeline was developed to process whole slide multiplex immunofluorescence data from microscope to database

Correspondence to: bgreen42@jhu.edu

1. Description

The AstroPath Pipeline was designed to automate the processing of whole slide multiplex immunofluorescence histopathology image data, taken by Akoya Biosciences' Vectra imaging platform, from the microscope to the database. The automated process begins after whole slide scans have been captured by the microscope and manually verified complete. Code is divided into three main stages, defined as hpf, slide, and sample level processing. In the hpf (or high powered field) processing stage, images are reorganized, corrected for camera/imaging effects, and segmented/phenotyped; here images are mainly processed individually. In the next processing stage, aptly named slide, the data is stitched together into a whole slide and the slides are annotated by a pathologist. Finally, slides across a cohort are corrected for batch to batch variation and loaded into a database. Here the image, cell, and annotation data of each whole slide image is linked to its clinical information, thus providing a completed sample. Code for each stage is organized into its own folder under astropath, with each folder containing a particular set of modules. Each module is organized separately in subfolders and described with linked documentation. An overview of the current pipeline can be seen below.

Figure1

2. Getting Started

2.1. Prerequisites

2.2. Instructions

2.2.1. Python Instructions

2.2.1.1. Environment setup

Especially on Windows, it is recommended to run python using an Anaconda distribution, which helps with installing dependencies. While most of the dependencies can just be installed with pip, others have C++ requirements that are significantly easier to set up with Anaconda.

Our recommendation is to download a Miniconda distribution. Once you install it, open the Anaconda powershell prompt and create an environment:

conda create --name astropath python=3.8
conda activate astropath

You should activate the astropath environment in every new session before installing packages (whether through conda or pip) or before running code.

At least the following dependencies should be installed through Anaconda.

conda install -c conda-forge pyopencl gdal cvxpy numba 'ecos!=2.0.8' git jupyter

(pyopencl, gdal, and cvxpy have C++ dependencies. numba requires a specific numpy version, and installing it here avoids unpleasant interactions between conda and pip. ecos!=2.0.8 is a workaround for a bug in the ecos distribution on conda. git may or may not be needed, depending if you have it installed separately on your computer. jupyter is needed for deepcell.)

Many of the other dependencies can also be installed through Anaconda if you want, but we have found that they work just as well when installing with pip.

Note: GPU computation is supported in some Python modules through PyOpenCL. You will need third-party OpenCL drivers installed for any GPU you want to use; virtually any GPU built in 2011 or later supports OpenCL. OpenCL drivers can be downloaded from the respective vendor sites for Intel, AMD, and NVIDIA GPUs.
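As a quick sanity check (assuming pyopencl has been installed via the conda command above), you can enumerate the OpenCL platforms and devices your drivers expose:

```python
# List the OpenCL platforms and devices visible to PyOpenCL.
# Falls back gracefully if pyopencl is not installed yet.
devices = []
try:
    import pyopencl as cl
    for platform in cl.get_platforms():
        for device in platform.get_devices():
            devices.append((platform.name, device.name))
            print(f"{platform.name}: {device.name}")
except ImportError:
    print("pyopencl is not installed; run the conda install command above first.")
```

If your GPU does not appear in the output, the OpenCL drivers are missing or not visible to the environment.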

2.2.1.2. Code installation

To install the code, first check out the repository, enter its directory, activate the conda environment if you are using conda, and run

pip install .

If you want to continue developing the code after installing, run instead

pip install --editable .

You can also add optional dependencies by specifying them in brackets, as in pip install (--editable) .[gdal,deepcell]. The optional dependencies include:

  • deepcell - needed to run the DeepCell segmentation algorithm
  • gdal - needed for polygon handling, which is used in the geom, geomcell, stitchmask, and csvscan steps
  • nnunet - needed to run the nnU-Net segmentation algorithm
  • test - these packages are not needed for the actual AstroPath workflow but are used in various unit tests
  • vips - used in the zoom and deepzoom steps of the pipeline

To install all optional dependencies, just specify [all].
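A small check of which optional extras are importable in the current environment can be useful after installation. The module names below are assumptions based on how these packages are conventionally imported (gdal imports as osgeo, vips as pyvips):

```python
import importlib.util

# Map each optional extra to the module it is assumed to provide.
extras = {"deepcell": "deepcell", "gdal": "osgeo", "nnunet": "nnunet", "vips": "pyvips"}
available = {name: importlib.util.find_spec(module) is not None
             for name, module in extras.items()}
for name, ok in available.items():
    print(f"{name}: {'installed' if ok else 'missing'}")
```

Any extra reported as missing can be added with another pip install invocation using the bracket syntax shown above.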

Once the code is installed, you can run

import astropath

from any directory.
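To confirm the installation from a script, you can also query the installed version via the standard library (this assumes the distribution is registered under the name astropath; adjust if your checkout uses a different distribution name):

```python
# Check whether the astropath distribution is installed and report its version.
from importlib.metadata import version, PackageNotFoundError

try:
    print("astropath version:", version("astropath"))
    installed = True
except PackageNotFoundError:
    installed = False
    print("astropath is not installed; run 'pip install .' from the repository root.")
```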

2.2.2. PowerShell Instructions

2.2.2.1. Launch using batch files

Most of the code written in PowerShell was designed to run automatically as a background process launched by double clicking a batch file. The code monitors all projects defined in the AstroPath processing files and starts new tasks for slides when appropriate triggers take place. The set of batch files for modules launched this way can be found in the *\astropath\launch directory. Assuming slides are set up in the AstroPath format and the AstroPath processing directory is set up correctly, double clicking the file with the appropriate module name will initiate it.

2.2.2.2. Starting in Powershell

To run a module on a particular slide, check out the repository and in a PowerShell console enter:

import-module *\astropath 

replacing the * with the path to the repository.

Next use the LaunchModule function to start a module as follows:

LaunchModule -mpath:<mpath> -module:<module name> -stringin:<module input>
  • <mpath>: the AstroPath processing directory
  • <module name>: module name to be launched; most modules launched in PowerShell are located in the hpfs or scans directories
  • <stringin>: dash separated list of arguments for a particular module

For simplicity (understanding that most users will not have a great deal of comfort in PowerShell), one could launch a module such as vminform by invoking the following from a command line:

powershell -noprofile -command import-module *\astropath; LaunchModule -mpath:*\astropath_processing -module:vminform -stringin:<dpath>-<slideid>-<antibody>-<algorithm>-<inform version>
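The dash-separated -stringin value maps positionally onto the module's arguments. A minimal Python sketch of that decomposition, with field names taken from the vminform example above and entirely hypothetical sample values:

```python
# Split a dash-separated -stringin value into named fields (vminform example).
# Note: a simple left-to-right split assumes the path itself contains no dashes.
fields = ["dpath", "slideid", "antibody", "algorithm", "inform_version"]
stringin = r"\\server\data-SAMPLE01-CD8-CD8_alg-2.4.8"  # hypothetical values
values = dict(zip(fields, stringin.split("-", maxsplit=len(fields) - 1)))
print(values)
```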

2.2.3. MATLAB Instructions

Check out or download the GitHub repository. In MATLAB, add the entire AstroPath Pipeline to the MATLAB path. The AstroPath Pipeline commands should then be available in MATLAB.

NOTE: For specific Python, MATLAB, cmd, or PowerShell commands of a particular module check the module or workflow instructions.

3. Contents

Credits

Created by: Benjamin Green1, Jeffrey S. Roskes4, Margaret Eminizer4, Richard Wilton4, Sigfredo Soto-Diaz2, Andrew Jorquera1, Sneha Berry2, Elizabeth Engle2, Nicolas Giraldo3, Peter Nguyen2, Tricia Cottrell3, Janis Taube1,2,3, and Alex Szalay4

Departments of 1Dermatology, 2Oncology, 3Pathology at Johns Hopkins University SOM, the Mark Center for Advanced Genomics and Imaging, the Sidney Kimmel Comprehensive Cancer Center, and the Bloomberg-Kimmel Institute for Cancer Immunotherapy at Johns Hopkins, Baltimore, MD, USA
Departments of 4Astronomy and Physics at Johns Hopkins University and IDIES, Baltimore, MD, USA

Individual Contributions:

  • Benjamin Green: Conceptualization, Methodology, Software, Writing – Original Draft, Visualization
  • Jeffrey S. Roskes: Conceptualization, Methodology, Software, Writing – Original Draft
  • Margaret Eminizer: Conceptualization, Methodology, Software, Writing – Original Draft, Visualization
  • Richard Wilton: Methodology, Software
  • Sigfredo Soto-Diaz: Methodology, Software, Writing – Original Draft
  • Andrew Jorquera: Methodology, Software, Writing – Original Draft
  • Sneha Berry: Conceptualization, Validation, Visualization
  • Liz Engle: Conceptualization, Resources, Validation
  • Nicolas Giraldo-Castillo: Conceptualization
  • Peter Nguyen: Conceptualization, Methodology
  • Tricia Cottrell: Conceptualization, Validation, Writing – Review & Editing
  • Janis Taube: Conceptualization, Resources, Supervision
  • Alex Szalay: Conceptualization, Methodology, Validation, Software, Supervision