Skip to content

CellProfiling/HPA-Cell-Segmentation

Repository files navigation

HPA-Cell-Segmentation

Tools for running cell segmentation using pretrained U-Nets.

  1. The cellsegmentator module which implements a class CellSegmentator to make running segmentation from Python code easier.
  2. A utils module which contain a couple of post-processing functions that is good to use when working with images from the Human Protein Cell Atlas.

Installation

The following steps will install hpacellseg to your local pip installations folder together with all necessary dependencies.

  • git clone https://github.com/CellProfiling/HPA-Cell-Segmentation.git
  • cd HPA-Cell-Segmentation
  • sh install.sh

You can verify the installation has worked by running

python -c 'import hpacellseg'

in a terminal.

If you want to use anaconda/miniconda (env tested on a100 gpu):

  • conda env create -f environment.yml
  • conda activate hpacellseg

Note that it can be nice to run the pip steps in the install script yourself, trying them with the –dry-run flag to make sure they won’t overwrite the desired torch and numpy versions. The torch and cuda versions should be changed for the needed setup, we just can’t have torch higher than 1.9.1. Using conda search pytorch you can find the right property string to add after the version number in the environment yaml.

Usage

CellSegmentator class

The CellSegmentator is made for programmaticly iterating through a list of images and returning the segmentations.

It is located in hpacellseg.cellsegmentator.

Arguments

  • nuclei_model

    This should be a string containing the path to the nuclei-model weights. If the weights do not exist at the path, they will be downloaded to it.

    Defaults to ./nuclei_model.pth.

  • cell_model

    This should be a string containing the path to the cell-model weights. If the weights do not exist at the path, they will be downloaded to it.

    Defaults to ./cell_model.pth.

  • scale_factor

    This value determines how much the images should be scaled before being fed to the models. For HPA Cell images, a value of 0.25 is good.

    Defaults to 0.25.

  • device

    Inform Torch which device to put the model on. Valid values are ‘cpu’ or ‘cuda’ or pointed cuda device like ‘cuda:0’.

    Defaults to cuda.

  • padding

    If True, add some padding before feeding the images to the neural networks. This is not required but can make segmentations, especially cell segmentations, more accurate.

    Defaults to False.

    Note: If you have issues running the segmentation due to image dimensions, setting padding to True may help.

  • multi_channel_model

    If True, use the pretrained three-channel version of the model. Having this set to True gives you better cell segmentations but requires you to give the model endoplasmic reticulum images as part of the cell segmentation. Otherwise, the version trained with only two channels, microtubules and nuclei, will be used.

    Defaults to True.

Methods

The two segmentation functions are

  • pred_nuclei

    The function takes a list of image arrays or a list of string paths to images. If the image arrays are 3 channels, the nuclei should be in the third (blue) channel.

    Returns a list of the neural network outputs of the nuclei segmentation. The images are on the format (3, H, W). The three channels are as follows [<Unused>, touching-nuclei, Nuclei-segmentation].

  • pred_cells

    The function takes a list of three lists as input. The lists should contain either image arrays or string paths, in the order of microtubules, endoplasmic reticulum, and nuclei.

    Returns a list of the neural network outputs of the cell segmentations. The images are on the format (3, H, W). The three channels for the cell segmentation are as follows [<Unused>, touching-cells, Cell-segmentation].

Note that both these functions assume that all input images are of the same shape!!

Post processing

The two available post-processing functions are located in the hpacellseg.utils module. They are:

  • label_nuclei

    Input with the nuclei prediction for an image. Returns the labeled nuclei mask array. 0s in the array indicate background while all other numbers 1-n indicate which cell is in that spot.

  • label_cell

    Input with the nuclei and cell prediction for an image. Returns the labeled nuclei and cell mask arrays as a tuple. As with label_nuclei, the background is 0s and other numbers indicates which cell is there. The same cell will have the same number in both arrays.

Example usage

import hpacellseg.cellsegmentator as cellsegmentator
from hpacellseg.utils import label_cell, label_nuclei
from imageio import imwrite

# Assuming that there are images in the current folder with the
# following names.
images = [
    ["microtubules_one.tif", "microtubules_two.tif"],
    ["endoplasmic_reticulum_one.tif", "endoplasmic_reticulum_two.tif"],
    ["nuclei_one.tif", "nuclei_two.tif"]
]
NUC_MODEL = "./nuclei-model.pth"
CELL_MODEL = "./cell-model.pth"
segmentator = cellsegmentator.CellSegmentator(
    NUC_MODEL,
    CELL_MODEL,
    scale_factor=0.25,
    device="cuda",
    # NOTE: setting padding=True seems to solve most issues that have been encountered
    #       during our single cell Kaggle challenge.
    padding=False,
    multi_channel_model=True,
)

# For nuclei: taking in nuclei channels as inputs
nuc_segmentations = segmentator.pred_nuclei(images[2])

# For full cells: taking in 3 channels as inputs
cell_segmentations = segmentator.pred_cells(images)

# post-processing nuclei mask
nuclei_mask = label_nuclei(nuc_segmentations[0])

# post-processing nuclei and cell mask
for i, (nuc_segmentation, cell_segmentation) in enumerate(zip(nuc_segmentations, cell_segmentations)):
    nuclei_mask, cell_mask = label_cell(nuc_segmentation, cell_segmentation)
    # Save these masks in local folder
    imwrite(f"nucleimask_{i}.png", nuclei_mask)
    imwrite(f"cellmask_{i}.png", cell_mask)