
Official code for "DermSynth3D: Synthesis of in-the-wild Annotated Dermatology Images". A data generation pipeline for creating photorealistic in-the-wild synthetic dermatological data with rich multi-task annotations for various skin-analysis tasks.


DermSynth3D

GPLv3 arXiv DOI request dataset Video Hugging Face Spaces

📜 This is the official code repository for DermSynth3D.

📢 DermSynth3D is now accepted to MedIA 🎉.

🤗NEW Try out the DermSynth3D web demo here.


📺 Check out the video abstract for this work: Video Thumbnail

TL;DR

A data generation pipeline for creating photorealistic in-the-wild synthetic dermatological data with rich annotations such as semantic segmentation masks, depth maps, and bounding boxes for various skin analysis tasks.

main pipeline

The figure shows the DermSynth3D computational pipeline where 2D segmented skin conditions are blended into the texture image of a 3D mesh on locations outside of the hair and clothing regions. After blending, 2D views of the mesh are rendered with a variety of camera viewpoints and lighting conditions and combined with background images to create a synthetic dermatology dataset.

Motivation

In recent years, deep learning (DL) has shown great potential in the field of dermatological image analysis. However, existing datasets in this domain have significant limitations, including a small number of image samples, limited disease conditions, insufficient annotations, and non-standardized image acquisitions. To address these shortcomings, we propose a novel framework called ${DermSynth3D}$.

${DermSynth3D}$ blends skin disease patterns onto 3D textured meshes of human subjects using a differentiable renderer and generates 2D images from various camera viewpoints under chosen lighting conditions in diverse background scenes. Our method adheres to top-down rules that constrain the blending and rendering process to create 2D images with skin conditions that mimic in-the-wild acquisitions, yielding more meaningful results. The framework generates photo-realistic 2D dermoscopy images and the corresponding dense annotations: semantic segmentation masks of the skin, skin conditions, and body parts; bounding boxes around lesions; depth maps; and other 3D scene parameters, such as camera position and lighting conditions. ${DermSynth3D}$ allows for the creation of custom datasets for various dermatology tasks.

Repository layout

DermSynth3D/
┣ assets/                      # assets for the README
┣ configs/                     # YAML config files to run the pipeline
┣ logs/                        # experiment logs are saved here (auto created)
┣ out/                         # the checkpoints are saved here (auto created)
┣ data/                        # directory to store the data
┃  ┣ ...                       # detailed instructions in the dataset.md
┣ dermsynth3d/                 # the DermSynth3D python package
┃  ┣ datasets/                 # class definitions for the datasets
┃  ┣ deepblend/                # code for deep blending
┃  ┣ losses/                   # loss functions
┃  ┣ models/                   # model definitions
┃  ┣ tools/                    # wrappers for synthetic data generation
┃  ┗ utils/                    # helper functions
┣ notebooks/                   # demo notebooks for the pipeline
┣ scripts/                     # scripts for training and evaluation
┗ skin3d/                      # external Skin3D submodule

Table of Contents

Installation
Datasets
Creating the Synthetic dataset
How to Use DermSynth3D
Cite
Demo Notebooks for Dermatology Tasks
Acknowledgements

Installation

using conda

git clone --recurse-submodules https://github.com/sfu-mial/DermSynth3D.git
cd DermSynth3D
conda env create -f dermsynth3d.yml
conda activate dermsynth3d

using Docker

# Build the container in the root dir
docker build -t dermsynth3d --build-arg USER=$USER --build-arg UID=$(id -u) --build-arg GID=$(id -g) -f Dockerfile .
# Run the container in interactive mode for using DermSynth3D
# See 3. How to use DermSynth3D
docker run --gpus all --user=root --runtime=nvidia -it --rm -v /path/to/downloaded/data:/data dermsynth3d

We also provide some pre-built Docker images, which can be used instead:

# pull this latest docker image with the latest code
# you need to prepare the data following the instructions below
docker pull sinashish/dermsynth3d:latest

# pull this image for trying out the code with demo data i.e. lesions and meshes
docker pull sinashish/dermsynth3d:demo_w_code

# Run the container in interactive GPU mode for generating data and training models
# mount the data directory to the container
docker run --gpus all -it --user=root --runtime=nvidia --rm -v /path/to/downloaded/data:/data dermsynth3d:<tag name>

NOTE: The code has been tested on Ubuntu 20.04 with CUDA 11.1, python 3.8, pytorch 1.10.0, and pytorch3d 0.7.2; CPU-only execution is untested.

If you face any issues installing pytorch3d, please refer to their installation guide or this issue link.
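
As a quick, optional sanity check after installation, the snippet below (a minimal sketch, not part of the repository) verifies that the libraries import and that a GPU is visible:

# Minimal environment sanity check (sketch; assumes the dermsynth3d conda env or Docker image).
import torch
import pytorch3d

print("torch:", torch.__version__)                    # tested with 1.10.0
print("pytorch3d:", pytorch3d.__version__)            # tested with 0.7.2
print("CUDA available:", torch.cuda.is_available())   # the pipeline is only tested on GPU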

Datasets

Follow the instructions below to download the datasets for generating the synthetic data and training models for various tasks. All the datasets should be downloaded and placed in the data directory.

The folder structure of the data directory should be as follows:

DermSynth3D/
┣ ...                           # other source code
┣ data/                         # directory to store the data
┃  ┣ 3dbodytex-1.1-highres/     # 3DBodyTex.v1 3D models and texture maps
┃  ┣ fitzpatrick17k/
┃  ┃  ┣ data/                   # Fitzpatrick17k images
┃  ┃  ┗ annotations/            # annotations for Fitzpatrick17k lesions
┃  ┣ ph2/
┃  ┃  ┣ images/                 # PH2 images
┃  ┃  ┗ labels/                 # PH2 annotations
┃  ┣ dermofit/                  # Dermofit dataset
┃  ┃  ┣ images/                 # Dermofit images
┃  ┃  ┗ targets/                # Dermofit annotations
┃  ┣ FUSeg/
┃  ┃  ┣ train/                  # training set with images/labels for FUSeg
┃  ┃  ┣ validation/             # val set with images/labels for FUSeg
┃  ┃  ┗ test/                   # test set with images/labels for FUSeg
┃  ┣ Pratheepan_Dataset/
┃  ┃  ┣ FacePhoto/              # images from Pratheepan dataset
┃  ┃  ┗ GroundT_FacePhoto/      # annotations
┃  ┣ lesions/                   # keep the non-skin masks for 3DBodyTex.v1 meshes here
┃  ┣ annotations/               # segmentation masks for annotated Fitzpatrick17k lesions
┃  ┣ bodytex_anatomy_labels/    # per-vertex labels for anatomy of 3DBodyTex.v1 meshes
┃  ┣ background/                # keep the background scenes for rendering here
┃  ┗ synth_data/                # the generated synthetic data will be stored here
┃     ┣ train/                  # training set with images/labels for training on synthetic data
┃     ┗ <val/test>/             # val and test sets with images/labels for training on synthetic data
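
Before generating data, it can help to confirm that your layout matches the tree above. The snippet below is a hypothetical helper, not part of the repository; trim the list to the datasets you actually downloaded:

# Sketch: verify the expected data/ layout (only for the datasets you plan to use).
from pathlib import Path

data_root = Path("data")
expected = [
    "3dbodytex-1.1-highres",
    "fitzpatrick17k/data",
    "fitzpatrick17k/annotations",
    "lesions",
    "annotations",
    "bodytex_anatomy_labels",
    "background",
]

for rel in expected:
    path = data_root / rel
    status = "ok" if path.is_dir() else "MISSING"
    print(f"{status:7s} {path}")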

The datasets used in this work can be broadly categorized into data required for blending and data necessary for evaluation.

Data for Blending

Download 3DBodyTex.v1 meshes

3dbodytex sample

A few examples of raw 3D scans in sports clothing from the 3DBodyTex.v1 dataset, showing a wide range of body shapes, poses, skin tones, and genders.

The 3DBodyTex.v1 dataset can be downloaded from here.

3DBodyTex.v1 contains the meshes and texture images used in this work and can be downloaded from the external site linked above (after accepting a license agreement).

NOTE: These textured meshes are needed to run the code to generate the data.

We provide the non-skin texture map annotations for 2 meshes: 006-f-run and 221-m-u. Hence, to generate the data, make sure to get the .obj files for these two meshes and place them in data/3dbodytex-1.1-highres before executing scripts/gen_data.py.

After accepting the license, download and unzip the data in ./data/.

Download the 3DBodyTex.v1 annotations

Non-skin texture maps

We provide the non-skin texture map ($T_{nonskin}$) annotations for 215 meshes from the 3DBodyTex.v1 dataset here.

A sample texture image showing the annotations for non-skin regions.

Anatomy labels

We provide the per-vertex labels for the anatomical parts of the 3DBodyTex.v1 meshes, obtained by fitting the SCAPE template body model, here.

A few examples of the scans showing the 7 anatomy labels.

The folders are organised with the same IDs as the meshes in the 3DBodyTex.v1 dataset.

NOTE: To download the 3DBodyTex.v1 annotations from the links referred to above, you need to request access to the 3DBodyTex.DermSynth dataset by following the instructions on this link.

Download the Fitzpatrick17k dataset

fitz_annot_fig
An illustration showing lesions from the Fitzpatrick17k dataset in the top row, and their corresponding manually segmented lesion annotations in the bottom row.

We used the skin conditions from Fitzpatrick17k. See their instructions to get access to the Fitzpatrick17k images. We provide the raw images for the Fitzpatrick17k dataset here.

After downloading the dataset, unzip the dataset:

unzip fitzpatrick17k.zip -d data/fitzpatrick17k/

We provide a few samples of the densely annotated lesion masks from the Fitzpatrick17k dataset within this repository under the data directory.

More of such annotations can be downloaded from here.

Download the Background Scenes

bg_scenes

A few examples of the background scenes used for rendering the synthetic data.

Although you can use any scenes as backgrounds for generating random views of the lesioned meshes, we used SceneNet RGB-D for the indoor background scenes. Specifically, we used this split and sampled 3000 images from it.

For convenience, the background scenes we used to generate the synthetic dataset can be downloaded from here.
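
If you instead download a SceneNet RGB-D split yourself, a simple way to sample a fixed number of frames into data/background/ is sketched below; the source directory layout and file extension are assumptions about your download, not part of this repository:

# Sketch: sample up to 3000 background frames from a SceneNet RGB-D download into data/background/.
import random
import shutil
from pathlib import Path

src_dir = Path("scenenet_rgbd/train_0")   # hypothetical download location
dst_dir = Path("data/background")
dst_dir.mkdir(parents=True, exist_ok=True)

frames = sorted(src_dir.rglob("*.jpg"))
random.seed(0)
for frame in random.sample(frames, k=min(3000, len(frames))):
    # flatten the nested scene folders into unique file names
    shutil.copy(frame, dst_dir / "_".join(frame.parts[-3:]))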

Data For Training

Download the FUSeg dataset

fu_seg

A few examples from the FUSeg dataset showing the images in the top row and their corresponding segmentation masks in the bottom row.

The Foot Ulcer Segmentation Challenge (FUSeg) dataset is available to download from their official repository. Download and unpack the dataset at data/FUSeg/, maintaining the Folder Structure shown above.

For simplicity, we mirror the FUSeg dataset here.

Download the Pratheepan dataset

prath

A few examples from the Pratheepan dataset showing the images and their corresponding segmentation masks, in the top and bottom rows respectively.

The Pratheepan dataset is available to download from their official website. The images and the corresponding ground truth masks are available in a ZIP file hosted on Google Drive. Download and unpack the dataset at data/Pratheepan_Dataset/.

Download the PH2 dataset

ph2

A few examples from the PH2 dataset showing a lesion and its corresponding segmentation mask, in the top and bottom rows respectively.

The PH2 dataset can be downloaded from the official ADDI Project website. Download and unpack the dataset at data/ph2/, maintaining the Folder Structure shown above.

Download the DermoFit dataset

dermo

An illustration of a few samples from the DermoFit dataset showing the skin lesions and their corresponding binary masks, in the top and bottom rows respectively.

The DermoFit dataset is available through a paid perpetual academic license from the University of Edinburgh. Please access the dataset following the instructions for the DermoFit Image Library and unpack it at data/dermofit/, maintaining the Folder Structure shown above.

Creating the Synthetic dataset

synthetic data

Generated synthetic images of multiple subjects across a range of skin tones, with various skin conditions, background scenes, lighting, and viewpoints.

For convenience, we provide the generated synthetic data we used in this work for various downstream tasks here.

If you want to train your models on a different split of the synthetic data, you can download a dataset generated by blending lesions onto 26 3DBodyTex scans from here. To prepare this synthetic dataset for training, sample the images and targets from the path where you saved the dataset, and then organise them into train/val splits.

NOTE: To download the synthetic 3DBodyTex.DermSynth dataset referred to in the links above, you need to request access by following the instructions on this link.

Alternatively, you can use the provided script scripts/prep_data.py to create it.

Even better, you can generate your own dataset, by following the instructions here.

How to Use DermSynth3D

Generating Synthetic Dataset

annots

A few examples of annotated data synthesized using DermSynth3D. The rows from top to bottom show respectively: the rendered images with blended skin conditions, bounding boxes around the lesions, GT semantic segmentation masks, grouped anatomical labels, and the monocular depth maps produced by the renderer.

Before running any code to synthesize densely annotated data as shown above, make sure that you have downloaded the data necessary for blending as mentioned in Datasets, and that the folder structure is as described above. If your folder structure is different from ours, then update the paths, such as bodytex_dir, annot_dir, etc., accordingly in configs/blend.yaml.

Now, to generate the synthetic data with the default parameters, simply run the following command to generate 2000 views for a specified mesh:

python -u scripts/gen_data.py

To change the blending or synthesis parameters only, run using:

# Use python scripts/gen_data.py -h for full list of arguments
python -u scripts/gen_data.py --lr <learning rate> \
            -m <mesh_name> \
            -s <path to save the views> \
            -ps <skin threshold> \
            -i <blending iterations> \
            -v <number of views> \
            -n <number of lesions per mesh>

Feel free to play around with the other randomization parameters in configs/blend.yaml to control lighting, materials, and viewpoints.
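
For intuition about what these parameters control, here is a minimal PyTorch3D sketch (not scripts/gen_data.py itself) that renders one view of a textured mesh under a randomly chosen camera and a point light; the mesh path and all values are placeholders:

# Sketch (not the repository's renderer): render a textured mesh from a random viewpoint with PyTorch3D.
import torch
from pytorch3d.io import load_objs_as_meshes
from pytorch3d.renderer import (
    FoVPerspectiveCameras, PointLights, RasterizationSettings,
    MeshRenderer, MeshRasterizer, SoftPhongShader, look_at_view_transform,
)

device = torch.device("cuda")
# Path is an assumption; point it at one of your 3DBodyTex.v1 .obj files.
mesh = load_objs_as_meshes(["data/3dbodytex-1.1-highres/221-m-u/model.obj"], device=device)

# Random spherical viewpoint and a fixed point light.
dist, elev, azim = 1.5, float(torch.rand(1) * 60 - 30), float(torch.rand(1) * 360)
R, T = look_at_view_transform(dist=dist, elev=elev, azim=azim)
cameras = FoVPerspectiveCameras(device=device, R=R, T=T)
lights = PointLights(device=device, location=[[0.0, 1.0, 2.0]])

renderer = MeshRenderer(
    rasterizer=MeshRasterizer(cameras=cameras,
                              raster_settings=RasterizationSettings(image_size=512)),
    shader=SoftPhongShader(device=device, cameras=cameras, lights=lights),
)
image = renderer(mesh)   # (1, 512, 512, 4) RGBA tensor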

Post-Process Renderings with Unity

We use Pytorch3D as our differentiable renderer to generate the synthetic data. However, Pytorch3D is not a Physically Based Renderer (PBR), and hence the renderings may not look photorealistic. To achieve photorealistic renderings, we use Unity to post-process the renderings obtained from Pytorch3D.

Click to see a visual comparison of the renderings obtained from Pytorch3D and Unity.

renderer_comp

A visual comparison of the renderings obtained from Pytorch3D and Unity (Point Lights and Mixed Lighting).

NOTE: This is an optional step. If you are not interested in creating photorealistic renderings, you can skip this step and use the renderings obtained from Pytorch3D directly. We didn't observe a significant difference in the performance of the models trained on the renderings obtained from Pytorch3D and Unity.

Follow the detailed instructions outlined here to create photorealistic renderings using Unity. Alternatively, download the renders that we created using Unity here.

Preparing Dataset for Experiments

After creating the synthetic dataset in the previous step, it is time to evaluate its utility on some real-world tasks.

Before you start any experiments, you will ideally want to organize the generated data into train/val/test sets. We provide a utility script to do this:

python scripts/prep_data.py

You can look at scripts/prep_data.py for more details.
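
If you prefer to organize the generated views yourself, a split along the lines of the sketch below may be enough; it is illustrative only (not scripts/prep_data.py), and the images/ and targets/ sub-directory names are assumptions about where you saved the renders:

# Sketch: split generated images/targets into train/val/test (illustrative, not the repository script).
import random
import shutil
from pathlib import Path

src = Path("data/synth_data")              # where the generated views were saved (assumption)
images = sorted((src / "images").glob("*.png"))
random.seed(0)
random.shuffle(images)

n = len(images)
splits = {"train": images[: int(0.8 * n)],
          "val": images[int(0.8 * n): int(0.9 * n)],
          "test": images[int(0.9 * n):]}

for split, files in splits.items():
    for sub in ("images", "targets"):
        (src / split / sub).mkdir(parents=True, exist_ok=True)
    for img in files:
        shutil.copy(img, src / split / "images" / img.name)
        target = src / "targets" / img.name          # matching annotation (assumed naming)
        if target.exists():
            shutil.copy(target, src / split / "targets" / target.name)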

Cite

If you find this work useful or use any part of the code in this repo, please cite our paper:

@article{sinha2024dermsynth3d,
  title={DermSynth3D: Synthesis of in-the-wild annotated dermatology images},
  author={Sinha, Ashish and Kawahara, Jeremy and Pakzad, Arezou and Abhishek, Kumar and Ruthven, Matthieu and Ghorbel, Enjie and Kacem, Anis and Aouada, Djamila and Hamarneh, Ghassan},
  journal={Medical Image Analysis},
  pages={103145},
  year={2024},
  publisher={Elsevier}
}

Demo Notebooks for Dermatology Tasks

Qualitative Results

Qualitative results for (a) foot ulcer bounding box detection on the FUSeg dataset, (b) multi-class segmentation (lesions, skin, and background) and in-the-wild body part prediction, (c) skin segmentation and body part prediction on the Pratheepan dataset, and (d) multi-class segmentation (lesions, skin, and background) on dermoscopy images from the PH2 dataset.

Lesion Segmentation

Note: Update the paths to relevant datasets in configs/train_mix.yaml.

To train a lesion segmentation model with default parameters, on a combination of Synthetic and Real Data, simply run:

python -u scripts/train_mix_seg.py

Play around with the following parameters for a combinatorial mix of datasets.

real_ratio: 0.5                 # fraction of real images to be used from real dataset
real_batch_ratio: 0.5           # fraction of real samples in each batch
pretrain: True                  # use pretrained DeepLabV3 weights
mode: 1.0                       # fraction of the synthetic images to be used for training
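
To make these ratios concrete, the sketch below shows one way such fractions could drive a mixed real/synthetic sampler; it is illustrative only, not the repository's training code, and the dataset sizes are placeholders:

# Sketch: how the config ratios above could translate into a mixed sampler.
import torch
from torch.utils.data import WeightedRandomSampler

n_synth, n_real = 4000, 800                      # example dataset sizes (placeholders)
real_ratio, real_batch_ratio, synth_fraction = 0.5, 0.5, 1.0

n_synth_used = int(synth_fraction * n_synth)     # "mode": fraction of synthetic images used
n_real_used = int(real_ratio * n_real)           # fraction of the real dataset used

# Per-sample weights so each batch is, in expectation, real_batch_ratio real images.
weights = torch.cat([
    torch.full((n_synth_used,), (1 - real_batch_ratio) / n_synth_used),
    torch.full((n_real_used,), real_batch_ratio / n_real_used),
])
sampler = WeightedRandomSampler(weights, num_samples=n_synth_used + n_real_used)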

You can also look at this notebook for a quick overview of training a lesion segmentation model.

For inference of pre-trained models/checkpoints, look at this notebook.

Multi-Task Prediction

We also train a multi-task model for predicting lesions, anatomy, and depth, and evaluate it on multiple datasets.

For a quick overview of multi-task prediction task, checkout this notebook.

To perform inference with your trained models for this task, first update the paths in configs/multitask.yaml, then run:

python -u scripts/infer_multi_task.py

Lesion Detection

For a quick overview of training lesion detection models, please have a look at this notebook.

For quick inference using the pre-trained detection models/checkpoints, have a look at this notebook.

Acknowledgements

We are thankful to the authors of Skin3D for making their code and data public for the task of lesion detection on the 3DBodyTex.v1 dataset.