This repo is a simplified version of the model development framework for withoutbg.com.
The web app is a free service. Contact us for API pricing.
The architecture is a UNet with a refiner. The backbone can be swapped for a state-of-the-art network such as ResNet50.
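The sketch below is illustrative only (the package, encoder name, and parameters are assumptions, not necessarily what this repo uses); it shows how a UNet with a swappable encoder backbone can be built, for example with segmentation_models_pytorch:

    import segmentation_models_pytorch as smp

    # A UNet whose encoder (backbone) is selected by name.
    # in_channels=4 matches the RGB + inverse-depth input described below.
    model = smp.Unet(
        encoder_name="resnet50",     # swap in another backbone by changing this name
        encoder_weights="imagenet",  # pretrained encoder weights
        in_channels=4,               # RGB (3) + inverse depth (1)
        classes=1,                   # single-channel alpha matte
    )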
The refiner sharpens the predicted alpha channel; this refinement stage is described in the Deep Image Matting paper.
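As a rough sketch of such a refinement stage (the module name and layer sizes are assumptions, not this repo's actual code), a small convolutional network can take the RGB image plus the coarse alpha and predict a residual correction:

    import torch
    import torch.nn as nn

    class AlphaRefiner(nn.Module):
        # Sketch of a Deep-Image-Matting-style refinement stage (illustrative only).
        # Input: RGB image concatenated with the coarse alpha (4 channels).
        # Output: refined alpha = coarse alpha + predicted residual, clamped to [0, 1].
        def __init__(self, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(4, hidden, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(hidden, 1, 3, padding=1),
            )

        def forward(self, rgb, coarse_alpha):
            x = torch.cat([rgb, coarse_alpha], dim=1)
            return torch.clamp(coarse_alpha + self.net(x), 0.0, 1.0)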
The loss is a weighted average of the compositional loss and the alpha prediction loss, both described in the Deep Image Matting paper.
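A minimal sketch of the two terms, assuming images and mattes are float tensors in [0, 1] (the function names and epsilon value are illustrative):

    import torch

    def alpha_prediction_loss(alpha_pred, alpha_gt, eps=1e-6):
        # Charbonnier-style absolute difference between predicted and ground-truth alpha.
        return torch.sqrt((alpha_pred - alpha_gt) ** 2 + eps ** 2).mean()

    def compositional_loss(alpha_pred, fg, bg, image, eps=1e-6):
        # Re-composite with the predicted alpha and compare against the ground-truth composite.
        comp_pred = alpha_pred * fg + (1.0 - alpha_pred) * bg
        return torch.sqrt((comp_pred - image) ** 2 + eps ** 2).mean()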
Alternatively, an adversarial loss from a discriminator can be added to the weighted average; see the AlphaGAN paper for details.
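The weighted combination could then look like the sketch below (the weights are placeholders, not the values used in this repo):

    def total_loss(l_alpha, l_comp, l_adv=None, w_alpha=0.5, w_comp=0.5, w_adv=0.01):
        # Weighted average of the alpha prediction and compositional losses.
        loss = w_alpha * l_alpha + w_comp * l_comp
        if l_adv is not None:
            # Optional AlphaGAN-style adversarial term from a discriminator.
            loss = loss + w_adv * l_adv
        return loss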
The input has 4 channels: an RGB image (3 channels) and an inverse depth map (1 channel).
Depth map: Because a trimap requires a human in the loop, an inverse depth map is used instead. It is extracted with the MiDaS model.
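The snippet below sketches how the 4-channel input could be assembled using the public MiDaS torch.hub entry points (the model variant, normalization, and helper name are assumptions, not necessarily what this repo does):

    import torch

    midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
    midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
    transform = midas_transforms.small_transform
    midas.eval()

    def four_channel_input(rgb_np):
        # rgb_np: HxWx3 uint8 RGB image. Returns a 1x4xHxW tensor (RGB + inverse depth).
        with torch.no_grad():
            depth = midas(transform(rgb_np))  # relative inverse depth, shape 1xhxw
            depth = torch.nn.functional.interpolate(
                depth.unsqueeze(1), size=rgb_np.shape[:2],
                mode="bicubic", align_corners=False,
            )  # 1x1xHxW
        depth = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)  # scale to [0, 1]
        rgb = torch.from_numpy(rgb_np).permute(2, 0, 1).float().unsqueeze(0) / 255.0
        return torch.cat([rgb, depth], dim=1)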
Inputs are augmented with the Albumentations library.
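A sketch of such a pipeline, using additional_targets so the depth map receives the same geometric transforms as the image (the specific transforms and parameters are examples, not this repo's settings):

    import albumentations as A

    augment = A.Compose(
        [
            A.HorizontalFlip(p=0.5),
            A.ShiftScaleRotate(shift_limit=0.05, scale_limit=0.1, rotate_limit=10, p=0.5),
        ],
        additional_targets={"depth": "image"},  # apply the same spatial transforms to the depth map
    )

    # image: HxWx3, depth: HxW, alpha: HxW (ground-truth matte passed as the mask target)
    out = augment(image=image, depth=depth, mask=alpha)
    image_aug, depth_aug, alpha_aug = out["image"], out["depth"], out["mask"]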
The input image is a composite: a suitable background is chosen for each foreground. For example, highway or parking-lot backgrounds might be paired with car foregrounds.
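Compositing itself is plain alpha blending; a minimal sketch (array layouts and value ranges are assumptions):

    import numpy as np

    def composite(foreground, background, alpha):
        # foreground, background: HxWx3 float arrays in [0, 1]; alpha: HxW float in [0, 1].
        alpha = alpha[..., None]  # HxWx1 so it broadcasts over the color channels
        return np.clip(alpha * foreground + (1.0 - alpha) * background, 0.0, 1.0)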
To create the conda environment for the project:
conda env create -f environment.yml