This project is an implementation of the paper "Perceptual Losses for Real-Time Style Transfer and Super-Resolution" by Johnson et al. The main idea is to train a feed-forward neural network with perceptual losses, enabling real-time style transfer.
Neural style transfer was originally introduced by Gatys et al., who used an optimization-based approach to generate stylized images. While this method produces high-quality results, it is computationally expensive and slow. PyTorch has a tutorial implementing Gatys' optimization-based method; check out the PyTorch Tutorial.
This implementation achieves real-time neural style transfer by:
- Training a feed-forward transformation network to directly generate stylized images (a rough training-step sketch follows this list)
- Using perceptual losses computed from a pre-trained VGG-16 network
- Optimizing both content and style representations simultaneously
- Enabling fast inference with a single forward pass
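At a high level, a single training step looks roughly like the sketch below. The names `TransformerNet` and `PerceptualLoss` (and the loss signature) are assumptions standing in for whatever `model.py` and `loss.py` actually define; see `train.py` for the real loop.

```python
import torch
from torch.optim import Adam

# Hypothetical imports: the actual class names live in model.py / loss.py.
from model import TransformerNet
from loss import PerceptualLoss

device = "cuda" if torch.cuda.is_available() else "cpu"
transformer = TransformerNet().to(device)   # feed-forward image transformation network
criterion = PerceptualLoss().to(device)     # perceptual loss built on a frozen, pre-trained VGG-16
optimizer = Adam(transformer.parameters(), lr=1e-3)

def training_step(content_batch, style_image):
    """One optimization step: only the transformer is updated; VGG stays frozen.
    The criterion signature here is an assumption, not necessarily the repo's exact API."""
    content_batch = content_batch.to(device)
    stylized = transformer(content_batch)                         # single forward pass
    loss = criterion(stylized, content_batch, style_image.to(device))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```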
The model architecture follows Johnson et al.'s design, featuring residual blocks and upsampling layers. The perceptual loss combines:
- Content loss: MSE between feature representations of content and stylized images
- Style loss: MSE between Gram matrices of feature maps
For mathematical details and implementation insights, check out my notebook included in this repository; a minimal code sketch of both loss terms is also shown below.
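As a quick, self-contained illustration of those two terms, here is a minimal sketch using torchvision's VGG-16. The layer indices, loss weights, and Gram normalization are common choices in Johnson-style implementations, not necessarily what `loss.py` uses, and inputs are assumed to already be ImageNet-normalized:

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

# Frozen, pre-trained VGG-16 used purely as a feature extractor.
vgg = vgg16(weights="IMAGENET1K_V1").features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

CONTENT_LAYER = 8                  # relu2_2
STYLE_LAYERS = (3, 8, 15, 22)      # relu1_2, relu2_2, relu3_3, relu4_3

def extract_features(x):
    feats = {}
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in STYLE_LAYERS or i == CONTENT_LAYER:
            feats[i] = x
        if i == max(STYLE_LAYERS):   # no need to run deeper layers
            break
    return feats

def gram_matrix(feat):
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)   # normalized Gram matrix

def perceptual_loss(stylized, content, style, content_weight=1.0, style_weight=1e5):
    """`style` should have the same batch size as `stylized`
    (e.g., a single style image repeated along the batch dimension)."""
    fs, fc, fst = extract_features(stylized), extract_features(content), extract_features(style)
    content_loss = F.mse_loss(fs[CONTENT_LAYER], fc[CONTENT_LAYER])        # feature (content) loss
    style_loss = sum(F.mse_loss(gram_matrix(fs[i]), gram_matrix(fst[i]))   # Gram (style) loss
                     for i in STYLE_LAYERS)
    return content_weight * content_loss + style_weight * style_loss
```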
```bash
git clone https://github.com/emanalytic/Perceptual-Losses-Neural-Style-Transfer.git
cd Perceptual-Losses-Neural-Style-Transfer
pip install -r requirements.txt
```
For this project, I used an NVIDIA GeForce GTX 1650 GPU with 4 GB of VRAM to train on a smaller subset of about 40K images. The full training, however, was done on 82K images from the COCO dataset. If you don't have a GPU, you can use Kaggle's Tesla GPUs, which come with 16 GB of VRAM. They're pretty fast, and the best part is that you don't need to download the dataset locally; everything is handled on the platform!
Ensure you install the version of PyTorch compatible with your GPU. You can find the correct version for your setup by visiting the PyTorch installation page. I recommend using CUDA 12.1 or higher for better performance.
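To quickly confirm that the installed build actually sees your GPU:

```python
import torch

print(torch.__version__, torch.version.cuda)   # installed PyTorch version and its CUDA version
print(torch.cuda.is_available())               # should print True if the GPU setup is correct
```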
- Download the COCO dataset from this link and place it in the `data/content_dir` directory. (Also change the paths in `config.ini`.)
- Use the `train.py` script to train the style transfer model by running `python train.py`.
- If you don't want to train the model yourself, you can use the pretrained model available in this repository. Download the checkpoints from Pretrained Model, and after loading them you can simply test the different versions of the model.
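For reference, loading a checkpoint and stylizing a single image could look roughly like the sketch below. The class name `TransformerNet`, the checkpoint path, and the image paths are placeholders, so adjust them to whatever `model.py` and the downloaded checkpoints actually use:

```python
import torch
from PIL import Image
from torchvision import transforms

from model import TransformerNet   # hypothetical class name -- check model.py

device = "cuda" if torch.cuda.is_available() else "cpu"

net = TransformerNet().to(device).eval()
state = torch.load("checkpoints/style_model.pth", map_location=device)  # placeholder path
net.load_state_dict(state)

content = transforms.ToTensor()(Image.open("data/content_dir/example.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    stylized = net(content)   # one forward pass -- no per-image optimization

# Output range depends on the model; clamping to [0, 1] here just for saving.
transforms.ToPILImage()(stylized.squeeze(0).clamp(0, 1).cpu()).save("stylized.png")
```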
Below are some example results of the style transfer model:
| Content Image | Style Image | Stylized Image |
|---------------|-------------|----------------|
- `train.py`: Script to train the style transfer model.
- `vgg.py`: Hook manager for extracting VGG feature maps.
- `model.py`: Model implementation from the PyTorch examples repo.
- `loss.py`: Perceptual loss implementation.
- `neural_style_transfer.ipynb`: Jupyter notebook explaining the concepts and code in detail.
- `data/`: Directory for storing the content images and style image.
- `utils/`: Configuration and utility functions.
- "Perceptual Losses for Real-Time Style Transfer and Super-Resolution"
- Gatys et al.'s Neural Style Transfer
- PyTorch Neural Style Transfer Tutorial
- PyTorch Examples
- Helpful Article