Crop and bound-box interesting regions of an image (smart cropping) from saliency maps generated with the SAM-LSTM-RESNET model
This repository contains the reference code, written in Python 3, for generating saliency maps of images using a Convolutional LSTM ResNet (implemented with TensorFlow 2) and for smartly cropping images based on those maps.
pip install sam-lstm==1.0.0
- TensorFlow 2.9.0
- SciPy 1.9.3
- scikit-image 0.19.3
- NumPy 1.23.4
- OpenCV 2.9.0
- CUDA (GPU)
Tip: Building up the environment on your local machine from scratch can take hours. If you want to get started as soon as possible, just use Google Colab with a GPU runtime. It's free, and all of these libraries come preinstalled there.
Note: The code must be run on a GPU runtime; otherwise it will fail. In a future release, the code will be made compatible with CPU runtimes as well.
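To confirm that a GPU is actually visible to TensorFlow before running anything (for example, on a fresh Colab session), a quick standard TensorFlow check like the following should list at least one device:

```python
import tensorflow as tf

# Should print at least one entry, e.g.
# [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
print(tf.config.list_physical_devices("GPU"))
```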
# Create a folder "samples" in the current directory
# Upload some images (.jpg, .png) in it
from sam_lstm import SalMap
SalMap.auto()
With just these two lines, sam_lstm will compile the LSTM-based Saliency Attentive Convolutional model, generate raw saliency maps in the maps folder, colored overlay maps in the cmaps folder, images with bounding boxes in the boxes folder, and cropped images in the crops folder. All of this happens automatically. Just make sure you have .jpg/.jpeg/.png images in the samples folder.
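As a quick sanity check (this is not part of the package API, just a plain directory listing), you can inspect the generated files after SalMap.auto() returns; every image in samples/ should have a counterpart in each output folder:

```python
import os

# List the input folder and the four output folders created by SalMap.auto().
for folder in ("samples", "maps", "cmaps", "boxes", "crops"):
    if os.path.isdir(folder):
        print(f"{folder}/ -> {sorted(os.listdir(folder))}")
```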
from sam_lstm import SalMap
dataset_path = "dataset"
checkpoint = "/content/drive/MyDrive/Checkpoints/"
# Uncomment these lines if on GOOGLE COLAB
# import os
# from google.colab import drive
# drive.mount('/content/drive')
# if not os.path.exists(checkpoint):
#     os.mkdir(checkpoint)
s = SalMap()
s.compile()
s.load_weights()
s.train(dataset_path, checkpoint, steps_per_epoch=1000)
With these lines, you can start training the model on the SALICON 2017 dataset (which will be downloaded into the dataset directory).
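If you would rather pre-fetch the SALICON archives yourself instead of letting train() download them, a rough standard-library sketch is shown below. The URLs are the same archives listed under "Training and validation dataset" further down; the target layout (extracting each archive directly into the dataset folder) is an assumption on my part, so compare it with the directory that train() creates if the paths do not match.

```python
import os
import urllib.request
import zipfile

# Same archives as listed under "Training and validation dataset" below.
ARCHIVES = {
    "images": "https://github.com/SheikSadi/SAM-LSTM-RESNET/releases/download/1.0.0/images.zip",
    "maps": "https://github.com/SheikSadi/SAM-LSTM-RESNET/releases/download/1.0.0/maps.zip",
    "fixations": "https://github.com/SheikSadi/SAM-LSTM-RESNET/releases/download/1.0.0/fixations.zip",
}

dataset_path = "dataset"
os.makedirs(dataset_path, exist_ok=True)

for name, url in ARCHIVES.items():
    archive = os.path.join(dataset_path, f"{name}.zip")
    if not os.path.exists(archive):
        urllib.request.urlretrieve(url, archive)
    # Assumed layout: each archive extracted directly into the dataset folder.
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dataset_path)
```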
This work has been built on top of the following works:
- Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model by Cornia et al., 2018
- The original Python 2 implementation (using Keras + Theano) by @marcellacornia
- Reimplemented the source code in Python 3, using the latest versions (as of November 2022) of TensorFlow and OpenCV. The original work by @marcellacornia was written in Python 2 and used the Theano backend for Keras, both of which are no longer supported by the community.
- Updated the preprocessing stage to be compatible with the SALICON 2017 dataset.
- Converted the work into an open-source Python package readily installable from PyPI.
- Added the cropping module, which allows for smart cropping of images. I have written a Descent from Hilltop algorithm for finding the bounding boxes by which the images are cropped (a rough illustrative sketch of the idea follows below).
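The actual bounding-box search lives in the package's cropping module; the toy sketch below is only my rough illustration of a "descent from hilltop" style search (grow a box outward from the saliency peak until the box border's saliency drops too far below the peak), not the code used by sam_lstm.

```python
# Illustrative sketch only; the real algorithm in sam_lstm.cropping may differ.
import numpy as np

def hilltop_bbox(saliency: np.ndarray, stop_ratio: float = 0.3):
    """Grow a box around the saliency peak ("hilltop") one pixel at a time,
    stopping once the border saliency descends below stop_ratio * peak."""
    peak_y, peak_x = np.unravel_index(np.argmax(saliency), saliency.shape)
    peak = saliency[peak_y, peak_x]
    top, bottom, left, right = peak_y, peak_y, peak_x, peak_x
    h, w = saliency.shape

    while True:
        grown = False
        # Try extending each side (top, bottom, left, right) by one pixel.
        for dt, db, dl, dr in ((1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)):
            t, b = max(top - dt, 0), min(bottom + db, h - 1)
            l, r = max(left - dl, 0), min(right + dr, w - 1)
            border = np.concatenate(
                [saliency[t, l:r + 1], saliency[b, l:r + 1],
                 saliency[t:b + 1, l], saliency[t:b + 1, r]]
            )
            # Accept the extension only if the new border is still "high" enough.
            if (t, b, l, r) != (top, bottom, left, right) and border.mean() >= stop_ratio * peak:
                top, bottom, left, right = t, b, l, r
                grown = True
        if not grown:
            return left, top, right, bottom  # x0, y0, x1, y1
```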
- Training and validation dataset
- images: https://github.com/SheikSadi/SAM-LSTM-RESNET/releases/download/1.0.0/images.zip
- maps: https://github.com/SheikSadi/SAM-LSTM-RESNET/releases/download/1.0.0/maps.zip
- fixations: https://github.com/SheikSadi/SAM-LSTM-RESNET/releases/download/1.0.0/fixations.zip
- No-top ResNet50 weights (NCHW format)
- Pre-trained weights
- trained by @marcellacornia: https://github.com/SheikSadi/SAM-LSTM-RESNET/releases/download/1.0.0/sam-resnet_salicon_weights.pkl
- trained by @SheikSadi on Google Colab: https://github.com/SheikSadi/SAM-LSTM-RESNET/releases/download/1.0.0/sam-resnet-salicon.h5