RoIAlign, RoIPool and Non-Max Suppression implementations for PyTorch 1.0.
Code in this repository is part of FAIR's official Mask R-CNN Benchmark implementation; support for PyTorch 1.0 and higher was patched by ruotianluo.
- pytorch >= 1.0.0
- torchvision >= 0.2.0
- cython >= 0.29.2
- matplotlib
- numpy
- scipy
[NOTE] CUDA support is highly recommended because RoI pooling is not yet implemented for the CPU.
python setup.py install
Import this library in your code:
from roi_util import ROIAlign, ROIPool, nms
See notebook/RoI_Util_API.ipynb for examples.
ROIAlign(output_size: tuple, spatial_scale: float, sampling_ratio: int)
Parameters:
- output_size - A tuple of 2 integers: expected size of the output feature map of the RoI.
- spatial_scale - A floating point number: the size of the input feature map relative to the original input image, equal to `feature_map_width / original_image_width`.
- sampling_ratio - An integer: the sampling ratio for RoI alignment, i.e. the number of sampling points per bin in each direction.
Inputs: input, rois
- input - A `torch.Tensor` of shape `(batch, num_channels, height, width)`: a batch of feature maps.
- rois - A `torch.Tensor` of shape `(total_num_rois, 5)`: the batch indices and coordinates of all RoIs. Each row is an RoI with data `(batch_index, x1, y1, x2, y2)`, since one feature map may correspond to several RoIs. `x1, y1, x2, y2` denote the coordinates of the top-left and bottom-right corners of each RoI in the original image. Values of `x1` and `x2` should be between `0` and `original_image_width`, and values of `y1` and `y2` between `0` and `original_image_height`. If a value exceeds the original image size, the part outside the image is padded with 0.
Outputs: output
- output - A `torch.Tensor` of shape `(total_num_rois, num_channels, output_size[0], output_size[1])`.
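To make the `rois` layout and the output shape concrete, here is a minimal NumPy sketch; the RoI values, `output_size`, and channel count are invented for illustration and are not from the library:

```python
import numpy as np

# Each row of `rois` is (batch_index, x1, y1, x2, y2) in
# original-image pixel coordinates (hypothetical example values).
rois = np.array([
    [0.0,  10.0,  20.0, 110.0, 220.0],  # first RoI on image 0
    [0.0,  50.0,  60.0, 150.0, 160.0],  # second RoI on image 0
    [1.0,   0.0,   0.0,  99.0,  99.0],  # an RoI on image 1
], dtype=np.float32)

output_size = (7, 7)   # bins per RoI (assumed value)
num_channels = 256     # feature-map channels (assumed value)

# ROIAlign stacks the pooled features of all RoIs along the first axis,
# regardless of which image each RoI came from:
expected_shape = (rois.shape[0], num_channels, *output_size)
print(expected_shape)  # (3, 256, 7, 7)
```

The batch index in column 0 is what lets a single forward pass pool RoIs from different images in the batch.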
ROIPool(output_size: tuple, spatial_scale: float)
Parameters:
- output_size - A tuple of 2 integers: expected size of the output feature map of the RoI.
- spatial_scale - A floating point number: the size of the input feature map relative to the original input image, equal to `feature_map_width / original_image_width`.
Inputs: input, rois
- input - A `torch.Tensor` of shape `(batch, num_channels, height, width)`: a batch of feature maps.
- rois - A `torch.Tensor` of shape `(total_num_rois, 5)`: the batch indices and coordinates of all RoIs. Each row is an RoI with data `(batch_index, x1, y1, x2, y2)`, since one feature map may correspond to several RoIs. `x1, y1, x2, y2` denote the coordinates of the top-left and bottom-right corners of each RoI in the original image. Values of `x1` and `x2` should be between `0` and `original_image_width`, and values of `y1` and `y2` between `0` and `original_image_height`. If a value exceeds the original image size, the part outside the image is padded with 0.
Outputs: output
- output - A `torch.Tensor` of shape `(total_num_rois, num_channels, output_size[0], output_size[1])`.
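To illustrate what ROIPool computes, here is a minimal NumPy reference sketch of RoI max pooling for one feature map and one RoI. The quantization details (rounding RoI corners to the grid, integer bin boundaries) follow the original Fast R-CNN RoIPool and are assumptions; the CUDA kernel in this repository may use slightly different rounding conventions:

```python
import numpy as np

def roi_pool_ref(feature, roi, output_size, spatial_scale):
    """Reference RoI max pooling.

    feature: (C, H, W) array; roi: (x1, y1, x2, y2) in original-image
    coordinates; returns a (C, output_size[0], output_size[1]) array.
    """
    C, H, W = feature.shape
    ph, pw = output_size
    # Scale the RoI into feature-map coordinates and round to integers.
    x1 = int(round(roi[0] * spatial_scale))
    y1 = int(round(roi[1] * spatial_scale))
    x2 = int(round(roi[2] * spatial_scale))
    y2 = int(round(roi[3] * spatial_scale))
    roi_h = max(y2 - y1 + 1, 1)
    roi_w = max(x2 - x1 + 1, 1)
    out = np.zeros((C, ph, pw), dtype=feature.dtype)
    for i in range(ph):
        for j in range(pw):
            # Integer bin boundaries inside the RoI, clipped to the map.
            ys = min(max(y1 + int(np.floor(i * roi_h / ph)), 0), H)
            ye = min(max(y1 + int(np.ceil((i + 1) * roi_h / ph)), 0), H)
            xs = min(max(x1 + int(np.floor(j * roi_w / pw)), 0), W)
            xe = min(max(x1 + int(np.ceil((j + 1) * roi_w / pw)), 0), W)
            if ye > ys and xe > xs:
                # Max over every feature-map cell that falls in this bin.
                out[:, i, j] = feature[:, ys:ye, xs:xe].max(axis=(1, 2))
    return out

feature = np.arange(64, dtype=np.float32).reshape(1, 8, 8)
pooled = roi_pool_ref(feature, (0, 0, 15, 15), (2, 2), spatial_scale=0.5)
print(pooled.shape)  # (1, 2, 2)
```

Unlike ROIAlign, the bin boundaries here are snapped to integer feature-map cells, which is the quantization that RoIAlign was designed to avoid.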
nms(dets: torch.Tensor, scores: torch.Tensor, overlap_threshold: float) -> torch.Tensor
Parameters:
- dets - A `torch.Tensor` of shape `(num_detection, 4)`: top-left and bottom-right coordinates of all detected boxes.
- scores - A `torch.Tensor` of shape `(num_detection,)`: detection scores of all the boxes.
- overlap_threshold - A floating point number: the overlap threshold. If two boxes have a higher IoU than the threshold, the box with the lower score is removed.
Returns:
- indices - A `torch.Tensor` of shape `(num_filtered_detection,)`: the indices of the boxes that remain after non-max suppression.
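As a sketch of the suppression rule described above, here is a plain NumPy reference implementation of NMS. The exact coordinate conventions (e.g. whether box widths include a `+1`) are assumptions and may differ from the CUDA kernel:

```python
import numpy as np

def nms_ref(dets, scores, overlap_threshold):
    """Reference NMS. dets: (N, 4) boxes as (x1, y1, x2, y2);
    scores: (N,). Returns kept indices, highest score first."""
    x1, y1, x2, y2 = dets[:, 0], dets[:, 1], dets[:, 2], dets[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]          # process highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # IoU of the current top-scoring box with all remaining boxes.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # Drop boxes that overlap the kept box above the threshold.
        order = order[1:][iou <= overlap_threshold]
    return keep

dets = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms_ref(dets, scores, 0.5))  # [0, 2]
```

Here the second box overlaps the first with IoU ≈ 0.68 > 0.5 and is suppressed, while the disjoint third box survives.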
- Segmentation faults occur under certain circumstances. The root cause is yet to be found.