
Tutorial 01: Capture Screens


Obtaining visual input is a fundamental requirement for training deep learning models. In this tutorial, we show how to use the VIVID Python API to capture images.

VIVID provides several types of visual input, including scene, depth, and segmentation, based on AirSim. To capture images from a camera, simply call simGetImages on a VividClient and pass a list of ImageRequest objects. The response of simGetImages is a list of byte arrays in PNG format, one for each ImageRequest. For example:

response = client.simGetImages([ImageRequest(0, AirSimImageType.Scene)])
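
If the returned data is PNG-encoded, as described above, it can be written straight to disk. The snippet below is a minimal sketch: it assumes client is an already-connected VividClient and that each entry in the response is a PNG byte array (stock AirSim instead returns ImageResponse objects, so adapt accordingly if VIVID differs).

    # Minimal sketch: save the first returned image to disk.
    # Assumes `client` is an already-connected VividClient and each response
    # entry is a PNG-encoded byte array, as described on this page.
    responses = client.simGetImages([ImageRequest(0, AirSimImageType.Scene)])
    with open("scene.png", "wb") as f:
        f.write(responses[0])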

The following sections briefly describe the parameters of ImageRequest and the available image types, and then provide an example.

ImageRequest

| Parameter | Type | Description |
| --- | --- | --- |
| cameraId | int | Camera ID |
| image_type | int | Type of image. Default: AirSimImageType.Scene |
| pixels_as_float | bool | If True, return pixels as floating-point values. Default: False |
| compress | bool | If True, return a compressed image. Default: False |
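
As a sketch of how these parameters are used, the request below asks for uncompressed, floating-point planner depth (the parameter names follow the table above; the exact format of a floating-point response depends on the VIVID/AirSim version you are running):

    # Sketch: request planner depth as raw floating-point values instead of a
    # compressed image. Parameter names follow the table above.
    depth_request = ImageRequest(0, AirSimImageType.DepthPlanner,
                                 pixels_as_float=True, compress=False)
    responses = client.simGetImages([depth_request])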

Image Types

| Type | Description |
| --- | --- |
| Scene | Visual appearance observed by the camera |
| DepthVis | Depth visualization: each pixel is shaded from black to white according to its distance from the camera plane. Pure black: 0 meters; pure white: 100 meters or more |
| DepthPlanner | Distance from the camera plane to the plane through the pixel (parallel to the camera plane) |
| DepthPerspective | Distance from the camera viewpoint to the pixel |
| Segmentation | Ground-truth segmentation (by object) of the screen |
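
Several image types can be captured in a single call by passing one ImageRequest per type. The following is a sketch, again assuming an already-connected VividClient and PNG byte arrays in the response:

    # Sketch: capture Scene, DepthVis and Segmentation in one call and save each.
    requests = [
        ImageRequest(0, AirSimImageType.Scene),
        ImageRequest(0, AirSimImageType.DepthVis),
        ImageRequest(0, AirSimImageType.Segmentation),
    ]
    responses = client.simGetImages(requests)
    for name, png_bytes in zip(["scene", "depth_vis", "segmentation"], responses):
        with open(name + ".png", "wb") as f:
            f.write(png_bytes)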

The subwindows in the example picture below, from left to right, are the captured images (DepthVis, Segmentation, and Scene respectively).

[Image: SubScreens]

Python Example

A Python example is provided here.
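
As a complement to that example, the sketch below shows one way to turn a captured image into a NumPy array, which is the usual input format for a deep learning model. It assumes client is an already-connected VividClient and that each response entry is a PNG byte array; the decoding step uses OpenCV, but PIL would work as well.

    import numpy as np
    import cv2  # used only to decode the PNG bytes

    # Sketch: capture one Scene image and decode it into an H x W x 3 array.
    responses = client.simGetImages([ImageRequest(0, AirSimImageType.Scene)])
    buf = np.frombuffer(responses[0], dtype=np.uint8)  # PNG bytes, per this page
    image = cv2.imdecode(buf, cv2.IMREAD_COLOR)        # BGR array, shape (H, W, 3)
    print(image.shape)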