Input Data
Our pipeline uses overlapping drone imagery taken by a DJI Phantom drone. There are several ways to provide this input data to the pipeline. For this example, we illustrate only the simplest directory structure.
Note first that the pipeline expects the drone images from each surveyed region to be in a separate directory, like this:
flower_map/
├── config.yml
├── data/
| ├── models/
| ├── samples.tsv
| ├── region1/
| | ├── DJI_001.JPG
| | ├── DJI_002.JPG
| | ├── DJI_003.JPG
| ├── region2/
| | ├── DJI_001.PNG
| | ├── DJI_002.PNG
| | ├── DJI_003.PNG
| ├── region3/
| | ├── DJI_001.JPG
| | ├── DJI_002.PNG
| | ├── DJI_003.JPG
... (not shown: the rest of the files in this repository)
We've placed all of our data in a git-ignored data/ folder within the project root. If your data exists in a separate place on your filesystem, you can symlink it to the data/ directory or symlink the data/ directory itself. You may even choose not to have a data/ directory at all. The only requirement is that each region must have its own directory of drone image files.
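As a rough illustration of the symlink option described above, the following Python sketch links a region directory stored elsewhere into data/. The helper name link_region and the example paths are hypothetical and not part of the pipeline; a plain ln -s on the command line would work just as well.

```python
from pathlib import Path


def link_region(source_dir, data_dir, name):
    """Create data_dir/name as a symlink pointing at source_dir.

    Hypothetical helper for illustration; the pipeline does not provide it.
    """
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    link = data_dir / name
    # Leave any existing file, directory, or symlink untouched.
    if not link.is_symlink() and not link.exists():
        link.symlink_to(Path(source_dir).resolve(), target_is_directory=True)
    return link
```

For example, link_region("/mnt/drone/region1", "data", "region1") would make the images in the (assumed) /mnt/drone/region1 directory reachable as data/region1.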
Inside the data/ directory, we created a samples.tsv file describing the paths to these datasets:
region1 data/region1
region2 data/region2
region3 data/region3 .PNG
The samples.tsv file has up to three tab-separated columns and one line for each dataset that you'd like to analyze. The first column is a unique identifier that you assign to the dataset. The pipeline uses it when creating its output, so avoid spaces in your identifiers. The second column is the path to the dataset, relative to the root of the project directory.
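To make the column layout concrete, here is a minimal Python sketch of how a file in this format could be read. It is not the pipeline's own parser; the function name read_samples and the choice to represent a missing third column as None are assumptions made for illustration.

```python
import csv
import io


def read_samples(handle):
    """Map each dataset ID to a (path, extension-or-None) tuple."""
    samples = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        sample_id, path = row[0], row[1]
        # The third column is optional; treat a missing or empty one as None.
        ext = row[2] if len(row) > 2 and row[2] else None
        samples[sample_id] = (path, ext)
    return samples


# The example file from this section, with explicit tab characters.
example = (
    "region1\tdata/region1\n"
    "region2\tdata/region2\n"
    "region3\tdata/region3\t.PNG\n"
)
samples = read_samples(io.StringIO(example))
```

With a real file, open("data/samples.tsv", newline="") can be passed in place of the io.StringIO object.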
The third column is optional and gives the extension of the image files in the dataset's directory. If it is omitted, the pipeline uses the most common extension in that directory. In our example, the pipeline would default to .JPG for region3, since data/region3 has only one .PNG file. By specifying .PNG in our samples.tsv file, we instruct the pipeline to use only the .PNG files in data/region3.
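The default described above can be sketched in a few lines of Python. This is only an approximation of the behavior, not the pipeline's code: the function name pick_extension is hypothetical, the comparison is assumed to be case-sensitive, and how the pipeline breaks ties between equally common extensions is not specified here.

```python
from collections import Counter
from pathlib import Path


def pick_extension(filenames, ext=None):
    """Return the explicit extension if given, else the most common one."""
    if ext:
        return ext
    # Count suffixes such as ".JPG", ignoring files with no extension.
    suffixes = (Path(name).suffix for name in filenames)
    counts = Counter(suffix for suffix in suffixes if suffix)
    return counts.most_common(1)[0][0] if counts else None


region3 = ["DJI_001.JPG", "DJI_002.PNG", "DJI_003.JPG"]
default_ext = pick_extension(region3)           # falls back to ".JPG"
explicit_ext = pick_extension(region3, ".PNG")  # third column wins
```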
Once you're done constructing your samples.tsv file, you should specify the path to it in your config.yml configuration file.
It is best to specify all of your datasets in the samples.tsv file even if you only plan to use a few of them at first. A separate configuration option in config.yml called SAMP_NAMES allows you to use only a subset of the datasets at once.
SAMP_NAMES should be set to a list of dataset IDs like this:
SAMP_NAMES: [region1, region3]
If you'd like to use all of the datasets in the samples.tsv file, set SAMP_NAMES to a falsy value (for example, an empty list).
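The selection rule above might look something like the following Python sketch. The function name select_samples is hypothetical, and raising an error for IDs that are missing from samples.tsv is an assumption; the pipeline may handle that case differently.

```python
def select_samples(samples, samp_names=None):
    """Keep only the datasets named in samp_names; a falsy value keeps all."""
    if not samp_names:
        return dict(samples)
    missing = [name for name in samp_names if name not in samples]
    if missing:
        # Assumed behavior: fail loudly on IDs absent from samples.tsv.
        raise KeyError(f"IDs not found in samples.tsv: {missing}")
    return {name: samples[name] for name in samp_names}


all_samples = {
    "region1": "data/region1",
    "region2": "data/region2",
    "region3": "data/region3",
}
subset = select_samples(all_samples, ["region1", "region3"])
everything = select_samples(all_samples, [])
```

Here subset contains only region1 and region3, while everything contains all three datasets.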