
Input Data

Arya Massarat edited this page Sep 4, 2020 · 15 revisions

Our pipeline uses overlapping drone imagery captured by a DJI Phantom drone. There are several ways to provide this input data to the pipeline. As a first example, we illustrate the simplest directory structure.

Note first that the pipeline expects the drone images from each surveyed region to be in a separate directory, like this:

flower_map/
├── config.yml
├── data/
|   ├── models/
|   ├── samples.tsv
|   ├── region1/
|   |   ├── DJI_001.JPG
|   |   ├── DJI_002.JPG
|   |   ├── DJI_003.JPG
|   ├── region2/
|   |   ├── DJI_001.PNG
|   |   ├── DJI_002.PNG
|   |   ├── DJI_003.PNG
|   ├── region3/
|   |   ├── DJI_001.JPG
|   |   ├── DJI_002.PNG
|   |   ├── DJI_003.JPG
├── envs/
├── LICENSE
├── metashape.lic
├── out/
├── README.md
├── run.bash
├── scripts/
├── Snakefile

We've placed all of our data in a data/ folder within the project root. If your data lives elsewhere on your filesystem, you can symlink each region's directory into data/, or symlink the data/ directory itself. You may even choose not to have a data/ directory at all. The only requirement is that each region has its own directory of drone image files.
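
If your images live outside the project, the symlinking option above can be sketched in Python as follows. This is a minimal illustration, not part of the pipeline: the helper name `link_region` and the example paths are hypothetical, and it assumes a filesystem that supports symbolic links (e.g. Linux or macOS).

```python
from pathlib import Path


def link_region(external_dir: str, project_root: str, region: str) -> Path:
    """Symlink an external directory of drone images to <project_root>/data/<region>.

    Hypothetical helper for illustration only; the pipeline itself just needs
    each region's images to sit in their own directory.
    """
    src = Path(external_dir).resolve()
    if not src.is_dir():
        raise NotADirectoryError(f"{src} is not a directory")
    data_dir = Path(project_root) / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    link = data_dir / region
    # Leave any existing entry (file, directory, or link) untouched
    if not (link.exists() or link.is_symlink()):
        link.symlink_to(src, target_is_directory=True)
    return link


# Example with hypothetical paths:
# link_region("/mnt/surveys/2020-07-field-A", "flower_map", "region1")
```

Linking per region keeps samples.tsv paths like data/region1 valid no matter where the raw images are stored.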

Inside the data/ directory, we created a samples.tsv file describing the paths to these datasets:

region1    data/region1
region2    data/region2
region3    data/region3    .PNG

The samples.tsv file has three tab-separated columns and one line for each dataset that you'd like to analyze. The first column is a unique identifier that you assign to the dataset; the pipeline uses it when creating its output. It is best to avoid spaces in these identifiers. The second column is the path to the dataset's directory, relative to the root of the project directory.
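
To make the file format concrete, here is a minimal sketch of how a samples.tsv with this layout could be parsed. It is not the pipeline's own parser: the function `read_samples` is hypothetical, and it assumes no header row, two or three tab-separated columns per line, and that an empty third column means "no extension given".

```python
import csv
import io
from typing import Dict, Optional, Tuple


def read_samples(handle) -> Dict[str, Tuple[str, Optional[str]]]:
    """Parse samples.tsv-style text into {identifier: (path, extension or None)}."""
    samples: Dict[str, Tuple[str, Optional[str]]] = {}
    for lineno, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row or all(not field.strip() for field in row):
            continue  # skip blank lines
        if len(row) not in (2, 3):
            raise ValueError(f"line {lineno}: expected 2 or 3 columns, got {len(row)}")
        name, path = row[0].strip(), row[1].strip()
        if name in samples:
            raise ValueError(f"line {lineno}: duplicate identifier {name!r}")
        # The optional third column holds the image extension, e.g. ".PNG"
        ext = row[2].strip() if len(row) == 3 and row[2].strip() else None
        samples[name] = (path, ext)
    return samples


example = "region1\tdata/region1\nregion2\tdata/region2\nregion3\tdata/region3\t.PNG\n"
samples = read_samples(io.StringIO(example))
print(samples["region3"])  # ('data/region3', '.PNG')
```

Rejecting duplicate identifiers early reflects the requirement that the first column be unique.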

The third column is optional and specifies the extension of the image files in the dataset's directory. If it is omitted, the pipeline uses the most common extension among the files in that directory. In our example, the pipeline would default to .JPG for region3, since data/region3 contains two .JPG files but only one .PNG file. By specifying .PNG in samples.tsv, we instruct the pipeline to use only the .PNG file in data/region3.
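
The default-extension rule can be illustrated with a short sketch using `collections.Counter`. The function names are hypothetical, and two behaviors are assumptions rather than documented pipeline behavior: extensions are compared case-sensitively (".JPG" differs from ".jpg"), and ties go to the extension encountered first.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable, List, Optional


def most_common_extension(filenames: Iterable[str]) -> str:
    """Return the most frequent file extension (e.g. '.JPG') among filenames."""
    counts = Counter(Path(name).suffix for name in filenames if Path(name).suffix)
    if not counts:
        raise ValueError("no files with an extension found")
    # most_common(1) -> [(ext, count)]; ties resolve to the first extension encountered
    return counts.most_common(1)[0][0]


def select_images(filenames: Iterable[str], ext: Optional[str] = None) -> List[str]:
    """Keep only files with the given extension, or the most common one if ext is None."""
    names = list(filenames)
    chosen = ext if ext is not None else most_common_extension(names)
    return sorted(n for n in names if Path(n).suffix == chosen)


region3 = ["DJI_001.JPG", "DJI_002.PNG", "DJI_003.JPG"]
print(select_images(region3))          # ['DJI_001.JPG', 'DJI_003.JPG']
print(select_images(region3, ".PNG"))  # ['DJI_002.PNG']
```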

Once you've finished constructing your samples.tsv file, specify its path in your config.yml configuration file.

It is best to list all of your datasets in samples.tsv, even if you only plan to analyze a few of them at first. A separate configuration option in config.yml, SAMP_NAMES, lets you run the pipeline on just a subset of those datasets.
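
The effect of SAMP_NAMES can be sketched as a simple filter over the parsed datasets. This is an illustration under stated assumptions, not the pipeline's code: `subset_samples` is hypothetical, SAMP_NAMES is assumed to be read as a list of identifiers from samples.tsv, and treating a missing or empty list as "use every dataset" is an assumed default.

```python
from typing import Dict, Iterable, Optional, Tuple

Samples = Dict[str, Tuple[str, Optional[str]]]


def subset_samples(samples: Samples, samp_names: Optional[Iterable[str]] = None) -> Samples:
    """Restrict the parsed samples to the identifiers listed in samp_names."""
    wanted = list(samp_names) if samp_names is not None else []
    if not wanted:
        return dict(samples)  # assumed default: analyze every dataset
    unknown = [name for name in wanted if name not in samples]
    if unknown:
        raise KeyError(f"not defined in samples.tsv: {', '.join(unknown)}")
    return {name: samples[name] for name in wanted}


all_samples = {
    "region1": ("data/region1", None),
    "region2": ("data/region2", None),
    "region3": ("data/region3", ".PNG"),
}
print(sorted(subset_samples(all_samples, ["region1", "region3"])))  # ['region1', 'region3']
```

Failing on unknown names catches typos in SAMP_NAMES before any processing starts.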
