Merge pull request #44 from lightly-ai/update_docker_documentation_pw
Update docker documentation for videos
philippmwirth authored Nov 20, 2020
2 parents 2b7ad18 + efa4832 commit e569220
Showing 2 changed files with 53 additions and 0 deletions.
3 changes: 3 additions & 0 deletions docs/source/docker/configuration/configuration.rst
@@ -34,6 +34,9 @@ The following are parameters which can be passed to the container:
# remove exact duplicates
remove_exact_duplicates: True
# dump the final dataset to the output directory
dump_dataset: False
# pass checkpoint
checkpoint: ''
50 changes: 50 additions & 0 deletions docs/source/docker/getting_started/first_steps.rst
@@ -185,6 +185,56 @@ move the embeddings file to the shared directory, and specify the filename like
stopping_condition.n_samples=0.3 \
embeddings=my_embeddings.csv
Sampling from Video Files
--------------------------
If you are working with video files, you can point the docker container directly to them,
so there is no need to extract the individual frames beforehand. To do so, store all videos
you want to work with in a single directory; the Lightly software will then automatically
load all frames from the videos.

.. code-block:: console

    # work on a single video
    data/
    +-- my_video.mp4

    # work on several videos
    data/
    +-- my_video_1.mp4
    +-- my_video_2.avi

As you can see, the videos do not need to share the same file format. An example command for
the folder structure shown above could look like this:

.. code-block:: console

    docker run --gpus all --rm -it \
        -v INPUT_DIR:/home/input_dir:ro \
        -v SHARED_DIR:/home/shared_dir:ro \
        -v OUTPUT_DIR:/home/output_dir \
        lightly/sampling:latest \
        token=MYAWESOMETOKEN \
        stopping_condition.n_samples=0.3

Here, ``INPUT_DIR`` is the path to the directory containing the video files (``data/`` in the
example above).
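
Before starting the container, it can help to check that the directory you mount as
``INPUT_DIR`` really contains only video files. The Python sketch below does this with the
standard library. The function name ``check_video_dir`` and the extensions in
``VIDEO_EXTENSIONS`` are hypothetical: this documentation only shows ``.mp4`` and ``.avi``
examples and does not list every format the container accepts, so adapt the set to your data.

.. code-block:: python

    from pathlib import Path

    # Assumption: illustrative extensions only; the accepted formats are not
    # listed in this documentation, so adjust this set to your own videos.
    VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov", ".mkv"}


    def check_video_dir(input_dir):
        """Hypothetical helper: split the entries directly inside input_dir
        into video files (by extension) and everything else."""
        videos, others = [], []
        for entry in sorted(Path(input_dir).iterdir()):
            if entry.is_file() and entry.suffix.lower() in VIDEO_EXTENSIONS:
                videos.append(entry.name)
            else:
                others.append(entry.name)
        return videos, others

For example, ``videos, others = check_video_dir("/path/to/INPUT_DIR")`` returns the names of
the video files and of any other entries, so stray files can be moved out of the directory
before running the command above.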

Removing Exact Duplicates
---------------------------
With the docker solution, it is possible to remove **only exact duplicates** from the dataset. To do so,
set ``remove_exact_duplicates=True`` and set the stopping condition ``n_samples`` to 1.0 (which translates
to 100% of the data), so that no samples are dropped beyond the duplicates. The exact command is:

.. code-block:: console

    docker run --gpus all --rm -it \
        -v INPUT_DIR:/home/input_dir:ro \
        -v SHARED_DIR:/home/shared_dir:ro \
        -v OUTPUT_DIR:/home/output_dir \
        lightly/sampling:latest \
        token=MYAWESOMETOKEN \
        remove_exact_duplicates=True \
        stopping_condition.n_samples=1.0

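
Two files are exact duplicates when their contents are byte-for-byte identical. One common
way to find them is to compare a cryptographic hash of each file, as in the Python sketch
below. It illustrates the idea only and is not necessarily how the Lightly container detects
duplicates; the helper names ``file_digest`` and ``drop_exact_duplicates`` are hypothetical.

.. code-block:: python

    import hashlib


    def file_digest(path, chunk_size=1 << 20):
        """SHA-256 of a file's bytes, read in chunks to bound memory use."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()


    def drop_exact_duplicates(paths):
        """Hypothetical helper: keep the first file seen for each distinct
        content hash and skip every later byte-identical copy."""
        seen = set()
        kept = []
        for path in paths:
            digest = file_digest(path)
            if digest not in seen:
                seen.add(digest)
                kept.append(path)
        return kept

Keeping one file per hash corresponds to the goal of the command above: with
``stopping_condition.n_samples=1.0`` nothing is dropped beyond the duplicates themselves.
Note that byte-level hashing does not catch images that look identical but were re-encoded
or resized.
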
Reporting
-----------------------------------
