Commit

Merge pull request #45 from lightly-ai/develop

Pre-release 1.0.4 - Develop to Master

philippmwirth authored Nov 20, 2020
2 parents 2e0883a + e569220 commit 3d7cdd0
Showing 25 changed files with 1,059 additions and 274 deletions.
3 changes: 3 additions & 0 deletions docs/source/docker/configuration/configuration.rst
@@ -34,6 +34,9 @@ The following are parameters which can be passed to the container:
# remove exact duplicates
remove_exact_duplicates: True
# dump the final dataset to the output directory
dump_dataset: False
# pass checkpoint
checkpoint: ''
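
These options would typically be passed as key=value overrides when starting the container, in the
same style as the sampling commands shown below in ``first_steps.rst``; a minimal sketch (paths and
token are placeholders, and the override values are purely illustrative):

.. code-block:: console

    docker run --gpus all --rm -it \
        -v INPUT_DIR:/home/input_dir:ro \
        -v OUTPUT_DIR:/home/output_dir \
        lightly/sampling:latest \
        token=MYAWESOMETOKEN \
        dump_dataset=True \
        checkpoint=''
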
50 changes: 50 additions & 0 deletions docs/source/docker/getting_started/first_steps.rst
@@ -185,6 +185,56 @@ move the embeddings file to the shared directory, and specify the filename like
stopping_condition.n_samples=0.3 \
embeddings=my_embeddings.csv
Sampling from Video Files
--------------------------
In case you are working with video files, it is possible to point the docker container
directly at them. This removes the need to extract the individual frames beforehand.
To do so, simply store all videos you want to work with in a single directory; the lightly software
will automatically load all frames from the videos.

.. code-block:: console

    # work on a single video
    data/
    +-- my_video.mp4

    # work on several videos
    data/
    +-- my_video_1.mp4
    +-- my_video_2.avi

As you can see, the videos do not need to be in the same file format. An example command for a folder
structure as shown above could then look like this:

.. code-block:: console

    docker run --gpus all --rm -it \
        -v INPUT_DIR:/home/input_dir:ro \
        -v SHARED_DIR:/home/shared_dir:ro \
        -v OUTPUT_DIR:/home/output_dir \
        lightly/sampling:latest \
        token=MYAWESOMETOKEN \
        stopping_condition.n_samples=0.3

Here, `INPUT_DIR` is the path to the directory containing the video files.

Removing Exact Duplicates
---------------------------
With the docker solution, it is possible to remove **only exact duplicates** from the dataset. For this,
simply set the stopping condition `n_samples` to 1.0 (which translates to 100% of the data). The exact command is:

.. code-block:: console

    docker run --gpus all --rm -it \
        -v INPUT_DIR:/home/input_dir:ro \
        -v SHARED_DIR:/home/shared_dir:ro \
        -v OUTPUT_DIR:/home/output_dir \
        lightly/sampling:latest \
        token=MYAWESOMETOKEN \
        remove_exact_duplicates=True \
        stopping_condition.n_samples=1.

Reporting
-----------------------------------

33 changes: 33 additions & 0 deletions docs/source/tutorials/structure_your_input.rst
@@ -106,6 +106,39 @@ For the structure above, lightly will understand the input as follows:
10,
]
Video Folder Datasets
---------------------
The lightly Python package allows you to work `directly` on video data, without having
to extract the frames first. This can save a lot of disk space as video files are
typically strongly compressed. Using lightly on video data is as simple as pointing
the software at an input directory where one or more videos are stored. The package will
automatically detect all video files and index them so that each frame can be accessed.

An example input directory with videos could look like this:

.. code-block:: bash

    data/
    +-- my_video_1.mov
    +-- my_video_2.mp4
    +-- my_video_3.avi

The example also shows the currently supported video file formats (.mov, .mp4, and .avi).
To upload the three videos from above to the platform, you can use

.. code-block:: bash

    lightly-upload token='123' dataset_id='XYZ' input_dir='data/'

All other operations (like training a self-supervised model and embedding the frames individually)
also work on video data. Give it a try!
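
For example, a rough sketch with the command-line tools could look like the following (the
checkpoint filename is a placeholder and additional parameters may be required for your setup):

.. code-block:: bash

    # train a self-supervised model on the video frames
    lightly-train input_dir='data/'

    # embed the individual frames with the trained model
    lightly-embed input_dir='data/' checkpoint='my_checkpoint.ckpt'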

.. note::

    Randomly accessing video frames is slower than accessing the extracted frames on disk. However,
    by working directly on video files, one can save a lot of disk space because the frames do not
    have to be extracted beforehand.

Torchvision Datasets
--------------------

3 changes: 1 addition & 2 deletions lightly/__init__.py
@@ -15,7 +15,7 @@
# All Rights Reserved

__name__ = 'lightly'
-__version__ = '1.0.3'
+__version__ = '1.0.4'


try:
@@ -44,7 +44,6 @@
def is_prefetch_generator_available():
    return _prefetch_generator_available


# import core functionalities
from lightly.core import train_model_and_embed_images
from lightly.core import train_embedding_model
15 changes: 8 additions & 7 deletions lightly/api/routes/users/service.py
@@ -32,15 +32,16 @@ def get_quota(token: str):
            A token to identify the user.
    Returns:
-       A dictionary with the quota for the user.
+       The quota for the user and the status code of the response.
    """
-   dst_url = _prefix()
+   dst_url = _prefix() + '/quota'
    payload = {
        'token': token
    }

-   try:
-       response = requests.get(dst_url, params=payload)
-       return response.json()
-   except Exception:
-       return {'maxDatasetSize': LIGHTLY_MAXIMUM_DATASET_SIZE}
+   response = requests.get(dst_url, params=payload)
+   status_code = response.status_code
+   if status_code == 200:
+       return response.json()['maxDatasetSize'], status_code
+   else:
+       return LIGHTLY_MAXIMUM_DATASET_SIZE, status_code