Commit

Merge pull request #45 from lightly-ai/develop

Pre-release 1.0.4 - Develop to Master

philippmwirth authored Nov 20, 2020
2 parents 2e0883a + e569220 commit 3d7cdd0
Showing 25 changed files with 1,059 additions and 274 deletions.
3 changes: 3 additions & 0 deletions docs/source/docker/configuration/configuration.rst
@@ -34,6 +34,9 @@ The following are parameters which can be passed to the container:
# remove exact duplicates
remove_exact_duplicates: True
# dump the final dataset to the output directory
dump_dataset: False
# pass checkpoint
checkpoint: ''
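
These options would typically be passed as key=value overrides when starting the container, in the
same style as the sampling commands shown below in ``first_steps.rst``; a minimal sketch (paths and
token are placeholders, and the override values are purely illustrative):

.. code-block:: console

    docker run --gpus all --rm -it \
        -v INPUT_DIR:/home/input_dir:ro \
        -v OUTPUT_DIR:/home/output_dir \
        lightly/sampling:latest \
        token=MYAWESOMETOKEN \
        dump_dataset=True \
        checkpoint=''
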
50 changes: 50 additions & 0 deletions docs/source/docker/getting_started/first_steps.rst
@@ -185,6 +185,56 @@ move the embeddings file to the shared directory, and specify the filename like
stopping_condition.n_samples=0.3 \
embeddings=my_embeddings.csv
Sampling from Video Files
--------------------------
In case you are working with video files, it is possible to point the docker container
directly at them. This removes the need to extract the individual frames beforehand.
To do so, simply store all videos you want to work with in a single directory; the lightly software
will automatically load all frames from the videos.

.. code-block:: console

    # work on a single video
    data/
    +-- my_video.mp4

    # work on several videos
    data/
    +-- my_video_1.mp4
    +-- my_video_2.avi

As you can see, the videos do not need to be in the same file format. An example command for a folder
structure as shown above could then look like this:

.. code-block:: console

    docker run --gpus all --rm -it \
        -v INPUT_DIR:/home/input_dir:ro \
        -v SHARED_DIR:/home/shared_dir:ro \
        -v OUTPUT_DIR:/home/output_dir \
        lightly/sampling:latest \
        token=MYAWESOMETOKEN \
        stopping_condition.n_samples=0.3

Here, `INPUT_DIR` is the path to the directory containing the video files.

Removing Exact Duplicates
---------------------------
With the docker solution, it is possible to remove **only exact duplicates** from the dataset. For this,
simply set the stopping condition `n_samples` to 1.0 (which translates to 100% of the data). The exact command is:

.. code-block:: console

    docker run --gpus all --rm -it \
        -v INPUT_DIR:/home/input_dir:ro \
        -v SHARED_DIR:/home/shared_dir:ro \
        -v OUTPUT_DIR:/home/output_dir \
        lightly/sampling:latest \
        token=MYAWESOMETOKEN \
        remove_exact_duplicates=True \
        stopping_condition.n_samples=1.

Reporting
-----------------------------------

33 changes: 33 additions & 0 deletions docs/source/tutorials/structure_your_input.rst
@@ -106,6 +106,39 @@ For the structure above, lightly will understand the input as follows:
10,
]
Video Folder Datasets
---------------------
The lightly Python package allows you to work `directly` on video data, without having
to extract the frames first. This can save a lot of disk space as video files are
typically strongly compressed. Using lightly on video data is as simple as pointing
the software at an input directory where one or more videos are stored. The package will
automatically detect all video files and index them so that each frame can be accessed.

An example input directory with videos could look like this:

.. code-block:: bash

    data/
    +-- my_video_1.mov
    +-- my_video_2.mp4
    +-- my_video_3.avi

The example also shows the currently supported video file formats (.mov, .mp4, and .avi).
To upload the three videos from above to the platform, you can use

.. code-block:: bash

    lightly-upload token='123' dataset_id='XYZ' input_dir='data/'

All other operations (like training a self-supervised model and embedding the frames individually)
also work on video data. Give it a try!
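
For example, a rough sketch with the command-line tools could look like the following (the
checkpoint filename is a placeholder and additional parameters may be required for your setup):

.. code-block:: bash

    # train a self-supervised model on the video frames
    lightly-train input_dir='data/'

    # embed the individual frames with the trained model
    lightly-embed input_dir='data/' checkpoint='my_checkpoint.ckpt'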

.. note::

    Randomly accessing video frames is slower than accessing the extracted frames on disk. However,
    by working directly on video files, one can save a lot of disk space because the frames do not
    have to be extracted beforehand.

Torchvision Datasets
--------------------

3 changes: 1 addition & 2 deletions lightly/__init__.py
@@ -15,7 +15,7 @@
# All Rights Reserved

__name__ = 'lightly'
-__version__ = '1.0.3'
+__version__ = '1.0.4'


try:
@@ -44,7 +44,6 @@
def is_prefetch_generator_available():
    return _prefetch_generator_available


# import core functionalities
from lightly.core import train_model_and_embed_images
from lightly.core import train_embedding_model
15 changes: 8 additions & 7 deletions lightly/api/routes/users/service.py
@@ -32,15 +32,16 @@ def get_quota(token: str):
            A token to identify the user.
    Returns:
-       A dictionary with the quota for the user.
+       The quota for the user and the status code of the response.
    """
-   dst_url = _prefix()
+   dst_url = _prefix() + '/quota'
    payload = {
        'token': token
    }

-   try:
-       response = requests.get(dst_url, params=payload)
-       return response.json()
-   except Exception:
-       return {'maxDatasetSize': LIGHTLY_MAXIMUM_DATASET_SIZE}
+   response = requests.get(dst_url, params=payload)
+   status_code = response.status_code
+   if status_code == 200:
+       return response.json()['maxDatasetSize'], status_code
+   else:
+       return LIGHTLY_MAXIMUM_DATASET_SIZE, status_code