Rework docs (#830)
* Philipp clean up docs archive old docs (#819)

* Make changes to makefile and index

* Archive docker docs

* Rework getting started section (#820)

* Rework getting started section

* Rename docker to worker

* Add support for setting dataset type (#822)

* Philipp lig 1200 clean up docs first steps (#821)

* Rework getting started section

* Streamline workflows

* Use api workflow client to create dataset

* Update FAQ (#824)

* Igor lig 1201 clean up docs advanced (#823)

* Rearrange docs based on new setup

* Update datapool

* Update datapool example

* Disable pretagging in datapool example

* Update pretagging

* Use Lightly Worker instead of docker

* Update active learning

* Implement feedback

* Philipp lig 1204 clean up docs examples (#825)

* Rework datasets in the wild

* Rework examples overview

* Rework academic datasets

* Igor lig 1203 clean up docs configuration (#826)

* Update configuration

* Make sure we use Lightly Worker

* Implement feedback

* Philipp lig 1202 clean up docs integration (#827)

* Rework dagster tutorial

* Drop the other integration parts

* Add default thumbnail suffix (#828)

* Igor lig 1231 finish cleanup docs (#829)

* Align docs and add missing changes

* Rename to register worker

* Remove input volume mapping

* Update overview section

* Make sure we use Lightly Worker

* Move bracket to new line

Co-authored-by: philippmwirth <philipp.m.wirth@gmail.com>

Co-authored-by: Philipp Wirth <65946090+philippmwirth@users.noreply.github.com>
Co-authored-by: philippmwirth <philipp.m.wirth@gmail.com>
3 people authored Jun 9, 2022
1 parent 806ba5a commit a64edfc
Showing 80 changed files with 5,889 additions and 1,112 deletions.
2 changes: 2 additions & 0 deletions docs/Makefile
@@ -11,6 +11,7 @@ DATADIR = _data
PACKAGESOURCE = source/tutorials_source/package
PLATFORMSOURCE = source/tutorials_source/platform
DOCKERSOURCE = source/docker
DOCKER_ARCHIVE_SOURCE = source/docker_archive
GETTING_STARTED_IMAGES = source/getting_started/resources


@@ -57,6 +58,7 @@ download-noplot:
	# download images and report for docker
	wget -N https://storage.googleapis.com/datasets_boris/resources.zip -P $(DATADIR);\
	unzip $(ZIPOPTS) $(DATADIR)/resources.zip -d $(DOCKERSOURCE);\
	unzip $(ZIPOPTS) $(DATADIR)/resources.zip -d $(DOCKER_ARCHIVE_SOURCE); \

	# pizza dataset
	@if [ ! -d $(PLATFORMSOURCE)/pizzas/salami ]; then \
47 changes: 13 additions & 34 deletions docs/source/docker/advanced/active_learning.rst
@@ -19,7 +19,7 @@ Prerequisites
--------------
In order to do active learning with Lightly, you will need the following things:

- The installed Lightly docker (see :ref:`ref-docker-setup`)
- The installed Lightly Worker (see :ref:`ref-docker-setup`)
- A dataset with a configured datasource (see :ref:`ref-docker-with-datasource-datapool`)
- Your predictions uploaded to the datasource (see :ref:`ref-docker-datasource-predictions`)
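
For the last prerequisite, the authoritative format is described in the
linked predictions guide. As a rough sketch (category names are placeholders,
not part of this commit), the `schema.json` of a classification task, which
lives under `.lightly/predictions/<task_name>/` in the datasource, could be
generated like this:

.. code-block:: python

    import json

    # Sketch only: a minimal schema for a classification prediction task.
    schema = {
        "task_type": "classification",
        "categories": [
            {"id": 0, "name": "class-a"},
            {"id": 1, "name": "class-b"},
        ],
    }
    # Write locally, then upload to .lightly/predictions/my-classification-task/
    # in the datasource bucket.
    with open("schema.json", "w") as f:
        json.dump(schema, f, indent=4)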

@@ -33,47 +33,26 @@ In order to do active learning with Lightly, you will need the following things:
Selection
-------------------------
Once you have everything set up as described above, you can do an active learning
iteration by specifying the following three things in your Lightly docker config:
iteration by specifying the following three things in your Lightly Worker config:

- `method`
- `active_learning.task_name`
- `active_learning.score_name`

Here's an example of how to configure an active learning run:

.. literalinclude:: code_examples/python_run_active_learning.py
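
For orientation, here is a condensed sketch of just those three fields; the
values are taken from the full example above, and everything not shown keeps
its default:

.. code-block:: python

    client.schedule_compute_worker_run(
        worker_config={
            "method": "coral",  # considers diversity and active learning scores
            "active_learning": {
                "task_name": "my-classification-task",  # your prediction task
                "score_name": "uncertainty_margin",     # score used for selection
            },
        },
    )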

.. tabs::

.. tab:: Web App
After running the code, we have to make sure a Lightly Worker is running to
process the job. We can start one with the following command:

**Trigger the Job**

To trigger a new job you can click on the schedule run button on the dataset
overview as shown in the screenshot below:

.. figure:: ../integration/images/schedule-compute-run.png

After clicking on the button you will see a wizard to configure the parameters
for the job.

.. figure:: ../integration/images/schedule-compute-run-config.png

In this example we have to set the `active_learning.task_name` parameter
in the docker config. Additionally, we set the `method` to `coral` which
simultaneously considers the diversity and the active learning scores of
the samples. All other settings are default values. The
resulting docker config should look like this:

.. literalinclude:: code_examples/active_learning_worker_config.txt
:caption: Docker Config
:language: javascript

The Lightly config remains unchanged.

.. tab:: Python Code

.. literalinclude:: code_examples/python_run_active_learning.py
.. code-block:: console

    docker run --rm --gpus all -it \
        -v /docker-output:/home/output_dir lightly/worker:latest \
        token=YOUR_TOKEN worker.worker_id=YOUR_WORKER_ID
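
The `YOUR_WORKER_ID` value comes from registering a worker with the API
client. A minimal sketch, assuming the `register_compute_worker` method of
the `ApiWorkflowClient` (see the worker registration section referenced in
this commit):

.. code-block:: python

    # Register a worker once and reuse the returned ID in the docker command.
    worker_id = client.register_compute_worker()
    print(worker_id)
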
After the worker has finished its job, you can see the selected images with
their active learning scores in the web app.
@@ -86,12 +65,12 @@ Active Learning with Custom Scores (not recommended as of March 2022)
As of March 2022, this is no longer recommended and will be deprecated in the future!


For running an active learning step with the Lightly docker, we need to perform
For running an active learning step with the Lightly Worker, we need to perform
3 steps:

1. Create an `embeddings.csv` file. You can use your own models or the Lightly docker for this.
1. Create an `embeddings.csv` file. You can use your own models or the Lightly Worker for this.
2. Add your active learning scores as an additional column to the embeddings file (see the sketch after this list).
3. Use the Lightly docker to perform an active learning iteration on the scores.
3. Use the Lightly Worker to perform an active learning iteration on the scores.
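
For step 2, a minimal sketch with pandas; the column name `al_score` and the
random scores are assumptions for illustration, not part of this commit:

.. code-block:: python

    import numpy as np
    import pandas as pd

    df = pd.read_csv("embeddings.csv")
    # Replace with your model's real per-sample scores, one per row,
    # aligned with the filename order of the embeddings file.
    df["al_score"] = np.random.uniform(0, 1, size=len(df))
    df.to_csv("embeddings.csv", index=False)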

Learn more about the concept of active learning
:ref:`lightly-active-learning-scorers`.
@@ -1,7 +1,67 @@
import json
import lightly
from lightly.openapi_generated.swagger_client.models.dataset_type import DatasetType
from lightly.openapi_generated.swagger_client.models.datasource_purpose import DatasourcePurpose

# Create the Lightly client to connect to the API.
client = lightly.api.ApiWorkflowClient(token="TOKEN", dataset_id="DATASET_ID")
client = lightly.api.ApiWorkflowClient(token="YOUR_TOKEN")

# Create a new dataset on the Lightly Platform.
client.create_dataset('pedestrian-videos-datapool',
                      dataset_type=DatasetType.VIDEOS)

# Pick one of the following three blocks depending on where your data is
# AWS S3
# Input bucket
client.set_s3_config(
    resource_path="s3://bucket/input/",
    region='eu-central-1',
    access_key='S3-ACCESS-KEY',
    secret_access_key='S3-SECRET-ACCESS-KEY',
    purpose=DatasourcePurpose.INPUT
)
# Output bucket
client.set_s3_config(
    resource_path="s3://bucket/output/",
    region='eu-central-1',
    access_key='S3-ACCESS-KEY',
    secret_access_key='S3-SECRET-ACCESS-KEY',
    purpose=DatasourcePurpose.LIGHTLY
)


# or Google Cloud Storage
# Input bucket
client.set_gcs_config(
    resource_path="gs://bucket/input/",
    project_id="PROJECT-ID",
    credentials=json.dumps(json.load(open('credentials_read.json'))),
    purpose=DatasourcePurpose.INPUT
)
# Output bucket
client.set_gcs_config(
    resource_path="gs://bucket/output/",
    project_id="PROJECT-ID",
    credentials=json.dumps(json.load(open('credentials_write.json'))),
    purpose=DatasourcePurpose.LIGHTLY
)


# or Azure Blob Storage
# Input bucket
client.set_azure_config(
    container_name='my-container/input/',
    account_name='ACCOUNT-NAME',
    sas_token='SAS-TOKEN',
    purpose=DatasourcePurpose.INPUT
)
# Output bucket
client.set_azure_config(
    container_name='my-container/output/',
    account_name='ACCOUNT-NAME',
    sas_token='SAS-TOKEN',
    purpose=DatasourcePurpose.LIGHTLY
)

# Schedule the docker run with
# - "active_learning.task_name" set to your task name
@@ -15,19 +75,14 @@
"enable_training": False,
"pretagging": False,
"pretagging_debug": False,
"method": "coral",
"method": "coral", # we use the coral method here
"stopping_condition": {
"n_samples": 0.1,
"min_distance": -1
},
"scorer": "object-frequency",
"scorer_config": {
"frequency_penalty": 0.25,
"min_score": 0.9
},
"active_learning": {
"task_name": "my-classification-task",
"score_name": "uncertainty_margin"
"active_learning": { # here we specify our active learning parameters
"task_name": "my-classification-task", # set the task
"score_name": "uncertainty_margin" # set the score
}
},
lightly_config={
@@ -71,4 +126,4 @@
            'rr_prob': 0
        }
    }
)
)
@@ -0,0 +1,125 @@
import json
import lightly
from lightly.openapi_generated.swagger_client.models.dataset_type import DatasetType
from lightly.openapi_generated.swagger_client.models.datasource_purpose import DatasourcePurpose

# Create the Lightly client to connect to the API.
client = lightly.api.ApiWorkflowClient(token="YOUR_TOKEN")

# Create a new dataset on the Lightly Platform.
client.create_dataset('pedestrian-videos-datapool',
                      dataset_type=DatasetType.VIDEOS)

# Pick one of the following three blocks depending on where your data is
# AWS S3
# Input bucket
client.set_s3_config(
    resource_path="s3://bucket/input/",
    region='eu-central-1',
    access_key='S3-ACCESS-KEY',
    secret_access_key='S3-SECRET-ACCESS-KEY',
    purpose=DatasourcePurpose.INPUT
)
# Output bucket
client.set_s3_config(
    resource_path="s3://bucket/output/",
    region='eu-central-1',
    access_key='S3-ACCESS-KEY',
    secret_access_key='S3-SECRET-ACCESS-KEY',
    purpose=DatasourcePurpose.LIGHTLY
)


# or Google Cloud Storage
# Input bucket
client.set_gcs_config(
    resource_path="gs://bucket/input/",
    project_id="PROJECT-ID",
    credentials=json.dumps(json.load(open('credentials_read.json'))),
    purpose=DatasourcePurpose.INPUT
)
# Output bucket
client.set_gcs_config(
    resource_path="gs://bucket/output/",
    project_id="PROJECT-ID",
    credentials=json.dumps(json.load(open('credentials_write.json'))),
    purpose=DatasourcePurpose.LIGHTLY
)


# or Azure Blob Storage
# Input bucket
client.set_azure_config(
    container_name='my-container/input/',
    account_name='ACCOUNT-NAME',
    sas_token='SAS-TOKEN',
    purpose=DatasourcePurpose.INPUT
)
# Output bucket
client.set_azure_config(
    container_name='my-container/output/',
    account_name='ACCOUNT-NAME',
    sas_token='SAS-TOKEN',
    purpose=DatasourcePurpose.LIGHTLY
)


# Schedule the compute run using our custom config.
# We show here the full default config so you can easily edit the
# values according to your needs.
client.schedule_compute_worker_run(
    worker_config={
        'enable_corruptness_check': True,
        'remove_exact_duplicates': True,
        'enable_training': False,
        'pretagging': False,
        'pretagging_debug': False,
        'method': 'coreset',
        'stopping_condition': {
            'n_samples': -1,
            'min_distance': 0.05 # we set the min_distance to 0.05 in this example
        }
    },
    lightly_config={
        'loader': {
            'batch_size': 128,
            'shuffle': True,
            'num_workers': -1,
            'drop_last': True
        },
        'model': {
            'name': 'resnet-18',
            'out_dim': 128,
            'num_ftrs': 32,
            'width': 1
        },
        'trainer': {
            'gpus': 1,
            'max_epochs': 1,
            'precision': 16
        },
        'criterion': {
            'temperature': 0.5
        },
        'optimizer': {
            'lr': 1,
            'weight_decay': 0.00001
        },
        'collate': {
            'input_size': 64,
            'cj_prob': 0.8,
            'cj_bright': 0.7,
            'cj_contrast': 0.7,
            'cj_sat': 0.7,
            'cj_hue': 0.2,
            'min_scale': 0.15,
            'random_gray_scale': 0.2,
            'gaussian_blur': 0.0,
            'kernel_size': 0.1,
            'vf_prob': 0,
            'hf_prob': 0.5,
            'rr_prob': 0
        }
    }
)
