Rework docs (#830)
* Philipp clean up docs archive old docs (#819)

* Make changes to makefile and index

* Archive docker docs

* Rework getting started section (#820)

* Rework getting started section

* Rename docker to worker

* Add support for setting dataset type (#822)

* Philipp lig 1200 clean up docs first steps (#821)

* Rework getting started section

* Streamline workflows

* Use api workflow client to create dataset

* Update FAQ (#824)

* Igor lig 1201 clean up docs advanced (#823)

* Rearrange docs based on new setup

* Update datapool

* Update datapool example

* Disable pretagging in datapool example

* Update pretagging

* Use Lightly Worker instead of docker

* Update active learning

* Implement feedback

* Philipp lig 1204 clean up docs examples (#825)

* Rework datasets in the wild

* Rework examples overview

* Rework academic datasets

* Igor lig 1203 clean up docs configuration (#826)

* Update configuration

* Make sure we use Lightly Worker

* Implement feedback

* Philipp lig 1202 clean up docs integration (#827)

* Rework dagster tutorial

* Drop the other integration parts

* Add default thumbnail suffix (#828)

* Igor lig 1231 finish cleanup docs (#829)

* Align docs and add missing changes

* Rename to register worker

* Remove input volume mapping

* Update overview section

* Make sure we use Lightly Worker

* Move bracket to new line

Co-authored-by: philippmwirth <philipp.m.wirth@gmail.com>

Co-authored-by: Philipp Wirth <65946090+philippmwirth@users.noreply.github.com>
Co-authored-by: philippmwirth <philipp.m.wirth@gmail.com>
3 people authored Jun 9, 2022
1 parent 806ba5a commit a64edfc
Showing 80 changed files with 5,889 additions and 1,112 deletions.
2 changes: 2 additions & 0 deletions docs/Makefile
@@ -11,6 +11,7 @@ DATADIR = _data
PACKAGESOURCE = source/tutorials_source/package
PLATFORMSOURCE = source/tutorials_source/platform
DOCKERSOURCE = source/docker
DOCKER_ARCHIVE_SOURCE = source/docker_archive
GETTING_STARTED_IMAGES = source/getting_started/resources


@@ -57,6 +58,7 @@ download-noplot:
	# download images and report for docker
	wget -N https://storage.googleapis.com/datasets_boris/resources.zip -P $(DATADIR);\
	unzip $(ZIPOPTS) $(DATADIR)/resources.zip -d $(DOCKERSOURCE);\
	unzip $(ZIPOPTS) $(DATADIR)/resources.zip -d $(DOCKER_ARCHIVE_SOURCE); \

	# pizza dataset
	@if [ ! -d $(PLATFORMSOURCE)/pizzas/salami ]; then \
47 changes: 13 additions & 34 deletions docs/source/docker/advanced/active_learning.rst
@@ -19,7 +19,7 @@ Prerequisites
--------------
In order to do active learning with Lightly, you will need the following things:

- The installed Lightly docker (see :ref:`ref-docker-setup`)
- The installed Lightly Worker (see :ref:`ref-docker-setup`)
- A dataset with a configured datasource (see :ref:`ref-docker-with-datasource-datapool`)
- Your predictions uploaded to the datasource (see :ref:`ref-docker-datasource-predictions`)
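
For the last prerequisite, the authoritative format is described in the
linked predictions guide. As a rough sketch (category names are placeholders,
not part of this commit), the `schema.json` of a classification task, which
lives under `.lightly/predictions/<task_name>/` in the datasource, could be
generated like this:

.. code-block:: python

    import json

    # Sketch only: a minimal schema for a classification prediction task.
    schema = {
        "task_type": "classification",
        "categories": [
            {"id": 0, "name": "class-a"},
            {"id": 1, "name": "class-b"},
        ],
    }
    # Write locally, then upload to .lightly/predictions/my-classification-task/
    # in the datasource bucket.
    with open("schema.json", "w") as f:
        json.dump(schema, f, indent=4)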

@@ -33,47 +33,26 @@ In order to do active learning with Lightly, you will need the following things:
Selection
-------------------------
Once you have everything set up as described above, you can do an active learning
iteration by specifying the following three things in your Lightly docker config:
iteration by specifying the following three things in your Lightly Worker config:

- `method`
- `active_learning.task_name`
- `active_learning.score_name`

Here's an example of how to configure an active learning run:

.. literalinclude:: code_examples/python_run_active_learning.py
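
For orientation, here is a condensed sketch of just those three fields; the
values are taken from the full example above, and everything not shown keeps
its default:

.. code-block:: python

    client.schedule_compute_worker_run(
        worker_config={
            "method": "coral",  # considers diversity and active learning scores
            "active_learning": {
                "task_name": "my-classification-task",  # your prediction task
                "score_name": "uncertainty_margin",     # score used for selection
            },
        },
    )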

.. tabs::

.. tab:: Web App
After running the code, we have to make sure a Lightly Worker is running to
process the job. We can start one with the following command:

**Trigger the Job**

To trigger a new job you can click on the schedule run button on the dataset
overview as shown in the screenshot below:

.. figure:: ../integration/images/schedule-compute-run.png

After clicking on the button you will see a wizard to configure the parameters
for the job.

.. figure:: ../integration/images/schedule-compute-run-config.png

In this example we have to set the `active_learning.task_name` parameter
in the docker config. Additionally, we set the `method` to `coral` which
simultaneously considers the diversity and the active learning scores of
the samples. All other settings are default values. The
resulting docker config should look like this:

.. literalinclude:: code_examples/active_learning_worker_config.txt
:caption: Docker Config
:language: javascript

The Lightly config remains unchanged.

.. tab:: Python Code

.. literalinclude:: code_examples/python_run_active_learning.py
.. code-block:: console

    docker run --rm --gpus all -it \
        -v /docker-output:/home/output_dir lightly/worker:latest \
        token=YOUR_TOKEN worker.worker_id=YOUR_WORKER_ID
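
The `YOUR_WORKER_ID` value comes from registering a worker with the API
client. A minimal sketch, assuming the `register_compute_worker` method of
the `ApiWorkflowClient` (see the worker registration section referenced in
this commit):

.. code-block:: python

    # Register a worker once and reuse the returned ID in the docker command.
    worker_id = client.register_compute_worker()
    print(worker_id)
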
After the worker has finished its job, you can see the selected images with
their active learning scores in the web app.
@@ -86,12 +65,12 @@ Active Learning with Custom Scores (not recommended as of March 2022)
As of March 2022, this is no longer recommended and will be deprecated in the future!


For running an active learning step with the Lightly docker, we need to perform
For running an active learning step with the Lightly Worker, we need to perform
3 steps:

1. Create an `embeddings.csv` file. You can use your own models or the Lightly docker for this.
1. Create an `embeddings.csv` file. You can use your own models or the Lightly Worker for this.
2. Add your active learning scores as an additional column to the embeddings file (see the sketch after this list).
3. Use the Lightly docker to perform an active learning iteration on the scores.
3. Use the Lightly Worker to perform an active learning iteration on the scores.
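
For step 2, a minimal sketch with pandas; the column name `al_score` and the
random scores are assumptions for illustration, not part of this commit:

.. code-block:: python

    import numpy as np
    import pandas as pd

    df = pd.read_csv("embeddings.csv")
    # Replace with your model's real per-sample scores, one per row,
    # aligned with the filename order of the embeddings file.
    df["al_score"] = np.random.uniform(0, 1, size=len(df))
    df.to_csv("embeddings.csv", index=False)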

Learn more about the concept of active learning
:ref:`lightly-active-learning-scorers`.
@@ -1,7 +1,67 @@
import json
import lightly
from lightly.openapi_generated.swagger_client.models.dataset_type import DatasetType
from lightly.openapi_generated.swagger_client.models.datasource_purpose import DatasourcePurpose

# Create the Lightly client to connect to the API.
client = lightly.api.ApiWorkflowClient(token="TOKEN", dataset_id="DATASET_ID")
client = lightly.api.ApiWorkflowClient(token="YOUR_TOKEN")

# Create a new dataset on the Lightly Platform.
client.create_dataset('pedestrian-videos-datapool',
                      dataset_type=DatasetType.VIDEOS)

# Pick one of the following three blocks depending on where your data is
# AWS S3
# Input bucket
client.set_s3_config(
    resource_path="s3://bucket/input/",
    region='eu-central-1',
    access_key='S3-ACCESS-KEY',
    secret_access_key='S3-SECRET-ACCESS-KEY',
    purpose=DatasourcePurpose.INPUT
)
# Output bucket
client.set_s3_config(
    resource_path="s3://bucket/output/",
    region='eu-central-1',
    access_key='S3-ACCESS-KEY',
    secret_access_key='S3-SECRET-ACCESS-KEY',
    purpose=DatasourcePurpose.LIGHTLY
)


# or Google Cloud Storage
# Input bucket
client.set_gcs_config(
    resource_path="gs://bucket/input/",
    project_id="PROJECT-ID",
    credentials=json.dumps(json.load(open('credentials_read.json'))),
    purpose=DatasourcePurpose.INPUT
)
# Output bucket
client.set_gcs_config(
    resource_path="gs://bucket/output/",
    project_id="PROJECT-ID",
    credentials=json.dumps(json.load(open('credentials_write.json'))),
    purpose=DatasourcePurpose.LIGHTLY
)


# or Azure Blob Storage
# Input bucket
client.set_azure_config(
    container_name='my-container/input/',
    account_name='ACCOUNT-NAME',
    sas_token='SAS-TOKEN',
    purpose=DatasourcePurpose.INPUT
)
# Output bucket
client.set_azure_config(
    container_name='my-container/output/',
    account_name='ACCOUNT-NAME',
    sas_token='SAS-TOKEN',
    purpose=DatasourcePurpose.LIGHTLY
)

# Schedule the docker run with
# - "active_learning.task_name" set to your task name
@@ -15,19 +75,14 @@
"enable_training": False,
"pretagging": False,
"pretagging_debug": False,
"method": "coral",
"method": "coral", # we use the coral method here
"stopping_condition": {
"n_samples": 0.1,
"min_distance": -1
},
"scorer": "object-frequency",
"scorer_config": {
"frequency_penalty": 0.25,
"min_score": 0.9
},
"active_learning": {
"task_name": "my-classification-task",
"score_name": "uncertainty_margin"
"active_learning": { # here we specify our active learning parameters
"task_name": "my-classification-task", # set the task
"score_name": "uncertainty_margin" # set the score
}
},
lightly_config={
@@ -71,4 +126,4 @@
            'rr_prob': 0
        }
    }
)
)
@@ -0,0 +1,125 @@
import json
import lightly
from lightly.openapi_generated.swagger_client.models.dataset_type import DatasetType
from lightly.openapi_generated.swagger_client.models.datasource_purpose import DatasourcePurpose

# Create the Lightly client to connect to the API.
client = lightly.api.ApiWorkflowClient(token="YOUR_TOKEN")

# Create a new dataset on the Lightly Platform.
client.create_dataset('pedestrian-videos-datapool',
                      dataset_type=DatasetType.VIDEOS)

# Pick one of the following three blocks depending on where your data is
# AWS S3
# Input bucket
client.set_s3_config(
    resource_path="s3://bucket/input/",
    region='eu-central-1',
    access_key='S3-ACCESS-KEY',
    secret_access_key='S3-SECRET-ACCESS-KEY',
    purpose=DatasourcePurpose.INPUT
)
# Output bucket
client.set_s3_config(
    resource_path="s3://bucket/output/",
    region='eu-central-1',
    access_key='S3-ACCESS-KEY',
    secret_access_key='S3-SECRET-ACCESS-KEY',
    purpose=DatasourcePurpose.LIGHTLY
)


# or Google Cloud Storage
# Input bucket
client.set_gcs_config(
    resource_path="gs://bucket/input/",
    project_id="PROJECT-ID",
    credentials=json.dumps(json.load(open('credentials_read.json'))),
    purpose=DatasourcePurpose.INPUT
)
# Output bucket
client.set_gcs_config(
    resource_path="gs://bucket/output/",
    project_id="PROJECT-ID",
    credentials=json.dumps(json.load(open('credentials_write.json'))),
    purpose=DatasourcePurpose.LIGHTLY
)


# or Azure Blob Storage
# Input bucket
client.set_azure_config(
    container_name='my-container/input/',
    account_name='ACCOUNT-NAME',
    sas_token='SAS-TOKEN',
    purpose=DatasourcePurpose.INPUT
)
# Output bucket
client.set_azure_config(
    container_name='my-container/output/',
    account_name='ACCOUNT-NAME',
    sas_token='SAS-TOKEN',
    purpose=DatasourcePurpose.LIGHTLY
)


# Schedule the compute run using our custom config.
# We show here the full default config so you can easily edit the
# values according to your needs.
client.schedule_compute_worker_run(
    worker_config={
        'enable_corruptness_check': True,
        'remove_exact_duplicates': True,
        'enable_training': False,
        'pretagging': False,
        'pretagging_debug': False,
        'method': 'coreset',
        'stopping_condition': {
            'n_samples': -1,
            'min_distance': 0.05 # we set the min_distance to 0.05 in this example
        }
    },
    lightly_config={
        'loader': {
            'batch_size': 128,
            'shuffle': True,
            'num_workers': -1,
            'drop_last': True
        },
        'model': {
            'name': 'resnet-18',
            'out_dim': 128,
            'num_ftrs': 32,
            'width': 1
        },
        'trainer': {
            'gpus': 1,
            'max_epochs': 1,
            'precision': 16
        },
        'criterion': {
            'temperature': 0.5
        },
        'optimizer': {
            'lr': 1,
            'weight_decay': 0.00001
        },
        'collate': {
            'input_size': 64,
            'cj_prob': 0.8,
            'cj_bright': 0.7,
            'cj_contrast': 0.7,
            'cj_sat': 0.7,
            'cj_hue': 0.2,
            'min_scale': 0.15,
            'random_gray_scale': 0.2,
            'gaussian_blur': 0.0,
            'kernel_size': 0.1,
            'vf_prob': 0,
            'hf_prob': 0.5,
            'rr_prob': 0
        }
    }
)
