docs + config cleanup (closes #128)
* fiftyone: get rid of "fast" dataset creation option (always include confidence)

* fiftyone: remove unused video plotting functionality

* pca: remove some options from configs to clean up

* fiftyone: update colab notebook

* [docs] data config and nan uniform heatmaps

* flake + isort + tests

* PR updates

* make dali matrix an array rather than a scalar
themattinthehatt authored Feb 18, 2024
1 parent c9fde49 commit 38276d1
Showing 22 changed files with 192 additions and 412 deletions.
17 changes: 0 additions & 17 deletions docs/api/lightning_pose.utils.fiftyone.FiftyOneFactory.rst

This file was deleted.

20 changes: 20 additions & 0 deletions docs/api/lightning_pose.utils.fiftyone.FiftyOneImagePlotter.rst
@@ -11,19 +11,39 @@ FiftyOneImagePlotter
.. autosummary::

~FiftyOneImagePlotter.image_paths
~FiftyOneImagePlotter.img_height
~FiftyOneImagePlotter.img_width
~FiftyOneImagePlotter.model_names
~FiftyOneImagePlotter.num_keypoints

.. rubric:: Methods Summary

.. autosummary::

~FiftyOneImagePlotter.build_single_frame_keypoints
~FiftyOneImagePlotter.create_dataset
~FiftyOneImagePlotter.dataset_info_print
~FiftyOneImagePlotter.get_gt_keypoints_list
~FiftyOneImagePlotter.get_keypoints_per_image
~FiftyOneImagePlotter.get_model_abs_paths
~FiftyOneImagePlotter.get_pred_keypoints_dict
~FiftyOneImagePlotter.load_model_predictions

.. rubric:: Attributes Documentation

.. autoattribute:: image_paths
.. autoattribute:: img_height
.. autoattribute:: img_width
.. autoattribute:: model_names
.. autoattribute:: num_keypoints

.. rubric:: Methods Documentation

.. automethod:: build_single_frame_keypoints
.. automethod:: create_dataset
.. automethod:: dataset_info_print
.. automethod:: get_gt_keypoints_list
.. automethod:: get_keypoints_per_image
.. automethod:: get_model_abs_paths
.. automethod:: get_pred_keypoints_dict
.. automethod:: load_model_predictions
47 changes: 0 additions & 47 deletions docs/api/lightning_pose.utils.fiftyone.FiftyOneKeypointBase.rst

This file was deleted.

This file was deleted.

6 changes: 0 additions & 6 deletions docs/api/lightning_pose.utils.fiftyone.check_unique_tags.rst

This file was deleted.

24 changes: 24 additions & 0 deletions docs/source/faqs.rst
@@ -24,3 +24,27 @@ Note that both semi-supervised and context models will increase memory usage
If you encounter this error, reduce batch sizes during training or inference.
You can find the relevant parameters to adjust in :ref:`The configuration file <config_file>`
section.
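
As an illustrative sketch only (the exact field names below are assumptions based on typical Lightning Pose configs, not taken from this commit), batch sizes are set in the ``training`` and ``dali`` sections:

.. code-block:: yaml

    training:
      train_batch_size: 16      # reduce if you hit out-of-memory errors
    dali:
      base:
        train:
          sequence_length: 16   # unlabeled frames per batch; reduce to save memory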

.. _faq_nan_heatmaps:

**Q: Why does the network produce high confidence values for keypoints even when they are occluded?**

Generally, when a keypoint is briefly occluded and its location can be resolved by the network, we are fine with
high confidence values (this will happen, for example, when using temporal context frames).
However, there may be scenarios where the goal is to explicitly track whether a keypoint is visible or hidden using
confidence values (e.g., quantifying whether a tongue is in or out of the mouth).
In this case, if the confidence values are too high during occlusions, try the suggestions below.

First, note that including a keypoint in the unsupervised losses (especially the PCA losses)
will generally increase confidence values even during occlusions (by design).
If a low confidence value is desired during occlusions, ensure the keypoint in question is not
included in those losses.

If this does not fix the issue, another option is to set the following field in the config file:
``training.uniform_heatmaps_for_nan_keypoints: true``.
(This field is not present in the default config but can be added.)
This option will force the model to output a uniform heatmap for any keypoint that does not have
a ground truth label in the training data.
The model will therefore not try to guess where the occluded keypoint is located.
This approach requires a set of training frames that include both visible and occluded examples
of the keypoint in question.
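
As a minimal sketch of where this field lives, the ``training`` section of the config gains one line (the key name is taken from the FAQ above; everything else in a real config stays as-is):

.. code-block:: yaml

    training:
      # not present in the default config; add it manually
      uniform_heatmaps_for_nan_keypoints: true
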
19 changes: 19 additions & 0 deletions docs/source/user_guide/config_file.rst
@@ -21,6 +21,25 @@ The config file contains several sections:
* ``losses``: hyperparameters for unsupervised losses
* ``eval``: paths for video inference and fiftyone app

Data parameters
===============

* ``data.image_orig_dims.height/width``: the current version of Lightning Pose requires all training images to be the same size. We are working on an updated version without this requirement. However, if you plan to use the PCA losses (Pose PCA or multiview PCA), then all training images **must** be the same size; otherwise the PCA subspace will erroneously contain variance related to image size.

* ``data.image_resize_dims.height/width``: images (and videos) will be resized to the specified height and width before being processed by the network. Supported values are {64, 128, 256, 384, 512}. The height and width need not be identical. Some points to keep in mind when selecting
these values: if the resized images are too small, you will lose resolution/details; if they are too large, the model takes longer to train and might not train as well.

* ``data.data_dir/video_dir``: update these to reflect your local paths

* ``data.num_keypoints``: the number of body parts. If using a mirrored setup, this should be the number of body parts summed across all views. If using a multiview setup, this number should indicate the number of keypoints per view (must be the same across all views).

* ``data.keypoint_names``: keypoint names should reflect the actual names/order in the csv file. This field is necessary if, for example, you are running inference on a machine that does not have the training data saved on it.

* ``data.columns_for_singleview_pca``: see the :ref:`Pose PCA documentation <unsup_loss_pcasv>`

* ``data.mirrored_column_matches``: see the :ref:`Multiview PCA documentation <unsup_loss_pcamv>`
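
Putting these fields together, a sketch of the ``data`` section might look as follows (paths, dimensions, and keypoint names are illustrative placeholders, not values from this commit):

.. code-block:: yaml

    data:
      image_orig_dims:
        height: 406              # size of the raw labeled frames
        width: 396
      image_resize_dims:
        height: 256              # must be in {64, 128, 256, 384, 512}
        width: 256
      data_dir: /path/to/project
      video_dir: /path/to/project/videos
      num_keypoints: 3
      keypoint_names: [nose, ear_left, ear_right]
      columns_for_singleview_pca: [0, 1, 2]
      mirrored_column_matches: null   # only needed for mirrored/multiview setups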


Model/training parameters
=========================

42 changes: 22 additions & 20 deletions docs/source/user_guide_advanced/unsupervised_losses.rst
@@ -12,9 +12,9 @@ and brief descriptions of some of the available losses.
#. :ref:`Data requirements <unsup_data>`
#. :ref:`The configuration file <unsup_config>`
#. :ref:`Loss options <unsup_loss_options>`
-  * :ref:`Temporal continuity <unsup_loss_temporal>`
-  * :ref:`Pose plausibility <unsup_loss_pcasv>`
-  * :ref:`Multiview consistency <unsup_loss_pcamv>`
+  * :ref:`Temporal difference <unsup_loss_temporal>`
+  * :ref:`Pose PCA <unsup_loss_pcasv>`
+  * :ref:`Multiview PCA <unsup_loss_pcamv>`

.. _unsup_data:

@@ -122,9 +122,18 @@ losses across multiple datasets, but we encourage users to test out several values on their own
data for best effect. The inverse of this weight is actually used for the final weight, so smaller
values indicate stronger penalties.

We are particularly interested in preventing, and having the network learn from, severe violations
of the different losses.
Therefore, we enforce our losses only when they exceed a tolerance threshold :math:`\epsilon`,
rendering them :math:`\epsilon`-insensitive:

.. math::

    \mathscr{L}(\epsilon) = \textrm{max}(0, \mathscr{L} - \epsilon).
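
For example, with :math:`\epsilon = 20` pixels, a raw loss value of 25 pixels contributes :math:`25 - 20 = 5` to the cost, while a raw loss of 12 pixels is clipped to zero.
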
.. _unsup_loss_temporal:

-Temporal continuity
+Temporal difference
-------------------
This loss penalizes the difference in predictions between successive timepoints for each keypoint
independently.
@@ -133,16 +142,17 @@
    temporal:
        log_weight: 5.0
+       prob_threshold: 0.05
        epsilon: 20.0

* ``log_weight``: weight of the loss in the final cost function
+ * ``prob_threshold``: predictions with a probability below this threshold are not included in the loss. This is desirable if, for example, a keypoint is occluded and the prediction has low probability.
* ``epsilon``: in pixels; temporal differences below this threshold are not penalized, which prevents natural movement between frames from incurring a loss. The value of epsilon will depend on the size of the video frames, the framerate (how much the animal moves from one frame to the next), the size of the animal in the frame, etc.

.. _unsup_loss_pcasv:

-Pose plausibility
+Pose PCA
-----------------
This loss penalizes deviations away from a low-dimensional subspace of plausible poses computed on
labeled data.
@@ -186,7 +196,7 @@ If instead you want to include the ears and tailbase:
columns_for_singleview_pca: [1, 2, 4]
See
-`these config files <https://github.com/danbider/lightning-pose/tree/feature/docs/scripts/configs>`_
+`these config files <https://github.com/danbider/lightning-pose/tree/main/scripts/configs>`_
for more examples.

Below are the various hyperparameters and their descriptions.
@@ -197,19 +207,15 @@ Besides the ``log_weight`` none of the provided values need to be tested for new
    pca_singleview:
        log_weight: 5.0
        components_to_keep: 0.99
-       empirical_epsilon_percentile: 1.00
-       empirical_epsilon_multiplier: 1.0
        epsilon: null

* ``log_weight``: weight of the loss in the final cost function
* ``components_to_keep``: predictions should lie within the low-d subspace spanned by components that describe this fraction of variance
- * ``empirical_epsilon_percentile``: the reprojection error on labeled training data is computed to arrive at a noise ceiling; reprojection errors from the video data are not penalized if they fall below this percentile of labeled data error (replaces ``epsilon``)
- * ``empirical_epsilon_multiplier``: allows the user to increase epsilon relative to the empirical epsilon error; with the multiplier, the effective epsilon is ``eff_epsilon = percentile(error, empirical_epsilon_percentile) * empirical_epsilon_multiplier``
- * ``epsilon``: absolute error (in pixels) below which the pca loss is zeroed out; if not null, this parameter takes precedence over ``empirical_epsilon_percentile``
+ * ``epsilon``: if left as null, this parameter is automatically computed from the labeled data

.. _unsup_loss_pcamv:

-Multiview consistency
+Multiview PCA
---------------------
This loss penalizes deviations of predictions across all available views away from a 3-dimensional
subspace computed on labeled data.
@@ -273,12 +279,8 @@ Besides the ``log_weight`` none of the provided values need to be tested for new
    pca_multiview:
        log_weight: 5.0
        components_to_keep: 3
-       empirical_epsilon_percentile: 1.00
-       empirical_epsilon_multiplier: 1.0
        epsilon: null

* ``log_weight``: weight of the loss in the final cost function
- * ``components_to_keep``: predictions should lie within the 3D subspace
- * ``empirical_epsilon_percentile``: the reprojection error on labeled training data is computed to arrive at a noise ceiling; reprojection errors from the video data are not penalized if they fall below this percentile of labeled data error (replaces ``epsilon``)
- * ``empirical_epsilon_multiplier``: allows the user to increase epsilon relative to the empirical epsilon error; with the multiplier, the effective epsilon is ``eff_epsilon = percentile(error, empirical_epsilon_percentile) * empirical_epsilon_multiplier``
- * ``epsilon``: absolute error (in pixels) below which the pca loss is zeroed out; if not null, this parameter takes precedence over ``empirical_epsilon_percentile``
+ * ``components_to_keep``: should be set to 3 so that predictions lie within a 3D subspace
+ * ``epsilon``: if left as null, this parameter is automatically computed from the labeled data
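
For reference, a sketch of the companion ``data.mirrored_column_matches`` field for a two-view mirrored setup (the indices are illustrative, not from this commit):

.. code-block:: yaml

    data:
      mirrored_column_matches:
        - [0, 1, 2]   # keypoint column indices in view 1
        - [3, 4, 5]   # matching keypoint column indices in view 2
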
2 changes: 1 addition & 1 deletion lightning_pose/data/dali.py
@@ -112,7 +112,7 @@ def video_pipe(
    else:
        # choose arbitrary scalar (rather than a matrix) so that downstream operations know there
        # is no geometric transforms to undo
-       matrix = -1
+       matrix = np.array([-1])
    # video pixel range is [0, 255]; transform it to [0, 1].
    # happens naturally in the torchvision transform to tensor.
    video = video / 255.0
