docs + config cleanup (closes #128)
* fiftyone: get rid of "fast" dataset creation option (always include confidence)

* fiftyone: remove unused video plotting functionality

* pca: remove some options from configs to clean up

* fiftyone: update colab notebook

* [docs] data config and nan uniform heatmaps

* flake + isort + tests

* PR updates

* make dali matrix an array rather than a scalar
themattinthehatt authored Feb 18, 2024
1 parent c9fde49 commit 38276d1
Showing 22 changed files with 192 additions and 412 deletions.
17 changes: 0 additions & 17 deletions docs/api/lightning_pose.utils.fiftyone.FiftyOneFactory.rst

This file was deleted.

20 changes: 20 additions & 0 deletions docs/api/lightning_pose.utils.fiftyone.FiftyOneImagePlotter.rst
@@ -11,19 +11,39 @@ FiftyOneImagePlotter
.. autosummary::

~FiftyOneImagePlotter.image_paths
~FiftyOneImagePlotter.img_height
~FiftyOneImagePlotter.img_width
~FiftyOneImagePlotter.model_names
~FiftyOneImagePlotter.num_keypoints

.. rubric:: Methods Summary

.. autosummary::

~FiftyOneImagePlotter.build_single_frame_keypoints
~FiftyOneImagePlotter.create_dataset
~FiftyOneImagePlotter.dataset_info_print
~FiftyOneImagePlotter.get_gt_keypoints_list
~FiftyOneImagePlotter.get_keypoints_per_image
~FiftyOneImagePlotter.get_model_abs_paths
~FiftyOneImagePlotter.get_pred_keypoints_dict
~FiftyOneImagePlotter.load_model_predictions

.. rubric:: Attributes Documentation

.. autoattribute:: image_paths
.. autoattribute:: img_height
.. autoattribute:: img_width
.. autoattribute:: model_names
.. autoattribute:: num_keypoints

.. rubric:: Methods Documentation

.. automethod:: build_single_frame_keypoints
.. automethod:: create_dataset
.. automethod:: dataset_info_print
.. automethod:: get_gt_keypoints_list
.. automethod:: get_keypoints_per_image
.. automethod:: get_model_abs_paths
.. automethod:: get_pred_keypoints_dict
.. automethod:: load_model_predictions
47 changes: 0 additions & 47 deletions docs/api/lightning_pose.utils.fiftyone.FiftyOneKeypointBase.rst

This file was deleted.

This file was deleted.

6 changes: 0 additions & 6 deletions docs/api/lightning_pose.utils.fiftyone.check_unique_tags.rst

This file was deleted.

24 changes: 24 additions & 0 deletions docs/source/faqs.rst
@@ -24,3 +24,27 @@ Note that both semi-supervised and context models will increase memory usage
If you encounter this error, reduce batch sizes during training or inference.
You can find the relevant parameters to adjust in :ref:`The configuration file <config_file>`
section.
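
As an illustrative sketch only (the exact field names below are assumptions based on typical Lightning Pose configs, not taken from this commit), batch sizes are set in the ``training`` and ``dali`` sections:

.. code-block:: yaml

    training:
      train_batch_size: 16      # reduce if you hit out-of-memory errors
    dali:
      base:
        train:
          sequence_length: 16   # unlabeled frames per batch; reduce to save memory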

.. _faq_nan_heatmaps:

**Q: Why does the network produce high confidence values for keypoints even when they are occluded?**

Generally, when a keypoint is briefly occluded and its location can be resolved by the network, we are fine with
high confidence values (this will happen, for example, when using temporal context frames).
However, there may be scenarios where the goal is to explicitly track whether a keypoint is visible or hidden using
confidence values (e.g., quantifying whether a tongue is in or out of the mouth).
In this case, if the confidence values are too high during occlusions, try the suggestions below.

First, note that including a keypoint in the unsupervised losses (especially the PCA losses)
will generally increase confidence values even during occlusions (by design).
If a low confidence value is desired during occlusions, ensure the keypoint in question is not
included in those losses.

If this does not fix the issue, another option is to set the following field in the config file:
``training.uniform_heatmaps_for_nan_keypoints: true``.
(This field is not present in the default config but can be added.)
This option will force the model to output a uniform heatmap for any keypoint that does not have
a ground truth label in the training data.
The model will therefore not try to guess where the occluded keypoint is located.
This approach requires a set of training frames that include both visible and occluded examples
of the keypoint in question.
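
As a minimal sketch of where this field lives, the ``training`` section of the config gains one line (the key name is taken from the FAQ above; everything else in a real config stays as-is):

.. code-block:: yaml

    training:
      # not present in the default config; add it manually
      uniform_heatmaps_for_nan_keypoints: true
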
19 changes: 19 additions & 0 deletions docs/source/user_guide/config_file.rst
@@ -21,6 +21,25 @@ The config file contains several sections:
* ``losses``: hyperparameters for unsupervised losses
* ``eval``: paths for video inference and fiftyone app

Data parameters
===============

* ``data.image_orig_dims.height/width``: the current version of Lightning Pose requires all training images to be the same size. We are working on an updated version without this requirement. However, if you plan to use the PCA losses (Pose PCA or multiview PCA), then all training images **must** be the same size; otherwise the PCA subspace will erroneously contain variance related to image size.

* ``data.image_resize_dims.height/width``: images (and videos) will be resized to the specified height and width before being processed by the network. Supported values are {64, 128, 256, 384, 512}. The height and width need not be identical. Some points to keep in mind when selecting
these values: if the resized images are too small, you will lose resolution/details; if they are too large, the model takes longer to train and might not train as well.

* ``data.data_dir/video_dir``: update these to reflect your local paths

* ``data.num_keypoints``: the number of body parts. If using a mirrored setup, this should be the number of body parts summed across all views. If using a multiview setup, this number should indicate the number of keypoints per view (must be the same across all views).

* ``data.keypoint_names``: keypoint names should reflect the actual names/order in the csv file. This field is necessary if, for example, you are running inference on a machine that does not have the training data saved on it.

* ``data.columns_for_singleview_pca``: see the :ref:`Pose PCA documentation <unsup_loss_pcasv>`

* ``data.mirrored_column_matches``: see the :ref:`Multiview PCA documentation <unsup_loss_pcamv>`
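
Putting these fields together, a sketch of the ``data`` section might look as follows (paths, dimensions, and keypoint names are illustrative placeholders, not values from this commit):

.. code-block:: yaml

    data:
      image_orig_dims:
        height: 406              # size of the raw labeled frames
        width: 396
      image_resize_dims:
        height: 256              # must be in {64, 128, 256, 384, 512}
        width: 256
      data_dir: /path/to/project
      video_dir: /path/to/project/videos
      num_keypoints: 3
      keypoint_names: [nose, ear_left, ear_right]
      columns_for_singleview_pca: [0, 1, 2]
      mirrored_column_matches: null   # only needed for mirrored/multiview setups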


Model/training parameters
=========================

42 changes: 22 additions & 20 deletions docs/source/user_guide_advanced/unsupervised_losses.rst
@@ -12,9 +12,9 @@ and brief descriptions of some of the available losses.
#. :ref:`Data requirements <unsup_data>`
#. :ref:`The configuration file <unsup_config>`
#. :ref:`Loss options <unsup_loss_options>`
-  * :ref:`Temporal continuity <unsup_loss_temporal>`
-  * :ref:`Pose plausibility <unsup_loss_pcasv>`
-  * :ref:`Multiview consistency <unsup_loss_pcamv>`
+  * :ref:`Temporal difference <unsup_loss_temporal>`
+  * :ref:`Pose PCA <unsup_loss_pcasv>`
+  * :ref:`Multiview PCA <unsup_loss_pcamv>`

.. _unsup_data:

@@ -122,9 +122,18 @@ losses across multiple datasets, but we encourage users to test out several values on their own
data for best effect. The inverse of this weight is actually used for the final weight, so smaller
values indicate stronger penalties.

We are particularly interested in preventing, and having the network learn from, severe violations
of the different losses.
Therefore, we enforce our losses only when they exceed a tolerance threshold :math:`\epsilon`,
rendering them :math:`\epsilon`-insensitive:

.. math::

    \mathscr{L}(\epsilon) = \textrm{max}(0, \mathscr{L} - \epsilon).
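
For example, with :math:`\epsilon = 20` pixels, a raw loss value of 25 pixels contributes :math:`25 - 20 = 5` to the cost, while a raw loss of 12 pixels is clipped to zero.
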
.. _unsup_loss_temporal:

-Temporal continuity
+Temporal difference
-------------------
This loss penalizes the difference in predictions between successive timepoints for each keypoint
independently.
@@ -133,16 +142,17 @@
    temporal:
        log_weight: 5.0
+       prob_threshold: 0.05
        epsilon: 20.0

* ``log_weight``: weight of the loss in the final cost function
+ * ``prob_threshold``: predictions with a probability below this threshold are not included in the loss. This is desirable if, for example, a keypoint is occluded and the prediction has low probability.
* ``epsilon``: in pixels; temporal differences below this threshold are not penalized, which prevents natural movement between frames from incurring a loss. The value of epsilon will depend on the size of the video frames, the framerate (how much the animal moves from one frame to the next), the size of the animal in the frame, etc.

.. _unsup_loss_pcasv:

-Pose plausibility
+Pose PCA
-----------------
This loss penalizes deviations away from a low-dimensional subspace of plausible poses computed on
labeled data.
@@ -186,7 +196,7 @@ If instead you want to include the ears and tailbase:
columns_for_singleview_pca: [1, 2, 4]
See
-`these config files <https://github.com/danbider/lightning-pose/tree/feature/docs/scripts/configs>`_
+`these config files <https://github.com/danbider/lightning-pose/tree/main/scripts/configs>`_
for more examples.

Below are the various hyperparameters and their descriptions.
@@ -197,19 +207,15 @@ Besides the ``log_weight`` none of the provided values need to be tested for new
    pca_singleview:
        log_weight: 5.0
        components_to_keep: 0.99
-       empirical_epsilon_percentile: 1.00
-       empirical_epsilon_multiplier: 1.0
        epsilon: null

* ``log_weight``: weight of the loss in the final cost function
* ``components_to_keep``: predictions should lie within the low-d subspace spanned by components that describe this fraction of variance
- * ``empirical_epsilon_percentile``: the reprojection error on labeled training data is computed to arrive at a noise ceiling; reprojection errors from the video data are not penalized if they fall below this percentile of labeled data error (replaces ``epsilon``)
- * ``empirical_epsilon_multiplier``: allows the user to increase epsilon relative to the empirical epsilon error; with the multiplier, the effective epsilon is ``eff_epsilon = percentile(error, empirical_epsilon_percentile) * empirical_epsilon_multiplier``
- * ``epsilon``: absolute error (in pixels) below which the pca loss is zeroed out; if not null, this parameter takes precedence over ``empirical_epsilon_percentile``
+ * ``epsilon``: if left as null, this parameter is automatically computed from the labeled data

.. _unsup_loss_pcamv:

-Multiview consistency
+Multiview PCA
---------------------
This loss penalizes deviations of predictions across all available views away from a 3-dimensional
subspace computed on labeled data.
@@ -273,12 +279,8 @@ Besides the ``log_weight`` none of the provided values need to be tested for new
    pca_multiview:
        log_weight: 5.0
        components_to_keep: 3
-       empirical_epsilon_percentile: 1.00
-       empirical_epsilon_multiplier: 1.0
        epsilon: null

* ``log_weight``: weight of the loss in the final cost function
- * ``components_to_keep``: predictions should lie within the 3D subspace
- * ``empirical_epsilon_percentile``: the reprojection error on labeled training data is computed to arrive at a noise ceiling; reprojection errors from the video data are not penalized if they fall below this percentile of labeled data error (replaces ``epsilon``)
- * ``empirical_epsilon_multiplier``: allows the user to increase epsilon relative to the empirical epsilon error; with the multiplier, the effective epsilon is ``eff_epsilon = percentile(error, empirical_epsilon_percentile) * empirical_epsilon_multiplier``
- * ``epsilon``: absolute error (in pixels) below which the pca loss is zeroed out; if not null, this parameter takes precedence over ``empirical_epsilon_percentile``
+ * ``components_to_keep``: should be set to 3 so that predictions lie within a 3D subspace
+ * ``epsilon``: if left as null, this parameter is automatically computed from the labeled data
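
For reference, a sketch of the companion ``data.mirrored_column_matches`` field for a two-view mirrored setup (the indices are illustrative, not from this commit):

.. code-block:: yaml

    data:
      mirrored_column_matches:
        - [0, 1, 2]   # keypoint column indices in view 1
        - [3, 4, 5]   # matching keypoint column indices in view 2
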
2 changes: 1 addition & 1 deletion lightning_pose/data/dali.py
@@ -112,7 +112,7 @@ def video_pipe(
    else:
        # choose arbitrary scalar (rather than a matrix) so that downstream operations know there
        # is no geometric transforms to undo
-       matrix = -1
+       matrix = np.array([-1])
    # video pixel range is [0, 255]; transform it to [0, 1].
    # happens naturally in the torchvision transform to tensor.
    video = video / 255.0
