This repository has been archived by the owner on Apr 25, 2023. It is now read-only.

create a docs folder with detailed info on each pipeline stage #34

Open
bryantChhun opened this issue Jul 8, 2021 · 1 comment

Comments

@bryantChhun
Contributor

Before 1.0-beta release:

A useful resource would be a centralized folder that contains multiple documents, one for each stage of the pipeline.

  • preprocess
  • segmentation
  • patch
  • VAE training and encoding
  • dimensionality reduction
@bryantChhun
Contributor Author

bryantChhun commented Jul 9, 2021

For example, the dim_reduction module has the following usage:
config file
The configuration file now accepts a list of full paths to input directories (input_dirs) and a matching list of full paths to output directories (output_dirs). file_name_prefixes is a list of string prefixes. weights_dir is a single directory; when fit_model is True, PCA fitting writes pca_model.pkl there. conditions is a list of strings describing the experimental conditions, and it is only used for plotting after fitting.
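
As a rough illustration of the fields described above, the config could be mirrored in Python like this. The class name DimReductionConfig, the defaults, the length check, and the example paths and condition names are assumptions made for this sketch; only the field names come from the description above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DimReductionConfig:
    """Hypothetical container mirroring the config fields described above."""

    input_dirs: List[str]          # full paths to input directories
    output_dirs: List[str]         # full paths to output directories
    file_name_prefixes: List[str]  # string prefixes of the latent-space files
    weights_dir: str               # single directory holding pca_model.pkl
    fit_model: bool = True         # default is an assumption of this sketch
    conditions: List[str] = field(default_factory=list)  # used only for plotting

    def __post_init__(self):
        # Assumption: input and output directories are paired one-to-one.
        if len(self.input_dirs) != len(self.output_dirs):
            raise ValueError("input_dirs and output_dirs must have the same length")


# Example values are placeholders, not paths from the project.
example = DimReductionConfig(
    input_dirs=["/data/exp1/latent", "/data/exp2/latent"],
    output_dirs=["/data/exp1/pca", "/data/exp2/pca"],
    file_name_prefixes=["well_A", "well_B"],
    weights_dir="/data/weights",
    fit_model=True,
    conditions=["control", "treated"],
)
```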

details
For fit_model: True:

  1. Loops over all directories listed in the config's input_dirs.
  2. Loops over all prefixes in the config's file_name_prefixes.
  3. [aggregate all data] Searches the input directories for <prefix>_latent_space_after.pkl files and concatenates their vectors into a single list for PCA fitting.
  4. Fitting writes the model pca_model.pkl to the config's weights_dir directory.
  5. Fitting writes the figure PCA.png to the config's weights_dir directory.
  6. Finally, loops over all pairs of input_dirs and output_dirs in the config.
  7. For each pair, runs inference on every <prefix>_latent_space_<suffix>.pkl file in the input_dir folder, where suffix='after' is hardcoded, using the model written in step 4.
  8. Saves each inference result as <prefix>_latent_space_after_PCAed.pkl in the corresponding output_dir from step 6.
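
The file discovery in steps 1 to 3 and the output locations in steps 4 and 5 can be sketched as below. This only illustrates the naming and looping; it does not load data or fit PCA. The function names collect_fit_inputs and fit_output_paths are hypothetical, and silently skipping a prefix that has no file in a given directory is an assumption of this sketch.

```python
from pathlib import Path

SUFFIX = "after"  # hardcoded in the module, per the description above


def collect_fit_inputs(input_dirs, file_name_prefixes):
    """Return the existing <prefix>_latent_space_after.pkl paths to aggregate."""
    found = []
    for input_dir in input_dirs:            # step 1: every input directory
        for prefix in file_name_prefixes:   # step 2: every prefix
            candidate = Path(input_dir) / f"{prefix}_latent_space_{SUFFIX}.pkl"
            # step 3: keep only files that exist (skipping is an assumption)
            if candidate.is_file():
                found.append(candidate)
    return found


def fit_output_paths(weights_dir):
    """Return where fitting writes the model and the figure (steps 4 and 5)."""
    weights_dir = Path(weights_dir)
    return weights_dir / "pca_model.pkl", weights_dir / "PCA.png"
```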

For fit_model: False:

  1. Loops over all pairs of directories listed in the config's input_dirs and output_dirs.
  2. Loops over all prefixes in the config's file_name_prefixes.
  3. Assumes the weights_dir supplied in the config is a directory and looks for the pca_model.pkl file there.
  4. Runs inference on <prefix>_latent_space_<suffix>.pkl, where suffix='after' is hardcoded.
  5. Writes the transformed vectors to <prefix>_latent_space_<suffix>_PCAed.pkl in the corresponding output_dir directory.
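
The inference pass above boils down to a mapping from input files to output files. A minimal sketch of that mapping follows; plan_inference is a hypothetical helper name, and the sketch only builds paths without loading the model or transforming any vectors.

```python
from pathlib import Path


def plan_inference(input_dirs, output_dirs, file_name_prefixes, weights_dir,
                   suffix="after"):
    """Return the model path and a list of (input_file, output_file) pairs."""
    # Step 3: the model is expected inside weights_dir.
    model_path = Path(weights_dir) / "pca_model.pkl"
    jobs = []
    # Step 1: input and output directories are walked as pairs.
    for input_dir, output_dir in zip(input_dirs, output_dirs):
        # Step 2: every prefix is tried in every pair.
        for prefix in file_name_prefixes:
            stem = f"{prefix}_latent_space_{suffix}"
            source = Path(input_dir) / f"{stem}.pkl"          # step 4 input
            target = Path(output_dir) / f"{stem}_PCAed.pkl"   # step 5 output
            jobs.append((source, target))
    return model_path, jobs
```

With two directory pairs and two prefixes, this yields four jobs, each writing into the output directory paired with its input directory.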
