Wrapper Script for Harmony designed to work with GenePattern Module Integrator

genepattern/Harmony

About Harmony

What is Harmony?

Harmony is a tool used to correct batch effects in single-cell RNA-seq (scRNA-seq) datasets. Batch effects are differences in a dataset that arise from technical factors rather than from underlying biology. As seen in the plot below, they can cause the data to separate by non-biological factors. Harmony aims to remove these batch effects, allowing multiple datasets to be integrated.

[Image]

How to use Harmony

To use the Harmony module, you must first run your scRNA-seq data through the Seurat pipeline available on GenePattern, as seen below:

[Image]

Run Seurat on each condition that you wish to batch correct, and use the RDS files from those modules as the inputs for Harmony. Once Harmony has finished, the module outputs five files:

Harmonized Data - An RDS file containing a Seurat object with the Harmony-processed data. The Harmony-adjusted principal components can be found in the "harmony" entry under the "reductions" slot of the Seurat object. The name of this file is specified by the "Output Name" parameter.

Before Harmony Plot - A PNG file showing a scatterplot of the dimensionality-reduced data before Harmony. The dimensionality-reduction method shown is set by the "reduction" parameter.

After Harmony Plot - A PNG file showing a scatterplot of the data post-Harmony.

Side By Side Plot - A PNG file showing the Before Harmony Plot and After Harmony Plot side by side for comparison and debugging.

Animation - A GIF animation showing a smooth transition between successive iterations of Harmony's batch correction, as shown above.

Basic Parameters

Here are the basic parameters you will need in order to run the Harmony module.

  1. Input RDS Files (required)

    • The list of datasets you wish to analyze with Harmony. Each dataset must consist of one group of data, and must be a Seurat object in an .rds file.
  2. Output Name (required)

    • The prefix that you would like to use to name your output files. One file will contain your Harmony-processed data, and another file will display the scatterplot made for the data.
  3. Data Set Names (optional)

    • The names to assign to the datasets you analyze with Harmony. The list of names must be as long as the list of datasets. By default, each dataset is named after its file.
  4. Group Name (optional)

    • The name of the metadata column you would like to group by during visualization. If no group name is specified, then Harmony will group by dataset by default.
  5. Colors (optional)
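
The relationship between Input RDS Files and Data Set Names can be sketched as follows. This is a minimal illustration, not the module's own code: the helper `resolve_dataset_names` is hypothetical, and stripping the file extension from the default name is an assumption (the text above only says that datasets are named after their files).

```python
from pathlib import Path


def resolve_dataset_names(input_files, dataset_names=None):
    """Return one name per input file (hypothetical helper for illustration)."""
    if not dataset_names:
        # Assumption: the default name is the file name without its extension.
        return [Path(f).stem for f in input_files]
    if len(dataset_names) != len(input_files):
        # The list of names must be as long as the list of datasets.
        raise ValueError(
            f"Got {len(dataset_names)} names for {len(input_files)} datasets"
        )
    return list(dataset_names)


files = ["control.rds", "treated.rds"]
print(resolve_dataset_names(files))                    # names taken from the files
print(resolve_dataset_names(files, ["ctrl", "stim"]))  # user-supplied names
```

Checking the lengths up front gives a clear error before any dataset is loaded, rather than a mismatch that surfaces later in the plots.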

Advanced Parameters

These parameters are for more advanced use. They are all optional.

  1. reduction

    • Name of dimension reduction to use. Default: pca
  2. dims use

    • Which PCA dimensions to use for Harmony. Default: all dimensions
  3. theta

    • Diversity clustering penalty parameter. Specify for each variable in group.by.vars. theta=0 does not encourage any diversity. Larger values of theta result in more diverse clusters. Default: 2
  4. lambda

    • Ridge regression penalty parameter. Specify for each variable in group.by.vars. Lambda must be strictly positive. Smaller values result in more aggressive correction. Default: 1
  5. sigma

    • Width of soft k-means clusters. Sigma scales the distance from a cell to the cluster centroids. Larger values of sigma result in cells being assigned to more clusters. Smaller values of sigma make soft k-means clustering approach hard clustering. Default: 0.1
  6. nclust

    • Number of clusters in the model. nclust=1 is equivalent to simple linear regression.
  7. tau

    • Protection against overclustering small datasets with large ones. tau is the expected number of cells per cluster. Default: 0
  8. block size

    • What proportion of cells to update during clustering. Must be between 0 and 1. Larger values may be faster but less accurate. Default: 0.05
  9. max iter harmony

    • Maximum number of iterations that Harmony will run. Default: 10
  10. max iter cluster

    • Maximum number of clustering rounds to run within each iteration of Harmony. Default: 20
  11. stop early cluster

    • Whether or not to stop clustering early. If TRUE, then the convergence tolerance is specified by the epsilon cluster parameter. Default: TRUE
  12. epsilon cluster

    • Convergence tolerance for clustering round of Harmony. Default: 0.00005
  13. stop early harmony

    • Whether or not to stop Harmony early. If TRUE, then the convergence tolerance is specified by the epsilon harmony parameter. Default: TRUE
  14. epsilon harmony

    • Convergence tolerance for Harmony. Default: 0.0004
  15. plot_convergence

    • Whether to print the convergence plot of the clustering objective function. TRUE to plot, FALSE to suppress. This can be useful for debugging. Default: FALSE
  16. verbose

    • Whether to print progress messages. TRUE to print, FALSE to suppress. Default: TRUE
  17. reference_values

    • Defines reference dataset(s). Cells whose batch variable values match reference_values will not be moved.
  18. reduction save

    • Keyword to save Harmony reduction. Useful if you want to try Harmony with multiple parameters and save them as e.g. 'harmony_theta0', 'harmony_theta1', 'harmony_theta2'. Default: harmony
  19. assay use

    • Which assay to run PCA on if no PCA is present.
  20. project dim

    • Project dimension reduction loadings. Default: TRUE
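
To make the effect of sigma more concrete, here is a minimal sketch of generic soft k-means membership for a single cell, where each cluster's weight is proportional to exp(-distance / sigma). This is an illustration under stated assumptions, not Harmony's implementation: the function name `soft_assignments` and the example distances are made up, and Harmony's actual objective and distance measure are described in the paper cited below.

```python
import math


def soft_assignments(distances, sigma):
    """Soft membership of one cell across clusters, given its distance to each centroid.

    Generic soft k-means illustration: weights are proportional to exp(-d / sigma).
    """
    # Subtract the smallest distance for numerical stability;
    # the normalized result is unchanged.
    d_min = min(distances)
    weights = [math.exp(-(d - d_min) / sigma) for d in distances]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical distances from one cell to three cluster centroids.
distances = [0.10, 0.30, 0.60]

for sigma in (0.01, 0.1, 1.0):
    probs = soft_assignments(distances, sigma)
    print(f"sigma={sigma}: {[round(p, 3) for p in probs]}")
```

With a small sigma nearly all of the weight lands on the closest centroid, which approaches hard clustering. With a large sigma the weight spreads across several clusters.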

Documentation

Citation

Ilya Korsunsky, Nghia Millard, Jean Fan, Kamil Slowikowski, Fan Zhang, Kevin Wei, Yuriy Baglaenko, Michael Brenner, Po-Ru Loh, Soumya Raychaudhuri. Fast, sensitive and accurate integration of single-cell data with Harmony. Nature Methods, 18 November 2019. https://doi.org/10.1038/s41592-019-0619-0

Contact

Justin Lee: jzl010@ucsd.edu
