- This project now uses viash version 0.8.0 to build components and workflows. Moving to 0.8.0 involved the following changes:
  - Bump viash version to 0.8.0 in the project configuration (PR #598).
  - The `concat` component has been deprecated and will be removed in a future release. Its functionality has been copied to the `concatenate_h5mu` component because the name conflicts with the `concat` operator from Nextflow (PR #598).
  - Pipelines no longer use the anonymous workflow. Instead, each workflow was given a name, which was added to the viash config as the entrypoint to the pipeline (PR #598).
  - Removed the `workflows` folder and moved its contents to new locations (PR #605):
    - The `resources_test_scripts` folder now resides in the root of the project.
    - All workflows have been moved to the `src/workflows` folder.
    - Adjust GitHub Actions to account for new workflow paths.
- Renamed `obsm_metrics` to `uns_metrics` for the `cellranger_mapping` workflow because the cellranger metrics are stored in `.uns` and not `.obsm` (PR #610).

- `rna_multisample` workflow: added `--modality` argument (PR #607).

- Refactored `rna_multisample` pipeline to use `fromState` and `toState` functionality (PR #607).
- Refactored `cellranger_multi` workflow to use `fromState` and `toState` functionality (PR #609).
- Refactored `cellranger_mapping` workflow to use `fromState` and `toState` functionality (PR #610).

- `rna_singlesample`: Fix values of the filtering parameters `min_counts`, `max_counts`, `min_genes_per_cell`, `max_genes_per_cell` and `min_cells_per_gene` not being passed to the `filter_with_counts` component (PR #614).
- `prot_singlesample`: Fix values of the filtering parameters `min_counts`, `max_counts`, `min_proteins_per_cell`, `max_proteins_per_cell` and `min_cells_per_protein` not being passed to the `filter_with_counts` component (PR #614).

The detection of mitochondrial genes has been revisited in order to remove the interdependency with the count filtering and the QC metric calculation. Implementing these changes involved breaking some existing functionality:

- `filter/filter_with_counts`: removed `--var_gene_names`, `--mitochondrial_gene_regex`, `--var_name_mitochondrial_genes`, `--min_fraction_mito` and `--max_fraction_mito` (PR #585).
- `workflows/prot_singlesample`: removed `--min_fraction_mito` and `--max_fraction_mito` because regex-based detection of mitochondrial genes is not possible (PR #585).
- The fraction of counts that originated from mitochondrial genes used to be written to an `.obs` column whose name was formed by the prefix `pct_` followed by the name of the mitochondrial gene column. The `--obs_name_mitochondrial_fraction` argument is introduced to change the destination column, and the default prefix has changed from `pct_` to `fraction_` (PR #585).

- `workflows/qc`: A pipeline to add basic QC statistics to a MuData object (PR #585).
- `workflows/rna_singlesample`: added `--obs_name_mitochondrial_fraction` and made sure that the values for `--max_fraction_mito` and `--min_fraction_mito` are bounded between 0 and 1 (PR #585).
- Added `filter/delimit_fraction`: Turns an annotation column containing values between 0 and 1 into a boolean column based on thresholds (PR #585).
- Added `metadata/grep_annotation_column`: Perform a regex lookup on a column from the annotation matrices `.obs` or `.var` (PR #585).
- `workflows/full_pipelines`: added `--obs_name_mitochondrial_fraction` argument (PR #585).
- `workflows/prot_multisample`: added `--var_qc_metrics` and `--top_n_vars` arguments (PR #585).
- Added genetic demultiplexing methods `cellsnp`, `demuxlet`, `freebayes`, `freemuxlet`, `scsplit`, `souporcell` and `vireo` (PR #343).
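
The decoupled mitochondrial QC flow described in this release (a regex lookup on gene names, a per-cell fraction, and a threshold that yields a boolean column) can be illustrated with a minimal pure-Python sketch. It is not the code of `metadata/grep_annotation_column` or `filter/delimit_fraction`; the function names, the `^MT-` pattern and the toy counts are assumptions chosen for illustration only.

```python
import re


def grep_column(values, pattern):
    """Return one boolean per value: True where the regex matches."""
    regex = re.compile(pattern)
    return [bool(regex.search(value)) for value in values]


def fraction_per_cell(counts, var_mask):
    """For each cell (row), compute the fraction of counts in the masked genes."""
    fractions = []
    for row in counts:
        total = sum(row)
        selected = sum(c for c, keep in zip(row, var_mask) if keep)
        # Assumption: cells without any counts get a fraction of 0.0.
        fractions.append(selected / total if total else 0.0)
    return fractions


def delimit_fraction(fractions, min_fraction=0.0, max_fraction=1.0):
    """Turn fractions in [0, 1] into booleans: True when within the bounds."""
    if not 0.0 <= min_fraction <= max_fraction <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= min <= max <= 1")
    return [min_fraction <= f <= max_fraction for f in fractions]


# Toy data (hypothetical): four genes, two cells.
gene_names = ["MT-CO1", "ACTB", "MT-ND1", "GAPDH"]
counts = [
    [1, 5, 1, 3],  # 2 of 10 counts are mitochondrial
    [6, 1, 2, 1],  # 8 of 10 counts are mitochondrial
]

is_mito = grep_column(gene_names, r"^MT-")
fraction_mito = fraction_per_cell(counts, is_mito)
keep_cell = delimit_fraction(fraction_mito, max_fraction=0.5)
```

Keeping the three steps separate mirrors the motivation given above: the gene lookup, the metric and the filter no longer depend on one another.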

- Several components: bump scanpy to 1.9.5 (PR #594).
- Refactored `prot_multisample` and `prot_singlesample` pipelines to use `fromState` and `toState` functionality (PR #585).

- Nextflow VDSL3: set `simplifyOutput` to `False` by default. This implies that components and workflows will output a hashmap with a sole "output" entry when there is only one output (PR #563).
- `integrate/scvi`: rename `model_output` argument to `output_model` in order to align with the `scvi_leiden` workflow. This also fixes a bug with the workflow where the argument did not function (PR #562).

- `dataflow/concat`: reduce memory consumption when using `--other_axis_mode move` by processing only one annotation matrix (`.var`, `.obs`) at a time (PR #569).
- Update viashpy and pin it to `0.5.0` (PR #572 and PR #577).
- `convert/from_h5ad_to_h5mu`, `convert/from_h5mu_to_h5ad`, `dimred/pca`, `dimred/umap`, `filter/filter_with_counts`, `filter/filter_with_hvg`, `filter/remove_modality`, `filter/subset_h5mu`, `integrate/scanorama`, `transform/delete_layer` and `transform/log1p`: update python to `3.9` (PR #572).
- `integrate/scarches`: update base image, `scvi-tools` and `pandas` to `nvcr.io/nvidia/pytorch:23.09-py3`, `~=1.0.3` and `~=2.1.0` respectively (PR #572).
- `integrate/totalvi`: update python to 3.9 and scvi-tools to `~=1.0.3` (PR #572).
- `correction/cellbender_remove_background`: change base image to `nvcr.io/nvidia/cuda:11.8.0-devel-ubuntu22.04` and downgrade MuData to 0.2.1 because it is the oldest version that uses python 3.7 (PR #575).
- Several integration workflows: prevent leiden from being executed when no resolutions are provided (PR #583).
- `dataflow/concat`: bump pandas to ~=2.1.1 and reduce memory consumption by only reading one modality into memory at a time (PR #568).
- `annotate/popv`: bump `jax` and `jaxlib` to `0.4.10`, scanpy to `1.9.4`, scvi to `1.0.3` and pin `ml-dtypes` to < 0.3.0 (PR #565).
- `velocity/scvelo`: pin matplotlib to < 3.8.0 (PR #566).
- `mapping/multi_star`: pin multiqc to 1.15.0 (PR #566).
- `mapping/bd_rhapsody`: pin pandas version to <2 (PR #563).
- `query/cellxgene_census`: replaced label `singlecpu` with label `midcpu`.
- `query/cellxgene_census`: avoid creating MuData object in memory by writing the modality directly to disk (PR #558).
- `integrate/scvi`: use `midcpu` label instead of `singlecpu` (PR #561).

- `transform/clr`: raise an error when CLR fails to return the requested output (PR #579).
- `correction/cellbender_remove_background`: fix missing helper functionality when using Fusion (PR #575).
- `convert/from_bdrhap_to_h5mu`: Avoid `TypeError: Can't implicitly convert non-string objects to strings` by using categorical dtypes when a string column contains NA values (PR #563).
- `qc/calculate_qc_metrics`: fix calculating mitochondrial gene related QC metrics when only or no mitochondrial genes were found (PR #564).

- Added `protein_processing/dsb_index` and `protein_processing/dsb_normalize` components (PR #588).

- `integration/scvi_leiden`: Expose hvg selection argument `--var_input` (#543, PR #547).

- `integration/bbknn_leiden`: Set leiden clustering parameter to multiple (#542, PR #545).
- `integration/scvi_leiden`: Fix component name in Viash config (PR #547).
- `integration/harmony_leiden`: Pass the `--uns_neighbors` argument to `umap` (PR #548).
- Add workaround for bug where resources aren't available when using Nextflow fusion by including `setup_logger`, `subset_vars` and `compress_h5mu` in the script itself (PR #549).

- `workflows/full_pipeline`: removed `--prot_min_fraction_mito` and `--prot_max_fraction_mito` (PR #451).
- `workflows/rna_multisample` and `workflows/prot_multisample`: Removed concatenation from these pipelines. The input for these pipelines is now a single mudata file that contains data for multiple samples. If you wish to use these pipelines on multiple single-sample mudata files, you can use the `dataflow/concat` component on them first. This also implies that the ability to add ids to multiple single-sample mudata files prior to concatenation is no longer required, hence the removal of `--add_id_to_obs`, `--sample_id`, `--add_id_obs_output`, and `--add_id_make_observation_keys_unique` (PR #475).
- The `scvi` pipeline was renamed to `scvi_leiden` because `leiden` clustering was added to the pipeline (PR #499).
- Upgrade `correction/cellbender_remove_background` from CellBender v0.2 to CellBender v0.3.0 (PR #523). Between these versions, several arguments related to the slots of the output file have been changed.

- Several components: update anndata to 0.9.3 and mudata to 0.2.3 (PR #423).
- Base resources assigned for a process without any labels is now 1 CPU and 2GB (PR #518).
- Updated to Viash 0.7.5 (PR #513).
- Removed deprecated `variant: vdsl3` tags (PR #513).
- Removed unused `version: dev` (PR #513).
- `multiomics/integration/harmony_leiden`: Refactored data flow (PR #513).
- `ingestion/bd_rhapsody`: Refactored data flow (PR #513).
- `query/cellxgene_census`: increased returned metadata content, revised query option, added filtering strategy and refactored functionality (PR #520).
- Refactor loggers using `setup_logger()` helper function (PR #534).
- Refactor unittest tests to pytest tests (PR #534).
- Add resource labels to several components (PR #518).
- `full_pipeline`: default value for `--var_qc_metrics` is now the combined values specified for `--mitochondrial_gene_regex` and `--filter_with_hvg_var_output`.
- `dataflow/concat`: reduce memory consumption by only reading one modality at a time (PR #474).
- Components that use CellRanger, BCL Convert or bcl2fastq: updated from Ubuntu 20.04 to Ubuntu 22.04 (PR #494).
- Components that use CellRanger: updated Picard to 2.27.5 (PR #494).
- `interprete/liana`: Update lianapy to 0.1.9 (PR #497).
- `qc/multiqc`: add unit tests (PR #502).
- `reference/build_cellranger_reference`: add unit tests (PR #506).
- `reference/build_bd_rhapsody_reference`: add unit tests (PR #504).

- Added `compression/compress_h5mu` component (PR #530).
- Resource management: when a process exits with a status code between 137 and 140, retry the process with increased memory requirements. Memory scales by multiplying the base memory assigned to the process with the attempt number (PR #518 and PR #527).
- `integrate/scvi`: Add `--n_hidden_nodes`, `--n_dimensions_latent_space`, `--n_hidden_layers`, `--dropout_rate`, `--dispersion`, `--gene_likelihood`, `--use_layer_normalization`, `--use_batch_normalization`, `--encode_covariates`, `--deeply_inject_covariates` and `--use_observed_lib_size` parameters.
- `filter/filter_with_counts`: add `--var_name_mitochondrial_genes` argument to store a boolean array corresponding to the detected mitochondrial genes.
- `full_pipeline` and `rna_singlesample` pipelines: add `--var_name_mitochondrial_genes`, `--var_gene_names` and `--mitochondrial_gene_regex` arguments to specify mitochondrial gene detection behaviour.
- `integrate/scvi`: Add `--obs_labels`, `--obs_size_factor`, `--obs_categorical_covariate` and `--obs_continuous_covariate` arguments (PR #496).
- Added `var_qc_metrics_fill_na_value` argument to `calculate_qc_metrics` (PR #477).
- Added `multiomics/multisample` pipeline to run multisample processing followed by the integration setup. It is considered an entrypoint into the full pipeline which skips the single-sample processing. The idea is to allow a re-run of these steps after a sample has already been processed by the `full_pipeline`. Keep in mind that samples that are provided as input to this pipeline are processed separately and are not concatenated. Hence, the input should be a concatenated sample (PR #475).
- Added `multiomics/integration/bbknn_leiden` workflow (PR #456).
- `workflows/prot_multisample` and `workflows/full_pipelines`: add basic QC statistics to prot modality (PR #485).
- `mapping/cellranger_multi`: Add tests for the mapping of Crispr Guide Capture data (PR #494).
- `convert/from_cellranger_multi_to_h5mu`: add `perturbation_efficiencies_by_feature` and `perturbation_efficiencies_by_target` information to the `.uns` slot of the `gdo` modality (PR #494).
- `convert/from_cellranger_multi_to_h5mu`: add `feature_reference` information to the MuData object. Information is split between the modalities. For example, `CRISPR Guide Capture` information is added to the `.uns` slot of the `gdo` modality, while `Antibody Capture` information is added to the `.uns` slot of `prot` (PR #494).
- Added `multiomics/integration/totalvi_leiden` pipeline (PR #500).
- Added totalVI component (PR #386).
- `workflows/full_pipeline`: Add `pca_overwrite` argument (PR #511).
- Add `main_build_viash_hub` action to build, tag, and push components and docker images for viash-hub.com (PR #480).
- `integration/bbknn_leiden`: Update state management to `fromState`/`toState` (PR #538).
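
The resource-management rule listed in this release (retry on an exit status between 137 and 140, with memory equal to the base memory multiplied by the attempt number) amounts to the arithmetic below. This is a hedged Python illustration, not the project's Nextflow configuration; the function names are hypothetical, and the 2 GB base in the usage lines is taken from the default for unlabelled processes mentioned elsewhere in this changelog.

```python
# Exit statuses that trigger a retry according to the changelog entry.
RETRY_EXIT_CODES = range(137, 141)  # 137, 138, 139 and 140


def should_retry(exit_status):
    """Return True when the exit status falls in the retry range."""
    return exit_status in RETRY_EXIT_CODES


def memory_for_attempt(base_memory_gb, attempt):
    """Memory for a given attempt: base memory times the attempt number."""
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    return base_memory_gb * attempt


# Usage: a process with a 2 GB base that keeps failing with status 137.
requested = [memory_for_attempt(2, attempt) for attempt in (1, 2, 3)]
```

Under this reading, memory grows linearly with the attempt number rather than doubling on each retry.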

- `images`: Added images for various concepts, such as a sample, a cell, RNA, ADT, ATAC, VDJ (PR #515).
- `multiomics/rna_singlesample`: Add image for workflow (PR #515).
- `multiomics/rna_multisample`: Add image for workflow (PR #515).
- `multiomics/prot_singlesample`: Add image for workflow (PR #515).
- `multiomics/prot_multisample`: Add image for workflow (PR #515).

- Fix an issue with `workflows/multiomics/scanorama_leiden` where the `--output` argument doesn't work as expected (PR #509).
- Fix an issue with `workflows/full_pipeline` not correctly caching previous runs (PR #460).
- Fix incorrect namespaces of the integration pipelines (PR #464).
- Fix an issue in several workflows where the `--output` argument would not work (PR #476).
- `integration/harmony_leiden` and `integration/scanorama_leiden`: Fix an issue where the prefix of the columns that store the leiden clusters was hardcoded to `leiden`, instead of adapting to the value for `--obs_cluster` (PR #482).
- `velocity/velocyto`: Resolve symbolic link before checking whether the transcriptome is a gzip (PR #484).
- `workflows/integration/scanorama_leiden`: fix an issue where `--obsm_input`, `--obs_batch`, `--batch_size`, `--sigma`, `--approx`, `--alpha` and `-knn` were not working because they were not passed through to the scanorama component (PR #487).
- `workflows/integration/scanorama_leiden`: fix leiden being calculated on the wrong embedding because the `--obsm_input` argument was not correctly set to the output embedding of scanorama (PR #487).
- `mapping/cellranger_multi`: Fix an issue where modalities did not have the proper name (PR #494).
- `metadata/add_uns_to_obs`: Fix `KeyError: 'ouput_compression'` error (PR #501).
- `neighbors/bbknn`: Fix `--input` not being a required argument (PR #518).
- Create `correction/cellbender_remove_background_v0.2` for legacy CellBender v0.2 format (PR #523).
- `integrate/scvi`: Ensure output has the same dimensionality as the input (PR #524).
- `mapping/bd_rhapsody`: Fix `--dryrun` argument not working (PR #534).
- `qc/multiqc`: Fix component not working for multiple inputs (PR #537). Also converted the Bash script to a Python script.
- `neighbors/bbknn`: Fix `--uns_output`, `--obsp_distances` and `--obsp_connectivities` not being processed correctly (PR #538).

Running the integration in the `full_pipeline` was deemed impractical because a plethora of integration methods exist, which in turn interact with other functionality (like clustering). This generates a large number of possible use cases which one pipeline cannot easily cover. Instead, each integration method will be split into its own pipeline, and the `full_pipeline` will prepare for integration by performing steps that are required by many integration methods. Therefore, the following changes were performed:

- `workflows/full_pipeline`: `harmony` integration and `leiden` clustering are removed from the pipeline.
- Added `initialize_integration` to run calculations that output information commonly required by the integration methods. This pipeline runs PCA, nearest neighbours and UMAP. This pipeline is run as a subpipeline at the end of `full_pipeline`.
- Added `leiden_harmony` integration pipeline: run harmony integration followed by neighbour calculations and leiden clustering. Also runs umap on the result.
- Removed the `integration` pipeline.

The old behavior of the `full_pipeline` can be obtained by running `full_pipeline` followed by the `leiden_harmony` pipeline.

- The `crispr` and `hashing` modalities have been renamed to `gdo` and `hto` respectively (PR #392).
- Updated Viash to 0.7.4 (PR #390).
- `cluster/leiden`: Output is now stored into `.obsm` instead of `.obs` (PR #431).

- `cluster/leiden` and `integration/harmony_leiden`: allow running leiden multiple times with multiple resolutions (PR #431).
- `workflows/full_pipeline`: PCA, nearest neighbours and UMAP are now calculated for the `prot` modality (PR #396).
- `transform/clr`: added `output_layer` argument (PR #396).
- `workflows/integration/scvi`: Run scvi integration followed by neighbour calculations and run umap on the result (PR #396).
- `mapping/cellranger_multi` and `workflows/ingestion/cellranger_multi`: Added `--vdj_inner_enrichment_primers` argument (PR #417).
- `metadata/move_obsm_to_obs`: Move a matrix from an `.obsm` slot into `.obs` (PR #431).
- `integrate/scvi`: added validity checks for non-normalized input, obs and vars before proceeding to training (PR #429).
- `schemas`: Added schema files for authors (PR #436).
- `schemas`: Added schema file for Viash configs (PR #436).
- `schemas`: Refactor author import paths (PR #436).
- `schemas`: Added schema file for file format specification files (PR #437).
- `query/cellxgene_census`: Query Cellxgene census component and save the results to a MuData file (PR #433).

- `report/mermaid`: Now uses `mermaid-cli` to generate images instead of creating a request to `mermaid.ink`. New `--output_format`, `--width`, `--height` and `--background_color` arguments were added (PR #419).
- All components that used `python` as base container: use `slim` version to reduce container image size (PR #427).

- `integrate/scvi`: update scvi to 1.0.0 (PR #448).
- `mapping/multi_star`: Added `--min_success_rate`, which causes the component to fail when the fraction of samples that were processed successfully is below this threshold (PR #408).
- `correction/cellbender_remove_background` and `transform/clr`: update muon to 0.1.5 (PR #428).
- `ingestion/cellranger_postprocessing`: split integration tests into several workflows (PR #425).
- `schemas`: Add schema file for author yamls (PR #436).
- `mapping/multi_star`, `mapping/star_build_reference` and `mapping/star_align`: update STAR from 2.7.10a to 2.7.10b (PR #441).

- `annotate/popv`: Fix concat issue when the input data has multiple layers (#395, PR #397).
- `annotate/popv`: Fix indexing issue when the MuData object contains non-overlapping modalities (PR #405).
- `mapping/multi_star`: Fix issue where temp dir could not be created when group_id contains slashes (PR #406).
- `mapping/multi_star_to_h5mu`: Use glob to look for count files recursively (PR #408).
- `annotate/popv`: Pin `PopV`, `jax` and `jaxlib` versions (PR #415).
- `integrate/scvi`: max_epochs is no longer required since it has a default value (PR #396).
- `workflows/full_pipeline`: fix `make_observation_keys_unique` parameter not being correctly passed to the `add_id` component, causing `ValueError: Observations are not unique across samples` during execution of the `concat` component (PR #422).
- `annotate/popv`: now sets `approx` to `False` to avoid using `annoy` in scanorama because it fails on processors that are missing the AVX-512 instruction sets, causing `Illegal instruction (core dumped)`.
- `workflows/full_pipeline`: Avoid adding sample names to observation ids twice (PR #457).

- `workflows/full_pipeline`: Fixed inconsistencies in argument naming (#372):
  - `rna_min_vars_per_cell` was renamed to `rna_min_genes_per_cell`
  - `rna_max_vars_per_cell` was renamed to `rna_max_genes_per_cell`
  - `prot_min_vars_per_cell` was renamed to `prot_min_proteins_per_cell`
  - `prot_max_vars_per_cell` was renamed to `prot_max_proteins_per_cell`

- `velocity/scvelo`: bump anndata from <0.8 to 0.9.

- Added an extra label `veryhighmem` mostly for `cellranger_multi` with a large number of samples.
- Added `multiomics/prot_multisample` pipeline.
- Added `clr` functionality to `prot_multisample` pipeline.
- Added `interpret/lianapy`: Enables the use of any combination of ligand-receptor methods and resources, and their consensus.
- `filter/filter_with_scrublet`: Add `--allow_automatic_threshold_detection_fail`: when scrublet fails to detect doublets, the component will now put `NA` in the output columns.
- `workflows/full_pipeline`: Allow not setting the sample ID to the .obs column of the MuData object.
- `workflows/rna_multisample`: Add the ID of the sample to the .obs column of the MuData object.
- `correction/cellbender_remove_background`: add `obsm_latent_gene_encoding` parameter to store the latent gene representation.

- `transform/clr`: fix anndata object instead of matrix being stored as a layer in output `MuData`, resulting in `NoneTypeError` object after reading the `.layers` back in.
- `dataflow/concat` and `dataflow/merge`: fixed a bug where boolean values were cast to their string representation.
- `workflows/full_pipeline`: fix running pipeline with `-stub`.
- Fixed an issue where passing a remote file URI (for example `http://` or `s3://`) as `param_list` caused `No such file` errors.
- `workflows/full_pipeline`: Fix incorrectly named filtering arguments (#372).
- `integrate/scvi`: Fix bug when subsetting using the `var_input` argument (PR #385).
- `correction/cellbender_remove_background`: add `obsm_latent_gene_encoding` parameter to store the latent gene representation.

- `integrate/scarches`, `integrate/scvi` and `correction/cellbender_remove_background`: Update base container to `nvcr.io/nvidia/pytorch:22.12-py3`.
- `integrate/scvi`: add `gpu` label for nextflow platform.
- `integrate/scvi`: use cuda enabled `jax` install.
- `convert/from_cellranger_multi_to_h5mu`, `dataflow/concat` and `dataflow/merge`: update pandas to 2.0.0.
- `dataflow/concat` and `dataflow/merge`: Boolean and integer columns are now represented by the `BooleanArray` and `IntegerArray` dtypes in order to allow storing `NA` values.
- `interpret/lianapy`: use the latest development release (commit 11156ddd0139a49dfebdd08ac230f0ebf008b7f8) of lianapy in order to fix compatibility with numpy 1.24.x.
- `filter/filter_with_hvg`: Add error when specified input layer cannot be found in input data.
- `workflows/multiomics/full_pipeline`: publish the output from sample merging to allow running different integrations.
- CI: Remove various unused software libraries from runner image in order to avoid `no space left on device` (PR #425, PR #447).

- `integrate/scvi`: use `nvcr.io/nvidia/pytorch:22.09-py3` as base container to enable GPU acceleration.
- `integrate/scvi`: add `--model_output` to save model.
- `workflows/ingestion/cellranger_mapping`: Added `output_type` to output the filtered Cell Ranger data as h5mu, not the converted raw 10xh5 output.
- Several components: added `--output_compression` argument to set the compression of output .h5mu files.
- `workflows/full_pipeline` and `workflows/integration`: Added `leiden_resolution` argument to control the coarseness of the clustering.
- Added `--rna_theta` and `--rna_harmony_theta` to the full and integration pipelines respectively in order to tune the diversity clustering penalty parameter for harmony integration.
- `dimred/pca`: fix `variance` slot containing a second copy of the variance ratio matrix and not the variances.

- `mapping/cellranger_multi`: Fix an issue where using a directory as value for `--input` would cause `AttributeError`.
- `workflows/integration`: `init_pos` is no longer set to the integration layer (e.g. `X_pca_integrated`).

- `integration` and `full` workflows: do not run harmony integration when `obs_covariates` is not provided.
- Add `highmem` label to `dimred/pca` component.
- Remove disabled `convert/from_csv_to_h5mu` component.
- Update to Viash 0.7.1.
- Several components: update to scanpy 1.9.2.
- `process_10xh5/filter_10xh5`: speed up build by using `eddelbuettel/r2u:22.04` base container.

- `dataflow/concat`: Renamed `--compression` to `--output_compression`.

- Removed `bin` folder. As of viash 0.6.4, a `_viash.yaml` file can be included in the root of a repository to set common viash options for the project. These options were previously covered in the `bin/init` script, but this new feature of viash makes its use unnecessary. The `viash` and `nextflow` executables should now be installed in a directory that is included in your `$PATH`.

- `filter/do_filter`: raise an error instead of printing a warning when providing a column for `var_filter` or `obs_filter` that doesn't exist.

- `workflows/full_pipeline`: Fix setting .var output column for filter_with_hvg.
- Fix running `mapping/cellranger_multi` without passing all references.
- `filter/filter_with_scrublet`: now sets `use_approx_neighbors` to `False` to avoid using `annoy` because it fails on processors that are missing the AVX-512 instruction sets.
- `workflows`: Updated `WorkflowHelper` to newer version that allows applying defaults when calling a subworkflow from another workflow.
- Several components: pin matplotlib to <3.7 to fix scanpy compatibility (see scverse/scanpy#2411).
- `workflows`: fix a bug where running a subworkflow from a workflow would cause the parent config to be read instead of the subworkflow config.
- `correction/cellbender_remove_background`: Fix description of input for cellbender_remove_background.
- `filter/do_filter`: resolved an issue where the .obs column instead of the .var column was being logged when filtering using the .var column.
- `workflows/rna_singlesample` and `workflows/prot_singlesample`: Correctly set var and obs columns while filtering with counts.
- `filter/do_filter`: removed the default input value for `var_filter` argument.
- `workflows/full_pipeline` and `workflows/integration`: fix PCA not using highly variable genes filter.

- `workflows/full_pipeline`: added `filter_with_hvg_obs_batch_key` argument for batched detection of highly variable genes.
- `workflows/rna_multisample`: added `filter_with_hvg_obs_batch_key`, `filter_with_hvg_flavor` and `filter_with_hvg_n_top_genes` arguments.
- `qc/calculate_qc_metrics`: Add basic statistics: `pct_dropout`, `num_zero_obs`, `obs_mean` and `total_counts` are added to .var. `num_nonzero_vars`, `pct_{var_qc_metrics}`, `total_counts_{var_qc_metrics}`, `pct_of_counts_in_top_{top_n_vars}_vars` and `total_counts` are included in .obs.
- `workflows/multiomics/rna_multisample` and `workflows/multiomics/full_pipeline`: add `qc/calculate_qc_metrics` component to workflow.
- `workflows/multiomics/prot_singlesample`: Processing unimodal single-sample CITE-seq data.
- `workflows/multiomics/rna_singlesample` and `workflows/multiomics/full_pipeline`: Add filtering arguments to pipeline.
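
As a rough illustration of the per-variable statistics that `qc/calculate_qc_metrics` adds to .var (`pct_dropout`, `num_zero_obs`, `obs_mean`, `total_counts`), the sketch below computes them for a tiny dense count matrix in plain Python. The definitions are assumptions inferred from the metric names (for example, `pct_dropout` taken as the percentage of observations with a zero count), not the component's actual implementation, and `var_qc_metrics` is a hypothetical helper name.

```python
def var_qc_metrics(counts):
    """Compute per-variable (column) statistics for a dense count matrix.

    `counts` is a list of rows, one row per observation (cell).
    """
    n_obs = len(counts)
    metrics = []
    for column in zip(*counts):  # iterate over variables (genes)
        num_zero_obs = sum(1 for value in column if value == 0)
        total_counts = sum(column)
        metrics.append({
            "total_counts": total_counts,
            "num_zero_obs": num_zero_obs,
            # Assumed definition: share of observations without counts, in percent.
            "pct_dropout": 100 * num_zero_obs / n_obs,
            "obs_mean": total_counts / n_obs,
        })
    return metrics


# Toy matrix (hypothetical): two cells, two genes.
toy_counts = [
    [0, 2],
    [4, 2],
]
toy_metrics = var_qc_metrics(toy_counts)
```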

- `convert/from_bdrhap_to_h5mu`: bump R version to 4.2.
- `process_10xh5/filter_10xh5`: bump R version to 4.2.
- `dataflow/concat`: include path of file in error message when reading a mudata file fails.
- `mapping/cellranger_multi`: write cellranger console output to a `cellranger_multi.log` file.

- `mapping/htseq_count_to_h5mu`: Fix a bug where reading in the gtf file caused `AttributeError`.
- `dataflow/concat`: the `--input_id` is no longer required when `--mode` is not `move`.
- `filter/filter_with_hvg`: no longer tries to use `--varm_name` to set non-existent metadata when running with `--flavor seurat_v3`, which was causing `KeyError`.
- `filter/filter_with_hvg`: Enforce that `n_top_genes` is set when `flavor` is set to 'seurat_v3'.
- `filter/filter_with_hvg`: Improve error message when trying to use 'cell_ranger' as `flavor` and passing unfiltered data.
- `mapping/cellranger_multi`: now applies `gex_chemistry`, `gex_secondary_analysis`, `gex_generate_bam`, `gex_include_introns` and `gex_expect_cells`.

- `mapping/multi_star`: A parallelized version of running STAR (and HTSeq).
- `mapping/multi_star_to_h5mu`: Convert the output of `multi_star` to a h5mu file.

- `filter/filter_with_counts`: Fix an issue where mitochondrial genes were being detected in .var_names, which contain Ensembl IDs instead of gene symbols in the pipelines. The solution was to create a `--var_gene_names` argument which allows selecting a .var column to check using a regex (`--mitochondrial_gene_regex`).
- `dataflow/concat`, `report/mermaid`, `transform/clr`: Don't forget to exit with code returned by pytest.

- `workflows/full_pipeline`: add `filter_with_hvg_var_output` argument.
- `dimred/pca`: Add `--overwrite` and `--var_input` arguments.
- `transform/clr`: Perform CLR normalization on CITE-seq data.
- `workflows/ingestion/cellranger_multi`: Run Cell Ranger multi and convert the output to .h5mu.
- `filter/remove_modality`: Remove a single modality from a MuData file.
- `mapping/star_align`: Align `.fastq` files using STAR.
- `mapping/star_align_v273a`: Align `.fastq` files using STAR v2.7.3a.
- `mapping/star_build_reference`: Create a STAR reference index.
- `mapping/cellranger_multi`: Align fastq files using Cell Ranger multi.
- `mapping/samtools_sort`: Sort and (optionally) index alignments.
- `mapping/htseq_count`: Quantify gene expression for subsequent testing for differential expression.
- `mapping/htseq_count_to_h5mu`: Convert one or more HTSeq outputs to a MuData file.
- Added `convert/from_cellranger_multi_to_h5mu` component.

- `convert/from_velocyto_to_h5mu`: Moved to `velocity/velocyto_to_h5mu`. It also now accepts an optional `--input_h5mu` argument, to allow directly reading the RNA velocity data into a `.h5mu` file containing the other modalities.
- `resources_test/cellranger_tiny_fastq`: Include RNA velocity computations as part of the script.
- `mapping/cellranger_mkfastq`: remove --memory and --cpu arguments, as resource management is automatically provided by viash.

- Several components: use `gzip` compression for writing .h5mu files.
- Default value for `obs_covariates` argument of full pipeline is now `sample_id`.
- Set the `tag` directive of all Nextflow components to '$id'.

- Keep data for modalities that are not specifically enabled when running full pipeline.
- Fix many components thanks to Viash 0.6.4, which causes errors to be thrown when input and output files are defined but not found.
- `reference/make_reference`: Input files changed from `type: string` to `type: file` to allow Nextflow to cache the input files fetched from URL.
- Several components (except `from_h5ad_to_h5mu`): the `--modality` arguments no longer accept multiple values.
- Remove outdated `resources_test_scripts`.
- `convert/from_h5mu_to_seurat`: Disabled because MuDataSeurat is currently broken, see https://github.com/PMBio/MuDataSeurat/issues/9.
- `integrate/harmony`: Disabled because it is currently not functioning and the alternative, harmonypy, is used in the workflows.
- `dataflow/concat`: Renamed --sample_names to --input_id and moved the ability to add sample id and to join the sample ids with the observation names to `metadata/add_id`.
- Moved `dataflow/concat`, `dataflow/merge` and `dataflow/split_modalities` to a new namespace: `dataflow`.
- Moved `workflows/conversion/conversion` to `workflows/ingestion/conversion`.
metadata/add_id
: Add an id to a column in .obs. Also allows joining the id to the .obs_names. -
workflows/ingestion/make_reference
: A generic component to build a transcriptomics reference into one of many formats. -
integrate/scvi
: Performs scvi integration. -
integrate/add_metadata
: Add a csv containing metadata to the .obs or .var field of a mudata file. -
DataflowHelper.nf
: AddedpassthroughMap
. Usage:include { passthroughMap as pmap } from "./DataflowHelper.nf" workflow { Channel.fromList([["id", [input: "foo"], "passthrough"]]) | pmap{ id, data -> [id, data + [arg: 10]] } }
Note that in the example above, using a regular
map
would result in an exception being thrown, that is, "Invalid method invocationcall
with arguments".A synonymous of doing this with a regular
map()
would be:workflow { Channel.fromList([["id", [input: "foo"], "passthrough"]]) | map{ tup -> def (id, data) = tup [id, data + [arg: 10]] + tup.drop(2) } }

- `correction/cellbender_remove_background`: Eliminating technical artifacts from high-throughput single-cell RNA sequencing data.
- `workflows/ingestion/cellranger_postprocessing`: Add post-processing of h5mu files created from Cell Ranger data.
- `annotate/popv`: Performs popular major vote cell typing on single cell sequence data.
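
For readers less familiar with Groovy, the behaviour of `passthroughMap` in this release (apply a function to the first two elements of each tuple and carry any remaining elements along unchanged) can be mimicked in Python. This is only an analogy based on the usage shown in the changelog entry; `passthrough_map` is a hypothetical name and not part of `DataflowHelper.nf`.

```python
def passthrough_map(events, func):
    """Apply func(id, data) to each event and keep extra elements untouched."""
    result = []
    for event in events:
        # Split the event into the two mapped fields and the passthrough rest.
        event_id, data, *rest = event
        new_id, new_data = func(event_id, data)
        result.append([new_id, new_data, *rest])
    return result


# Mirrors the Nextflow example: add `arg: 10` to the data field.
events = [["id", {"input": "foo"}, "passthrough"]]
mapped = passthrough_map(
    events,
    lambda event_id, data: (event_id, {**data, "arg": 10}),
)
```

The star-unpacking plays the role of `tup.drop(2)` in the Groovy version: whatever follows the first two elements is appended again after the mapping.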

- `workflows/utils/DataflowHelper.nf`: Added helper functions `setWorkflowArguments()` and `getWorkflowArguments()` to split the data field of a channel event into a hashmap. Example usage:

      | setWorkflowArguments(
        pca: [ "input": "input", "obsm_output": "obsm_pca" ],
        integration: [ "obs_covariates": "obs_covariates", "obsm_input": "obsm_pca" ]
      )
      | getWorkflowArguments("pca")
      | pca
      | getWorkflowArguments("integration")
      | integration

- `mapping/cellranger_count`: Allow passing both directories as well as individual fastq.gz files as inputs.
- `convert/from_10xh5_to_h5mu`: Allow reading in QC metrics, use gene ids as `.obs_names` instead of gene symbols.
- `workflows/conversion`: Update pipeline to use the latest practices and to get it to a working state.
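
The idea behind `setWorkflowArguments()` and `getWorkflowArguments()` (splitting one flat set of workflow arguments into per-component hashmaps and then selecting the one a component needs) can be sketched in Python as follows. This is a simplified analogy, assuming each mapping reads as component argument name to workflow argument name; the real helpers operate on Nextflow channel events and may differ in detail. The function names and sample values are hypothetical.

```python
def set_workflow_arguments(data, **mappings):
    """Split a flat argument dict into one dict per component.

    Each mapping has the form {component_argument: workflow_argument}.
    """
    return {
        component: {
            component_arg: data[workflow_arg]
            for component_arg, workflow_arg in mapping.items()
            if workflow_arg in data
        }
        for component, mapping in mappings.items()
    }


def get_workflow_arguments(split_arguments, component):
    """Select the arguments that were set aside for one component."""
    return split_arguments[component]


# Hypothetical workflow-level arguments.
workflow_data = {
    "input": "sample.h5mu",
    "obsm_pca": "X_pca",
    "obs_covariates": ["sample_id"],
}

split_arguments = set_workflow_arguments(
    workflow_data,
    pca={"input": "input", "obsm_output": "obsm_pca"},
    integration={"obs_covariates": "obs_covariates", "obsm_input": "obsm_pca"},
)
pca_arguments = get_workflow_arguments(split_arguments, "pca")
integration_arguments = get_workflow_arguments(split_arguments, "integration")
```

Note how the single workflow argument `obsm_pca` feeds both the output slot of the PCA step and the input slot of the integration step, which is the renaming the example usage expresses.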
- `dimred/umap`: Streamline UMAP parameters by adding `--obsm_output` parameter to allow choosing the output `.obsm` slot.
- `workflows/multiomics/integration`: Added arguments for tuning the various output slots of the integration pipeline, namely `--obsm_pca`, `--obsm_integrated`, `--uns_neighbors`, `--obsp_neighbor_distances`, `--obsp_neighbor_connectivities`, `--obs_cluster`, `--obsm_umap`.
- Switch to Viash 0.6.1.
- `filter/subset_h5mu`: Add `--modality` argument, export to VDSL3, add unit test.
- `dataflow/split_modalities`: Also output modality types in a separate csv.
- `convert/from_bd_to_10x_molecular_barcode_tags`: Replaced UTF8 characters with ASCII. OpenJDK 17 or lower might throw the following exception when trying to read a UTF8 file: `java.nio.charset.MalformedInputException: Input length = 1`.
- `dataflow/concat`: Overriding sample name in .obs no longer raises `AttributeError`.
- `dataflow/concat`: Fix false positives when checking for conflicts in .obs and .var when using `--mode move`.
Major redesign of the integration and multiomic workflows. Current list of workflows:
- `ingestion/bd_rhapsody`: A generic pipeline for running BD Rhapsody WTA or Targeted mapping, with support for AbSeq, VDJ and/or SMK.
- `ingestion/cellranger_mapping`: A pipeline for running Cell Ranger mapping.
- `ingestion/demux`: A generic pipeline for running bcl2fastq, bcl-convert or Cell Ranger mkfastq.
- `multiomics/rna_singlesample`: Processing unimodal single-sample RNA transcriptomics data.
- `multiomics/rna_multisample`: Processing unimodal multi-sample RNA transcriptomics data.
- `multiomics/integration`: A pipeline for demultiplexing multimodal multi-sample RNA transcriptomics data.
- `multiomics/full_pipeline`: A pipeline to analyse multiple multiomics samples.
- Many components: Renamed `.var["gene_ids"]` and `.var["feature_types"]` to `.var["gene_id"]` and `.var["feature_type"]`.
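
For downstream code that still reads the old column names, the rename above amounts to a simple key mapping. The sketch below uses a plain dictionary as a stand-in for the `.var` columns; `RENAMED_VAR_COLUMNS`, `rename_var_columns` and the sample values are illustrative and not part of any component.

```python
# Mapping taken from the changelog entry: old column name -> new column name.
RENAMED_VAR_COLUMNS = {"gene_ids": "gene_id", "feature_types": "feature_type"}


def rename_var_columns(var_columns):
    """Return a copy of a column dict with old names replaced by new ones."""
    return {
        RENAMED_VAR_COLUMNS.get(name, name): values
        for name, values in var_columns.items()
    }


# Hypothetical columns; the identifiers are placeholders.
old_var = {
    "gene_ids": ["ENSG01", "ENSG02"],
    "feature_types": ["Gene Expression", "Gene Expression"],
    "highly_variable": [True, False],
}
new_var = rename_var_columns(old_var)
print(sorted(new_var))
```

With a pandas-backed `.var` table, the same mapping could presumably be passed to `DataFrame.rename(columns=...)`.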
- `convert/from_10xh5_to_h5ad` and `convert/from_bdrhap_to_h5ad`: Removed h5ad based components.
- `mapping/bd_rhapsody_wta` and `workflows/ingestion/bd_rhapsody_wta`: Deprecated in favour of the more generic `mapping/bd_rhapsody` and `workflows/ingestion/bd_rhapsody` pipelines.
- `convert/from_csv_to_h5mu`: Disable until it is needed again.
- `dataflow/concat`: Deprecated `"concat"` option for `--other_axis_mode`.
- `graph/bbknn`: Batch balanced KNN.
- `transform/scaling`: Scale data to unit variance and zero mean.
- `mapping/bd_rhapsody`: Added generic component for running the BD Rhapsody WTA or Targeted analysis, with support for AbSeq, VDJ and/or SMK.
- `integrate/harmony` and `integrate/harmonypy`: Run a Harmony integration analysis (R-based and Python-based, respectively).
- `integrate/scanorama`: Use Scanorama to integrate different experiments.
- `reference/make_reference`: Download a transcriptomics reference and preprocess it (adding ERCC spikeins and filtering with a regex).
- `reference/build_bdrhap_reference`: Compile a reference into a STAR index in the format expected by BD Rhapsody.
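
To make "unit variance and zero mean" from the `transform/scaling` entry concrete, here is a minimal standard-library sketch that scales a single feature. It assumes the population standard deviation; whether the component itself uses the population or the sample estimator is not stated in this changelog, and the `scale` helper is illustrative only.

```python
from statistics import fmean, pstdev


def scale(values):
    """Shift values to zero mean and divide by the population standard deviation."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # A constant feature has no variance to normalise; return zeros.
        return [0.0 for _ in values]
    return [(value - mean) / std for value in values]


# Hypothetical expression values of one gene across four cells.
counts = [1.0, 2.0, 3.0, 4.0]
scaled = scale(counts)
print(scaled)
```

After scaling, the values have a mean of (numerically) zero and a population standard deviation of one.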
- `workflows/ingestion/bd_rhapsody`: Added generic workflow for running the BD Rhapsody WTA or Targeted analysis, with support for AbSeq, VDJ and/or SMK.
- `workflows/multiomics/full_pipeline`: Implement pipeline for processing multiple multiomics samples.
- `convert/from_bdrhap_to_h5mu`: Added support for WTA, Targeted, SMK, AbSeq and VDJ data.
- `dataflow/concat`: Added `"move"` option to `--other_axis_mode`, which allows merging `.obs` and `.var` by only keeping elements of the matrices which are the same in each of the samples, moving the conflicting values to `.varm` or `.obsm`.
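
The idea behind the `"move"` option can be sketched with plain dictionaries: annotation columns on which all samples agree are kept in the merged table, while columns with conflicting values are set aside per sample. This is a simplified illustration under stated assumptions, not the component's implementation; the `merge_with_move` helper, the `conflict_` prefix and the sample data are hypothetical.

```python
def merge_with_move(samples):
    """Merge per-sample annotation columns.

    samples: {sample_id: {column: {index: value}}}
    Returns (kept, moved): columns without conflicts, and conflicting
    columns stored per sample (standing in for .varm / .obsm).
    """
    columns = set()
    for annotations in samples.values():
        columns.update(annotations)

    kept, moved = {}, {}
    for column in sorted(columns):
        merged, conflict = {}, False
        for annotations in samples.values():
            for index, value in annotations.get(column, {}).items():
                if index in merged and merged[index] != value:
                    conflict = True
                merged.setdefault(index, value)
        if conflict:
            # Keep every sample's own values instead of picking one.
            moved[f"conflict_{column}"] = {
                sample_id: annotations.get(column, {})
                for sample_id, annotations in samples.items()
            }
        else:
            kept[column] = merged
    return kept, moved


samples = {
    "s1": {"gene_symbol": {"g1": "A", "g2": "B"}, "n_cells": {"g1": 10, "g2": 5}},
    "s2": {"gene_symbol": {"g1": "A", "g2": "B"}, "n_cells": {"g1": 7, "g2": 5}},
}
kept, moved = merge_with_move(samples)
print(kept)
print(moved)
```

Here `gene_symbol` is identical in both samples and stays in the merged annotations, whereas `n_cells` differs for `g1` and is moved aside.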
- Multiple components: Update to anndata 0.8 with mudata 0.2.0. This means that the format of the `.h5mu` files has changed.
- `multiomics/rna_singlesample`: Move transformed counts into layers instead of overwriting `.X`.
- Updated to Viash 0.6.0.
- `velocity/velocyto`: Allow configuring memory and parallelisation.
- `cluster/leiden`: Add `--obsp_connectivities` parameter to allow choosing the output slot.
- `workflows/multiomics/rna_singlesample`, `workflows/multiomics/rna_multisample` and `workflows/multiomics/integration`: Allow choosing the output paths.
- `neighbors/bbknn` and `neighbors/find_neighbors`: Add parameters for choosing the input/output slots.
- `dimred/pca` and `dimred/umap`: Add parameters for choosing the input/output slots.
- `dataflow/concat`: Optimize concat performance by adding multiprocessing and refactoring functions.
- `workflows/multimodal_integration`: Add `obs_covariates` argument to pipeline.
- Several components: Revert using slim versions of containers because they do not provide the tools to run nextflow with trace capabilities.
- `dataflow/concat`: Fix an issue where joining boolean values caused `TypeError`.
- `workflows/multiomics/rna_multisample`, `workflows/multiomics/rna_singlesample` and `workflows/multiomics/integration`: Use nextflow trace reporting when running integration tests.
- `workflows/ingestion/bd_rhapsody_wta`: Use ':' as a separator for multiple input files and fix integration tests.
- Several components: pin mudata and scanpy dependencies so that anndata version <0.8.0 is used.
- `convert/from_bdrhap_to_h5mu`: Merge one or more BD Rhapsody outputs into an h5mu file.
- `dataflow/split_modalities`: Split the modalities from a single .h5mu multimodal sample into separate .h5mu files.
- `dataflow/concat`: Combine data from multiple samples together.
- `mapping/bd_rhapsody_wta`: Update to BD Rhapsody 1.10.1.
- `mapping/bd_rhapsody_wta`: Add parameters for overriding the minimum RAM & cores. Add `--dryrun` parameter.
- Switch to Viash 0.5.14.
- `convert/from_bdrhap_to_h5mu`: Update to BD Rhapsody 1.10.1.
- `resources_test/bdrhap_5kjrt`: Add subsampled BD Rhapsody datasets to test pipeline with.
- `resources_test/bdrhap_ref_gencodev40_chr1`: Add subsampled reference to test BD Rhapsody pipeline with.
- `dataflow/merge`: Merge several unimodal .h5mu files into one multimodal .h5mu file.
- Updated several python docker images to slim version.
- `mapping/cellranger_count_split`: Update container from ubuntu focal to ubuntu jammy.
- `download/sync_test_resources`: Update AWS cli tools from 2.7.11 to 2.7.12 by updating docker image.
- `download/download_file`: Now uses bash container instead of python.
- `mapping/bd_rhapsody_wta`: Use squashed docker image in which log4j issues are resolved.
- `workflows/utils/WorkflowHelper.nf`: Renamed `utils.nf` to `WorkflowHelper.nf`.
- `workflows/utils/WorkflowHelper.nf`: Fix error message when required parameter is not specified.
- `workflows/utils/WorkflowHelper.nf`: Added helper functions:
  - `readConfig`: Read a Viash config from a yaml file.
  - `viashChannel`: Create a channel from the Viash config and the params object.
  - `helpMessage`: Print a help message and exit.
- `mapping/bd_rhapsody_wta`: Update picard to 2.27.3.
- `convert/from_bdrhap_to_h5ad`: Deprecated in favour of `convert/from_bdrhap_to_h5mu`.
- `convert/from_10xh5_to_h5ad`: Deprecated in favour of `convert/from_10xh5_to_h5mu`.
- `bin/port_from_czbiohub_utilities.sh`: Added helper script to import components and pipelines from `czbiohub/utilities`.

Imported components from `czbiohub/utilities`:
- `demux/cellranger_mkfastq`: Demultiplex raw sequencing data.
- `mapping/cellranger_count`: Align fastq files using Cell Ranger count.
- `mapping/cellranger_count_split`: Split 10x Cell Ranger output directory into separate output fields.
Imported workflows from `czbiohub/utilities`:
- `workflows/1_ingestion/cellranger`: Use Cell Ranger to preprocess 10x data.
- `workflows/1_ingestion/cellranger_demux`: Use cellranger demux to demultiplex sequencing BCL output to FASTQ.
- `workflows/1_ingestion/cellranger_mapping`: Use cellranger count to align 10x fastq files to a reference.
- Fix `interactive/run_cirrocumulus` script raising `NotImplementedError` caused by using `MuData.var_names_make_unique()` on each modality instead of on the whole `MuData` object.
- Fix `transform/normalize_total` and `interactive/run_cirrocumulus` component build missing a hdf5 dependency.
- `interactive/run_cellxgene`: Updated container to ubuntu:focal because the previous container contains python3.6 but cellxgene dropped python3.6 support.
- `mapping/bd_rhapsody_wta`: Set `--parallel` to true by default.
- `mapping/bd_rhapsody_wta`: Translate Bash script into Python.
- `download/sync_test_resources`: Add `--dryrun`, `--quiet`, and `--delete` arguments.
- `convert/from_h5mu_to_seurat`: Use `eddelbuettel/r2u:22.04` docker container in order to speed up builds by downloading precompiled R packages.
- `mapping/cellranger_count`: Use 5Gb for testing (to adhere to GitHub CI runner memory constraints).
- `convert/from_bdrhap_to_h5ad`: Change test data to output from `mapping/bd_rhapsody_wta` after reducing the BD Rhapsody test data size.
- Various `config.vsh.yaml`s: Renamed `values:` to `choices:`.
- `download/download_file` and `transfer/publish`: Switch base container from `bash:5.1` to `python:3.10`.
- `mapping/bd_rhapsody_wta`: Make sure procps is installed.
- `mapping/bd_rhapsody_wta`: Use a smaller test dataset to reduce test time and make sure that the GitHub Actions runners do not run out of disk space.
- `download/sync_test_resources`: Disable the use of the Amazon EC2 instance metadata service to make the script work on GitHub Actions runners.
- `convert/from_h5mu_to_seurat`: Fix unit test requiring Seurat by using native R functions to test the Seurat object instead.
- `mapping/cellranger_count` and `bcl_demux/cellranger_mkfastq`: cellranger uses the `--parameter=value` formatting instead of `--parameter value` to set command line arguments.
- `mapping/cellranger_count`: `--nosecondary` is no longer always applied.
- `mapping/bd_rhapsody_wta`: Added workaround for bug in Viash 0.5.12 where triple single quotes are incorrectly escaped (viash-io/viash#139).
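
The two `mapping/cellranger_count` fixes above (the `--parameter=value` formatting and `--nosecondary` only being applied on request) can be illustrated with a small argument builder. This is a sketch and not the component's actual script; `build_cellranger_args` is a hypothetical helper and the paths are placeholders.

```python
def build_cellranger_args(subcommand, params):
    """Build a cellranger command line using --parameter=value formatting."""
    args = ["cellranger", subcommand]
    for name, value in params.items():
        if value is None or value is False:
            # Unset options and disabled flags are left out entirely.
            continue
        if value is True:
            # Boolean flags such as --nosecondary carry no value.
            args.append(f"--{name}")
        else:
            args.append(f"--{name}={value}")
    return args


# Placeholder values for illustration only.
command = build_cellranger_args(
    "count",
    {
        "id": "sample1",
        "transcriptome": "/path/to/reference",
        "fastqs": "/path/to/fastqs",
        "nosecondary": False,
    },
)
print(" ".join(command))
```

Passing the resulting list to `subprocess.run` (rather than a joined string) would avoid shell quoting issues with paths that contain spaces.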
- `bcl_demux/cellranger_mkfastq`: Duplicate of `demux/cellranger_mkfastq`.
- Add `tx_processing` pipeline with the following components:
  - `filter_with_counts`
  - `filter_with_scrublet`
  - `filter_with_hvg`
  - `do_filter`
  - `normalize_total`
  - `regress_out`
  - `log1p`
  - `pca`
  - `find_neighbors`
  - `leiden`
  - `umap`
- Added `from_10x_to_h5ad` and `download_10x_dataset` components.
- Workflow `bd_rhapsody_wta`: Minor change to workflow to allow for easy processing of multiple samples with a tsv.
- Component `bd_rhapsody_wta`: Added more parameters, `--parallel` and `--timestamps`.
- Added `pbmc_1k_protein_v3` as a test resource.
- Translate `bd_rhapsody_extracth5ad` from R into Python script.
- `bd_rhapsody_wta`: Remove temporary directory after execution.
- `files/make_params`: Implement unit tests (PR #505).
- Initial release containing only a `bd_rhapsody_wta` pipeline and corresponding components.