semi-supervised fastMNN correction #49

Open
julien-roux opened this issue Mar 5, 2024 · 3 comments

Comments

@julien-roux

(First, thanks Aaron for the development and maintenance of this awesome package!)

After reading this preprint, I was wondering if there would be the possibility for such a semi-supervised correction with fastMNN()?

For example, MNN pairs could be filtered based on prior annotation of the different batches, on labels inferred from a SingleR run, or on the matching clusters after a clusterMNN() run. What do you think?

@LTLA
Owner

LTLA commented Mar 5, 2024

I'm trying to remember, but a long time ago, we might have had similar thoughts. It would be theoretically easy to implement; just restrict the MNN pair formation to cell populations with the same annotation across batches and proceed with the rest of the algorithm. I could see how this could improve correction performance by avoiding the formation of MNN pairs between the wrong populations.

In practice, this was less useful than it seemed. People don't usually come into the analysis with existing annotations for the individual batches, at least not for their own experimental data. After all, the whole point of the batch correction step is to get everything on the same coordinate system so that you only have to do clustering and annotation once; if we already had consistent labels for each batch, we would never need to compute corrected values for the rest of our analysis. Other than to generate artworks like UMAP/t-SNE, perhaps, but I don't think those have much scientific value.

I guess that this functionality might have some appeal for secondary analyses of published datasets that have already been annotated. However, this leads to another problem, which is the harmonization of labels across datasets from different authors. Some poor soul has to go through each combination of datasets and decide which labels match up between them; easy enough for the major cell types, but difficult for the more ambiguous subtypes that might have differing terminology/definitions across the community. Making a mistake here would encourage the formation of the wrong MNN pairs - and frankly, if you already know which cell types match up between datasets, you can probably just proceed with the rest of your meta-analysis without computing corrected values (artistic endeavors aside).

In the end, I must have decided against putting in this functionality. Nonetheless, batchelor still contains a vestige of this line of thought, in the form of the restrict= argument to some of the functions. This was put there when I thought cell controls were going to definitely be a thing; it restricts the MNN pair formation to the control subpopulation within each batch, thus encouraging more accurate correction by focusing on the controls that must be the same across batches. Nowadays I think it's fair to say that no one cared about cell controls and restrict= was not a helpful option.
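For illustration only, the idea behind restrict= can be sketched in plain Python: MNN pairs are searched among the control cells alone, and the correction derived from them is then applied to every cell in the batch. This is not batchelor's code. The function names, the toy coordinates, and the single averaged shift (a deliberate simplification of the real algorithm) are all assumptions made for the example.

```python
from math import dist

def mnn_pairs_within(b1, b2, idx1, idx2, k=1):
    """MNN pairs searched only among the cells listed in idx1 and idx2."""
    def knn(query, points, idx):
        return set(sorted(idx, key=lambda t: dist(query, points[t]))[:k])

    nn12 = {i: knn(b1[i], b2, idx2) for i in idx1}
    nn21 = {j: knn(b2[j], b1, idx1) for j in idx2}
    return sorted((i, j) for i in idx1 for j in nn12[i] if i in nn21[j])

def correct_with_controls(b1, b2, controls1, controls2, k=1):
    """Shift all of batch 2 by the mean offset between paired control cells."""
    pairs = mnn_pairs_within(b1, b2, controls1, controls2, k)
    if not pairs:
        raise ValueError("no MNN pairs found among the control cells")
    ndim = len(b1[0])
    # Simplification: one global shift averaged over all control pairs.
    shift = [sum(b1[i][d] - b2[j][d] for i, j in pairs) / len(pairs)
             for d in range(ndim)]
    # The shift is applied to every cell, including the non-control cells.
    return [tuple(x + s for x, s in zip(cell, shift)) for cell in b2]

# Hypothetical toy data: cell 0 in each batch is the shared control; the
# second cell in each batch belongs to a population unique to that batch.
batch1 = [(0.0, 0.0), (4.0, 0.0)]
batch2 = [(3.0, 1.0), (9.0, 1.0)]

corrected = correct_with_controls(batch1, batch2, controls1=[0], controls2=[0])
print(corrected)  # [(0.0, 0.0), (6.0, 0.0)]
```

Because the non-control cells never take part in the pairing, a population unique to one batch cannot drag the correction toward an unrelated population in the other.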

@julien-roux
Author

Yes I think you have fair points, thanks for your input!

@DarioS

DarioS commented Sep 2, 2024

> People don't usually come into the analysis with existing annotations for the individual batches

Julien could also annotate cells prior to batch effect correction by using SingleR's or scClassify's correlation approach.
