Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add functions for integrating CITE-seq data #114

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

HelloWorldLTY
Copy link

Dear Authors,

Hi all, I have added functions to implement the functions of integrating gene expression information and protein information from CITE-seq data. I have offered two options to model protein data, either using 1. NBmixture or 2. Normal Distirbution after preprocessing with CLR. It seems that the later option works better. I also provide a method to create the gudiance graph between genes and proteins.

Sincerely,
Tianyu

Copy link
Collaborator

@Jeff1995 Jeff1995 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @HelloWorldLTY, thank you for the contribution!

I have reviewed the changes and understood the new functionalities and approaches. Below are some further changes that need to be addressed before we can merge.

  1. The unit tests are failing because of a new argument init_fea_emb in scglue.models.sc.GraphEncoder.__init__(). It needs a default value of None, otherwise the standard GLUE model would stop working. Meanwhile, I suppose the forward function does not need this argument?
  2. The scglue.utils.generate_prot_guidance_graph() function should be moved to the scglue.genomics module, just like other guidance graph functions.
  3. The scglue.utils.clr() function is a duplicate of muon.prot.pp.clr, so I'd suggest removing this and use muon directly.
  4. The NBMixtureDataDecoder returns one of the NB components randomly per Bernoulli sample rather than a genuine NB mixture. While this might also work in practice, a cleaner approach would require defining an NB mixture distribution and implement a mixture log_prob. I suppose totalVI did implement one, but we may also achieve the same goal using torch.distributions.MixtureSameFamily and NegativeBinomial.
  5. The new functions need type hints and documentations in a style consistent with the current code base.

Please let me know if you'd like me to address any of these problems or if I've misunderstood some of the code. I would be glad to help!

@HelloWorldLTY
Copy link
Author

Hi, sorry for my late reply. I have made the modification based on your comments. For NBMixture model, I used categorical and nb distribution from torch to implement. Please let me know if you have other questions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants