-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reproducing the analysis from Nature Biotech paper #3
Comments
Hi @apredeus , Thank you for your kind words and for reaching out. To answer your first question, we don't run ChromHMM individually on each cell type. Instead, we ran ChromHMM in Secondly, the Anchor file is basically a tab-delimited file which is output of the function FindTransferAnchors with the following format: The anchor file summarizes which cells in the query dataset map to which cells in the reference dataset and what's the mapping score (the third column). We use the mapping score to quantitatively interpolate fragment counts from every scCUT&Tag-pro experiment into the CITE-seq defined reference, and barcodes like I hope this answers your questions, if not, feel free to reach out again. |
Hi Avi, Thank you very much for your clarifications, I know what to do now to try and reproduce the analysis. Last question: how did you decide on 12 states? The emission table seems to have 7 meaningful states and 5 near-empty ones - am I missing something here? Thanks again for your help! |
I am glad to hear that. Unfortunately, your last question is a bit tricky to answer as it's hard to know beforehand the number of hidden chromatin states. In my opinion, this is not specific to ChromHMM, it's an open problem, and there are different ways to heuristically solve the problem, for example, using AIC or BIC criteria, and even then it may lead to multiple near-empty ones as you mentioned. Generally, if we quantify the posterior probability of near-empty states, they are trivially low and don't make much sense overall anyway. We use the states which are meaningfully quantifiable into different functional states. Ideally, again, in my opinion, the user should be able to specify the relevant combination of marks instead of learning unsupervised chromatin states (I am working on that for the next version) but I'd imagine when ChromHMM was designed we were not aware of relevant combinatorics of histone modifications and motivated the idea. |
Thank you for the detailed and thoughtful reply! It is indeed a frustrating thing to try and guess the number of states. I worked with (bulk) ChromHMM in the past, and it's totally a guessing game with lots of starting at the browser and some known regulatory elements, etc. As for the relevant combinations, I think they were the original ENCODE support people, so they wanted to be as unbiased as possible, right? Few interesting and unusual states were discovered like that, but not much above the combinations that were already known. At any rate, big thanks for entertaining my questions! Have a great holiday break. |
Hi @k3yavi - I was wondering if it would be possible to share the code used to generate BED files from pseudobulks (is there peak calling involved?) and run ChromHMM on the data used in the paper? We're not quite sure we're doing everything correctly. I know it's probably a bunch of miscellaneous scripts in few languages but anything would be helpful. |
Yes, I have a related question here #4 (comment) |
Hi @apredeus and @Gavin-Lijy , Thank you for your patience, and I apologize for the late response during the interview season. @Gavin-Lijy I'll reply to you on the #4 (comment) directly. @apredeus Generating the BED files from pseudo bulks was quite straightforward. The general idea is to use the function
|
Hi,
Thank you for a really great tool and an exciting dataset! I'm trying to reproduce the analysis done in the Nature Biotech paper (https://www.nature.com/articles/s41587-022-01250-0). I've downloaded the processed data from Zenodo (https://zenodo.org/record/5504061), and also compiled
schrom
and successfully ran the small example provided with the code.Now, I'm trying to re-do the analysis done in the paper, and have few questions:
E2L4_AGCGTATCACAGTCCG
which I assume are scCut&Tag-pro barcodes - what am I missing? Also, what exactly is the numerical value in the 3rd column of the anchor file that's fed to scChromHMM?Thank you in advance - any help would be much appreciated!
All the best,
-- Alex
The text was updated successfully, but these errors were encountered: