The process was interrupted after running 7/16 samples #10

Open
lanathemoon opened this issue Jan 21, 2024 · 8 comments

@lanathemoon

Hello! First of all, thank you very much for developing the algorithm!
Because of hard-drive stability issues, my run was interrupted after 7 of 16 samples had finished. Is there a way for the algorithm to read the "tmd_out" output already written for those 7 samples and continue from there?

In addition, I ran sEV_recognizer on a computer with 128 GB of memory (a 64 GB computer hit an out-of-memory error), and it takes about 18 hours to process a single sample. Is this normal?

Thank you very much!

@RuiqiaoHe
Member

Yes, you only need to run SEVtras on the remaining samples. Once they finish, you can use the following code to integrate all samples:

```python
import SEVtras
from SEVtras.main import sEV_aggregator

# Replace the placeholders with the out_path from the previous step and your sample names.
sEV_aggregator(out_path='the path you set to output in the previous step',
               name_list=['the sample name1 in your list', 'the sample name2 in your list', 'the sample nameN in your list'],
               max_M=1000, score_t=1e-15, threads=30, search_UMI=500, flag=0)
```
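For the resume question, the samples that still need processing can be found by comparing the full sample list against what already exists under the output path. This is a hypothetical sketch, not a SEVtras function: it assumes that each finished sample leaves a file or folder named after the sample directly under out_path, and that a sample list is a plain text file with one name per line. Check both assumptions against your own output folder and sample file before relying on it.

```python
from pathlib import Path
from typing import Iterable, List


def remaining_samples(all_samples: Iterable[str], out_path: str) -> List[str]:
    """Return samples with no output entry yet (hypothetical helper, not a SEVtras API).

    ASSUMPTION: a finished sample leaves a file or directory named exactly
    after the sample directly under out_path. Adjust this check to match
    the layout you actually see in your output folder.
    """
    out_dir = Path(out_path)
    return [s for s in all_samples if not (out_dir / s).exists()]


def write_sample_list(samples: Iterable[str], path: str) -> None:
    """Write one sample name per line (ASSUMPTION: the sample-list format)."""
    Path(path).write_text("".join(f"{s}\n" for s in samples))
```

For example, calling remaining_samples on all 16 sample names with your output directory would return the nine samples that have no output yet, and write_sample_list could save them as a new sample list for the rerun; after that run finishes, sEV_aggregator above combines all 16.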

Regarding runtime, it usually takes a few dozen minutes when using multiple processes. Could you set more threads with predefine_threads to speed up SEVtras?

@lanathemoon
Author

Thank you for your prompt response! I've noticed that the default value for the predefine_threads parameter is -2, which I understand means 'nproc - 2'. My system's nproc is 32, so having a couple of extra threads probably won't make much of a difference, right? Do you have any other suggestions or advice you could offer? Thank you in advance!
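Under the reading described above (a negative predefine_threads value counted back from nproc), the arithmetic looks like this. This is a hypothetical helper that only mirrors that interpretation; it is not taken from SEVtras's source, and other libraries use different conventions (joblib, for example, treats n_jobs=-2 as all CPUs but one).

```python
import os
from typing import Optional


def resolve_threads(predefine_threads: int, nproc: Optional[int] = None) -> int:
    """Map a thread setting to a worker count, reading negatives as nproc + value.

    Hypothetical helper illustrating the 'nproc - 2' reading of the default -2;
    not SEVtras's own code.
    """
    if nproc is None:
        nproc = os.cpu_count() or 1
    if predefine_threads > 0:
        return predefine_threads
    if predefine_threads == 0:
        raise ValueError("thread count must be non-zero")
    # Negative values count back from the CPU count; keep at least one worker.
    return max(1, nproc + predefine_threads)
```

With nproc = 32, resolve_threads(-2, 32) gives 30 workers, so using all 32 CPUs would add only two workers (about 7% more), far too little to explain an 18-hour runtime.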

@RuiqiaoHe
Member

Since you used the default setting for predefine_threads, this runtime looks abnormal to me. I would like to check two points: 1) whether you are running SEVtras in a Linux environment, because it can hang for a long time on Windows or WSL; 2) whether you are processing 10X samples, because data from other platforms carry more cellular-debris noise, which increases computation time.
Thanks for your testing!
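For point 1, Python's standard platform module gives a quick, heuristic answer. The sketch relies on the common convention that WSL kernel release strings contain "microsoft"; it cannot reliably tell whether an ordinary Linux guest is running inside a virtual machine, so treat the result as a hint only. The function name describe_environment is illustrative.

```python
import platform


def describe_environment() -> str:
    """Return 'windows', 'wsl', 'linux', or the lowercased platform name (heuristic)."""
    system = platform.system()
    if system == "Windows":
        return "windows"
    if system == "Linux":
        # WSL kernels usually report a release string containing "microsoft",
        # e.g. "5.15.90.1-microsoft-standard-WSL2".
        if "microsoft" in platform.release().lower():
            return "wsl"
        return "linux"
    return system.lower()


if __name__ == "__main__":
    print(describe_environment(), platform.platform())
```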

@lanathemoon
Author

1) I am running this in a terminal inside my virtual machine; 2) yes, I am using 10X samples. I am not yet certain about the cause, but I will explore it further. Thank you for your patient replies! I would also like to ask about the adata_cell parameter of ESAI_calculator, which is set as adata_cell_path="./tests/test_cell.h5ad" in the example. Where should I obtain this file when using my own data? I noticed that it is provided in your test package. Thank you very much for your help!

@RuiqiaoHe
Member

The long runtime may be related to an incompatibility between Python's multiprocessing package and the virtual machine, since I have only tested SEVtras in a native Linux environment. I suggest testing one sample on native Linux; it may be much faster.
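One way to probe the multiprocessing hypothesis inside the VM, before moving to native Linux, is to time the same CPU-bound work serially and through a process pool. This is a generic sketch, not part of SEVtras: it uses the standard-library multiprocessing module with the "fork" start method, which is available on Linux but not on Windows. If the speedup stays near 1 even with several workers, process-based parallelism is not paying off in that environment.

```python
import multiprocessing as mp
import time
from typing import Tuple


def busy_work(n: int) -> int:
    """CPU-bound task: sum of squares below n."""
    return sum(i * i for i in range(n))


def pool_speedup(workers: int = 4, tasks: int = 8, n: int = 300_000) -> Tuple[float, float, float]:
    """Time `tasks` calls serially and via a process pool; return (serial_s, pool_s, speedup)."""
    jobs = [n] * tasks

    start = time.perf_counter()
    serial = [busy_work(j) for j in jobs]
    serial_s = time.perf_counter() - start

    # "fork" avoids re-importing the main module in child processes (POSIX only).
    # The pool timing includes process start-up, as it would in a real run.
    ctx = mp.get_context("fork")
    start = time.perf_counter()
    with ctx.Pool(processes=workers) as pool:
        parallel = pool.map(busy_work, jobs)
    pool_s = time.perf_counter() - start

    if serial != parallel:
        raise RuntimeError("serial and pooled results differ")
    return serial_s, pool_s, serial_s / pool_s


if __name__ == "__main__":
    s, p, x = pool_speedup(workers=min(8, mp.cpu_count()))
    print(f"serial {s:.2f}s, pool {p:.2f}s, speedup {x:.1f}x")
```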
As for adata_cell, it is a cell matrix obtained by preprocessing and clustering with scanpy or other tools. You can generate it from the Cell Ranger output by following a conventional single-cell processing workflow, for example with scanpy.

@lanathemoon
Author

lanathemoon commented Jan 30, 2024

Dear Professor RuiqiaoHe,

I am reaching out again for guidance on reproducing Extended Data Fig. 7a and 7d, specifically the UMAP plots. I tried using the sEV_SEVtras.h5ad file from the recognizer's output, which I believe contains only sEVs. However, the resulting plot was not as scattered as the ones in the article (see attached image 777f0818f215c33457bee65ed02a6ed). I also used the raw_SEVtras.h5ad file for another plot, and even after setting the resolution to 0.03, it produced over 200 clusters, which seems excessive (see image dfab39d331576e26166219cb27d3d84).

My concern is that the clusters are less well defined and more aggregated than in the published figures. I wonder whether I should have performed the full QC process, which I skipped.
Could you also advise which output file should be used to recreate a UMAP that closely resembles the one in the publication (Extended Data Fig. 7a and 7d)?
Any suggestions for improving the clarity and separation of my UMAP plots would be greatly appreciated.

Thank you very much for your time and assistance.

Best regards,
Lana

@RuiqiaoHe
Member

Could you then run SEVtras.ESAI_calculator to complete Part II of the SEVtras analysis? The output includes a PDF file named SEVumap.pdf, which embeds sEVs and cells together in a UMAP. The result may more closely resemble the published figure.

@TATABOX99

sEV_aggregator(out_path='the path you set to output in the previous step', name_list=['the sample name1 in your list', 'the sample name2 in your list', 'the sample nameN in your list'], max_M=1000, score_t=1e-15, threads=30, search_UMI=500, flag=0)
Hello, dear authors. Could you explain what the later parameters in this call mean (max_M, score_t, threads, search_UMI, and flag)?
