latest available lustre-utils not downloading #152
Hi,

I was trying to set up a Lustre filesystem for an analysis on AWS, but the mount step was causing a fatal error and aborting the mount. The relevant error message shows the mount failing because it cannot download the lustre-utils package.

I had a quick look at the URL specified in the error, and it seems to contain "lustre-utils_1.8.5+dfsg-3ubuntu1_amd64.deb" rather than the "lustre-utils_1.8.5+dfsg-3.1ubuntu2_amd64.deb" the installer is looking for.

Best,
Gabriel
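For anyone hitting the same mismatch, it can help to confirm what the repository actually serves before retrying the mount, then install the version that exists. A minimal sketch; the repository URL below is a placeholder, since the exact URL from the error message isn't reproduced here:

    # Placeholder: substitute the repository path from your actual error message.
    REPO_URL="http://example.com/ubuntu/pool/l/lustre"

    # List which lustre-utils .deb files the repository actually serves.
    curl -s "$REPO_URL/" | grep -o 'lustre-utils[^"]*\.deb' | sort -u

    # Download and install the version that is actually present.
    wget "$REPO_URL/lustre-utils_1.8.5+dfsg-3ubuntu1_amd64.deb"
    sudo dpkg -i lustre-utils_1.8.5+dfsg-3ubuntu1_amd64.deb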
Gabriel: the NFS-based cluster setup described at http://bcbio-nextgen.readthedocs.io/en/latest/contents/cloud.html#running-a-cluster should work as an alternative. Sorry to not have an immediate Lustre fix, but I hope this works for your needs.
Hi Brad,

Yeah, I switched to using the NFS system immediately after the Lustre filesystem wasn't working for me, so I've got my analysis running anyway. Just wanted to flag up the fault in case someone else comes up against it.

I was wondering how much space in the shared filesystem bcbio needs when running? I'm running my analysis on a few hundred samples, so I was concerned about how much space I needed to have on hand. At the moment I've provisioned a TB of NFS space, as I didn't want the cluster to run out of space mid-analysis, but I suspect that was probably too much? Just wanted to know in the interest of keeping AWS costs down for future runs.

Also, an unrelated question: is there a way to know whether my analysis on a cluster is still running OK and what stage it's at? On a single machine I used to just rely on the log, but I'm not quite sure what to do on SLURM. squeue and sacct tell me the processes are still running, but I wanted to know if there's a way to check what step of the process it's at. Thanks!
Glad the NFS option worked for you. We normally estimate roughly 3x the size of the bgzipped input fastq files for pipelines without recalibration/realignment (covering the original fastqs, BAMs and associated files) and 4-5x for those that include recalibration and realignment. You should be able to track the progress of the run in your log files, which bcbio writes under the work directory.
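As a rough illustration of both answers, something like the following can be run from the head node. It assumes the usual bcbio layout, with logs in a log/ directory inside the work directory; adjust paths to your setup:

    # Size the bgzipped inputs to estimate required shared-filesystem space
    # (~3x this without recalibration/realignment, 4-5x with).
    du -csh /path/to/inputs/*.fastq.gz

    # Follow the high-level progress log; the debug log carries per-command detail.
    tail -f log/bcbio-nextgen.log

    # Check that the SLURM jobs backing the run are still alive.
    squeue -u "$USER"
    sacct -u "$USER" --format=JobID,JobName%30,State,Elapsed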
I've had a look at the slurm output file and both log files, and there doesn't seem to be any progress in the log past the fourth hour of the run. I'm already 1.5 days in from when I started it, and am not entirely sure whether the analysis is progressing or whether something has stalled. The slurm output file and both logs have shown the same last few entries since then, and no new files have been added to my work folder.

This is probably just down to my inexperience with slurm processes, but I was wondering whether this indicates that my run is still working fine, or that something is wrong while the cluster keeps going. What is the expected sequence of outputs in the log? My main concern is that, since no new files are being written to the work directory, nothing is actually happening with my analysis. It's a bit worrisome, as I've put up a fairly sizeable cluster to deal with the large number of data files I'm analysing, and I'd like to know whether to leave it running or to terminate the cluster if it's no longer actually carrying on with the analysis properly.
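One generic way to tell a quiet-but-active run from a stalled one is to look for recent file writes and busy processes, since bcbio creates intermediate files under the work directory as steps finish. A sketch, run from the work directory:

    # Any files modified in the last hour anywhere under the work directory?
    find . -type f -mmin -60 | head

    # Job state, elapsed time and node assignment for your SLURM jobs.
    squeue -u "$USER" -o "%.10i %.20j %.8T %.10M %R"

    # On a compute node: is anything actually burning CPU?
    top -b -n 1 | head -20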
Sorry about the issue; it does look like something is wrong with the run, but it's hard to diagnose what is happening from what you have so far. It should have provided some kind of error or something else helpful, but since that didn't happen, my suggestion is to stop the stalled jobs and restart the run from the same work directory, keeping an eye on the logs for where it fails.
It's pretty early in the process, so you can hopefully identify the problem quickly on the re-run. Apologies about the issues. We're actively working on updating our AWS runs to use the Common Workflow Language (http://bcbio-nextgen.readthedocs.io/en/latest/contents/cwl.html) so we can run with tools like Toil on AWS (http://toil.readthedocs.io/en/latest/) to improve debugging and resource usage. Hope this helps.
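On the practical side of re-running: bcbio is designed to restart from an existing work directory and skip steps that already completed, so resuming is normally just re-issuing the original command. A sketch, where the project YAML, core count and queue name are placeholders for whatever your setup uses:

    cd /path/to/work

    # Re-run the original command; finished steps are detected and skipped.
    # project.yaml, -n 64 and the queue name are placeholders.
    bcbio_nextgen.py ../config/project.yaml \
        -t ipython -n 64 -s slurm -q cloud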
I tried it again on a single-node cluster, and the last error that came up related to mirdeep2 (I'm running a small RNA analysis, which I realize I hadn't mentioned up to now). The error it throws comes from mirdeep2's "excising precursors" step.
There's another error earlier in the run that refers to the same "excising precursors" step, but I wasn't able to copy it over in full. I haven't quite figured out how to copy the log folder out to my local computer or to S3, so the error messages I was able to copy are truncated. Sorry.
I've dealt with it in the meantime by stripping mirdeep2 out of my analysis, and I'm waiting to see whether any errors come up in the remaining seqcluster and tRNA sections of the analysis.
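For getting the log folder off the cluster, two generic routes work; the host name, paths and bucket below are placeholders:

    # From your local machine: pull the logs over SSH.
    scp -r ubuntu@<frontend-ip>:/path/to/work/log ./bcbio-logs

    # Or, from a cluster node with AWS credentials: push them to S3.
    aws s3 cp /path/to/work/log s3://my-bucket/bcbio-logs/ --recursive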
Hi, sorry about these errors. I don't see the full command in the last chunk of code you posted, so I don't know whether it is mirdeep2 or not. The first one should be OK, in the sense that the run should continue even if you see it in the log. Any chance you can get the full log files and send them to me, or post them here? I will try to replicate this with separate data and see if I can get more information. Let me know if skipping mirdeep2 helps in this case. Cheers
Hi, sorry, I hadn't put that up in sequence, as I wasn't able to copy the truncated error message properly. The error I showed second actually came first in the log, in between a set of other entries. The first error I showed in the previous message was the very last item in the log; at that point there were no further additions to the log, even after several hours. I can try to recreate the error later, once my current analysis run finishes, if that would help. I've just figured out how to copy files out of the cluster, so hopefully I can get a full log out if I can recreate the problem.