Outreach
This page collects various updates on current xcms development.
XCMS has supported parallel processing since 2008 in several processing functions that promise a linear speed-up when run in parallel on multiple input files, e.g. findPeaks() as used in the xcmsSet() function. The parallelism was controlled by the nSlave argument.
Several mechanisms were supported. The first one, the Message Passing Interface (MPI), is the most powerful, as it is the standard on big HPC cluster systems. MPI is (still) a widespread standard for message passing (i.e. it covers more than just firing up a bunch of sub-tasks). It runs on single multi-core servers, but it can also "glue" a whole HPC cluster into a seemingly single machine, and it can be integrated with batch systems such as Sun Grid Engine (SGE, which later evolved into the Oracle Grid Engine (OGE) and several other children). At one stage, we were able to use "nSlave=100" on such a setup!
Later, other backend packages like SNOW and parallel were added as well. The backends were tried in a fixed order until one was found to be installed, which was not very flexible.
In 2012 Martin Morgan started the BiocParallel package to provide a common interface to a number of different approaches for (massively) parallel execution. In the current xcms3 development efforts, Johannes Rainer has now changed the xcms parallel execution to use this new interface. The benefit is that you now have much more control over the parallel processing in XCMS.
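As a rough illustration of what this control can look like, the sketch below selects a BiocParallel backend either once per session or per call. It assumes an xcms version in which xcmsSet() accepts a BPPARAM argument; the directory path, the file pattern and the worker counts are placeholders chosen for this example, not values from the xcms documentation.

```r
library(xcms)
library(BiocParallel)

# Hypothetical input: all mzML files below a placeholder directory.
files <- list.files("path/to/mzML", pattern = "\\.mzML$", full.names = TRUE)

# Option 1: register a default backend once per session.
# MulticoreParam uses forked workers and is not available on Windows.
register(MulticoreParam(workers = 4))
xset <- xcmsSet(files)  # assumed to pick up the registered backend via bpparam()

# Option 2: pass a backend explicitly for a single call.
# SnowParam starts separate R processes and also works on Windows.
xset <- xcmsSet(files, BPPARAM = SnowParam(workers = 4))

# Option 3: run serially, e.g. while debugging.
xset <- xcmsSet(files, BPPARAM = SerialParam())
```

Registering a backend affects all later BiocParallel-aware calls in the session, while passing BPPARAM explicitly only affects that one call, so the explicit form is probably the safer choice in scripts that are shared between machines.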
We will deprecate the nSlave argument in April 2017, and remove it in October 2017.
You might wonder why we jump from XCMS_1.51.X via 2.99.X straight to XCMS_3.0.0 in April 2017. The reason is that there are some quite substantial improvements behind the scenes. First, the code is being refactored, which means that functions change their names, some arguments change and R files are restructured. Most of this is invisible to the end-user. During the reorganisation, Johannes also did a rigorous code review and spotted issues, e.g. in the binning functions used for plotting and matchedFilter. Some of the binning functions also suffered from optimisations that traded accuracy for speed and lower memory consumption. These were certainly important back in 2006, but now we can drop some of them in favour of consistent results. This means that with xcms3 you might not be able to fully replicate all numbers you obtained with some 1.X.Y version. Finally, xcms3 also paves the way for a new on-disk format for xcmsRaw files implemented in MSnbase by Laurent Gatto.
All of these developments are a good reason to bump the major version. There was also already a paper by Paul Benton, XCMS²: Processing Tandem Mass Spectrometry Data.... Due to formatting limitations on PubMed, this XCMS² always got changed to XCMS2, so to avoid confusion we decided to go straight to XCMS3!