PyCBC Live O4 development
Placeholder page to track ideas and proposals for PyCBC Live features we want in O4. Note that there is a PyCBC Live O3 development page as well, with open items that should be carried here. Here I am adopting a different format than the O3 page because I felt some items required more explanation than could fit in a table row.
The current example is quite minimal and does not test a variety of things that are used in a production search, notably:
- Different combinations of detectors.
- The state and DQ channels (#3261).
Who is interested? Tito
The O3 implementation of autogating still feels a bit clunky and should be more carefully characterized and checked. For example, what happens if a glitch is close to the boundary of the analysis chunk? What happens when the analysis chunk becomes much shorter (early-warning analysis)?
There are also potentially better ways to remove loud glitches, for example the inpainting method developed by IAS. Can one of these methods be used in low latency?
Who is interested? Stéphanie, Tito
Outcome:
- https://github.com/gwastro/pycbc/pull/4298
- Significantly improved with respect to O3
- More can always be done, so I am not marking this with a check mark just yet :)
As explained in the O3 paper, including relatively insensitive detectors in the analysis leads to an increase of the trials factor due to how the current statistic is organized. Can this be improved, for example by excluding detectors from the trials based on their local sensitivity?
It would also be useful to introduce something similar to the single-detector trigger fits used by the offline search, as the background rate varies quite a lot over the search space.
Finally, the p-value combination method described in the O3 paper might be more correctly done as a single combination of all p-values, instead of iteratively for each detector. We should understand whether this makes any difference.
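To make the last question concrete, here is a minimal sketch comparing a single combination with an iterative one. It assumes Fisher's method as the combination rule, which is an assumption made for illustration; the O3 paper defines the rule actually used. The function names and the p-values are hypothetical.

```python
import math


def fisher_combine(pvalues):
    """Combine independent p-values in one step with Fisher's method.

    Under the null hypothesis, -2 * sum(ln p_i) follows a chi-squared
    distribution with 2k degrees of freedom, whose survival function has
    the closed form used below.
    """
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # equals -ln(product of p_i)
    series = sum(half_stat ** j / math.factorial(j) for j in range(k))
    return math.exp(-half_stat) * series


def fisher_combine_iteratively(pvalues):
    """Fold p-values in one at a time, treating each partial result as a p-value."""
    combined = pvalues[0]
    for p in pvalues[1:]:
        combined = fisher_combine([combined, p])
    return combined


# Hypothetical p-values, one per detector (assumption for illustration).
pvalues = [0.01, 0.2, 0.5]
single = fisher_combine(pvalues)
iterative = fisher_combine_iteratively(pvalues)
print(f"single combination:    {single:.5f}")
print(f"iterative combination: {iterative:.5f}")
```

Under this assumption the two approaches agree for two p-values, but for three or more they generally differ, and the iterative result also depends on the order in which the detectors are folded in.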
Who is interested?
The latency of the analysis is currently sensitive to the number of observing detectors, because each MPI worker has to process all detectors. Can we improve this by only processing one detector per MPI worker?
Who is interested? Tito
This is being tested on Tito's branch https://github.com/titodalcanton/pycbc/tree/live_parallelize_detectors.
Right now each MPI worker requires access to h(t) and applies the same conditioning to it. This has led to issues when the h(t) availability or timing becomes inconsistent across the cluster nodes. Can we read and prepare h(t) on the root process and broadcast it to the workers, while maintaining the same latency?
Who is interested? Bhooshan
This has partially been done using O2 replay data and seems to work well, but it has to be looked at more carefully.
Who is interested? Barna, Arthur, Stephanie
Outcome:
- EW search has been running for many months both on replay and real data.
- There is certainly room for improvement, but as an initial test, this is done.
Can iDQ be used to improve the robustness of the search against glitches?
Who is interested? Max
Outcome:
- The answer to the question is yes!
- https://github.com/gwastro/pycbc/pull/4175
In O3, each upload was followed by a few immediate follow-up processes (for example adding plots and comments to GraceDB) which created noticeable spikes in the lag of the analysis. Can these operations be split off to separate threads in a nice way?
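One possible shape for this, as a minimal sketch using only the standard library: the upload stays in the analysis loop, while the slower follow-up work is handed to a small thread pool. All function names here are hypothetical stand-ins and do not come from PyCBC Live or the GraceDB client.

```python
import time
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for the real operations (assumptions for illustration).
def upload_candidate(event):
    """Pretend to upload a candidate and return its identifier."""
    return f"G{event['id']:06d}"


def make_and_attach_plots(graceid, event, delay=0.5):
    """Pretend to make plots and attach them to the uploaded candidate."""
    time.sleep(delay)  # simulates slow plotting and network traffic
    return f"plots attached to {graceid}"


# A small pool shared by all follow-up tasks.
followup_pool = ThreadPoolExecutor(max_workers=2)


def handle_candidate(event):
    """Upload synchronously, then schedule the follow-up without waiting for it."""
    graceid = upload_candidate(event)
    future = followup_pool.submit(make_and_attach_plots, graceid, event)
    return graceid, future


start = time.perf_counter()
graceid, future = handle_candidate({"id": 42})
elapsed = time.perf_counter() - start
print(f"analysis loop resumed after {elapsed:.3f} s")
print(future.result())  # blocks here only to show the follow-up finished
followup_pool.shutdown()
```

Threads suit follow-up work that mostly waits on the network or disk. CPU-heavy steps would still compete with the analysis for the interpreter lock, so those might need a separate process instead.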
Who is interested? Xan
Outcome:
- The answer to the question is yes!
- https://github.com/gwastro/pycbc/pull/4187
What is the effect of the SNR optimization on the skymaps generated by BAYESTAR? Are there ways to improve the speed or accuracy of pycbc_optimize_snr?
Who is interested? Pierre-Alexandre
Outcome:
- Pierre-Alexandre and Max took a preliminary look at the first question using MDC results, though I think a more solid study could be done, so I am not giving this a check mark yet.
- Arthur, Tom and others did a bunch of work to improve the optimizer:
MPI has a number of little quirks and annoyances that make it somewhat inconvenient to use on the CIT cluster. Here is a (probably incomplete) list:
- Intel's MPI implementation appears to impose a barrier at each gather operation. OpenMPI does not.
- However, OpenMPI does not work at CIT because it does not like computers with multiple IP addresses on the same bonded network interface (see https://github.com/open-mpi/ompi/issues/5818 for discussion on that).
- There does not seem to be a way to do fault-tolerant gather operations: if a node dies, the whole analysis hangs and has to be manually killed. Not sure if this is just an mpi4py limitation, or a more general MPI issue.
- The analysis also hangs at startup if one of the nodes is dead, and has to be manually killed.
Is there a different way to organize the multiprocess/multinode operation and communication, possibly using Condor?
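As a point of comparison for the hanging-gather problem, here is a minimal sketch of a gather that gives up on unresponsive workers after a timeout. It uses Python's standard `concurrent.futures` with threads standing in for nodes, so it is not an MPI or Condor solution and not a proposal for the actual design. The worker names and functions are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def gather_with_timeout(tasks, timeout):
    """Run each task and collect the results available within `timeout` seconds.

    `tasks` maps a worker name to a zero-argument callable. Returns
    (results, missing): the results of workers that finished in time and
    the sorted names of those that did not.
    """
    pool = ThreadPoolExecutor(max_workers=len(tasks))
    futures = {pool.submit(task): name for name, task in tasks.items()}
    done, not_done = wait(futures, timeout=timeout)
    results = {futures[f]: f.result() for f in done}
    missing = sorted(futures[f] for f in not_done)
    # Do not block on stragglers; a real system would also need to clean them up.
    pool.shutdown(wait=False)
    return results, missing


def healthy_worker():
    return 8.5  # hypothetical result reported by a responsive worker


def stuck_worker():
    time.sleep(1.0)  # simulates a node that stops responding
    return 0.0


results, missing = gather_with_timeout(
    {"node1": healthy_worker, "node2": stuck_worker}, timeout=0.2
)
print("results:", results)
print("missing:", missing)
```

The point of the sketch is only the control flow: the root collects whatever arrives before the deadline and records which workers are missing, rather than waiting indefinitely.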
Who is interested? Tito
We also want to improve the 'semi-analytic' approximations for the signal and noise distributions. A draft technical description is being worked on at this Overleaf link.
Who is interested? T. Dent, A. Lundgren, …
Outcome: lots of work by Tom and Veronica, e.g.
- https://github.com/gwastro/pycbc/pull/3077
- https://github.com/gwastro/pycbc/pull/3149
- https://github.com/gwastro/pycbc/pull/4039
- https://github.com/gwastro/pycbc/pull/4278
- Can always improve things, but seems to work reasonably.
Based on the highest-SNR (maximum-likelihood) mass and spin point, use a coordinate scheme where the metric is flat or nearly flat to construct an expected parameter error region. This would give parameter uncertainties for source classification, EM predictions, etc.
Who is interested? Tom, Veronica
We should be able to run an instance with injections without storing a large set of frame files. There are various ways in which this can be done:
- Running two separate instances, one with and one without injections.
- Running a separate set of processes with injections and using the correct (injection-free) background on the fly.
Who is interested? Bhooshan