Accelsearch sigma
The primary pulsation-searching code in PRESTO is accelsearch. A drastically oversimplified description is that it looks through the FFT of the input (dedispersed) time series for peaks. The final output from accelsearch is a list of periodicity candidates along with their statistical significances. In a typical search pipeline, the candidate lists from all trial DMs are combined and the result is “sifted”: detections of the same candidate at different DMs are grouped, and the best DM is chosen for each group. Some number of candidates with the best statistical significances are then selected for folding. In some accelsearch-based pipelines we have found that the measure of statistical significance used to make this cut is not necessarily a good reflection of candidate quality, so this page is intended to explain exactly how this number is obtained and what it means.
In practice there are several kinds of cleverness built into accelsearch beyond a simple search for peaks in an FFT:
- Zaplist. accelsearch can be supplied a zaplist, a list of frequency ranges which are simply not searched. In principle this could contain a list of all recurring RFI frequencies, but in practice it may only be a region around 50/60 Hz.
- Summing harmonics. Since many pulsars are not sinusoidal but instead have fairly narrow peaks, there is power in many harmonics; accelsearch can add up to 16 (?) harmonics incoherently.
- Interbinning. A naive algorithm loses a great deal of sensitivity when a pulsar signal falls between two Fourier frequencies. Careful interpolation in the Fourier domain can recover most of that lost sensitivity.
- Acceleration searching. If a pulsar is in a tight binary, it can undergo significant period variations over the span of the observations. accelsearch uses clever matched filtering techniques to detect accelerated pulsars as well as non-accelerated pulsars.
- Red noise management. The FFTs to be searched often contain substantial contributions from broadband noise sources. If uncompensated, these noise sources would raise the background level in parts of the FFT and result in spurious periodicity detections. accelsearch avoids this by estimating the background level in the neighborhood of each harmonic.
- Candidate fine-tuning. The raw candidates from the FFT search are not necessarily at the best period. accelsearch attempts to “tune up” each candidate to improve its significance.
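To make the interbinning item in the list above concrete, here is a minimal sketch using the standard interbinning approximation, in which the amplitude halfway between bins k and k+1 is estimated as (π/4)(A_k − A_{k+1}). Whether accelsearch applies exactly this formula is an assumption; the function names, the test tone, and the direct-sum DFT are purely illustrative and are not PRESTO code.

```python
import cmath
import math


def dft_bin(samples, k):
    """Fourier amplitude of `samples` at integer bin k (direct sum, no FFT)."""
    n = len(samples)
    return sum(x * cmath.exp(-2j * math.pi * k * j / n)
               for j, x in enumerate(samples))


def interbin_amplitude(amp_k, amp_k1):
    """Approximate the amplitude at bin k + 1/2 from the amplitudes at k and k + 1.

    This is the textbook interbinning formula (an assumption here, not a
    statement about accelsearch internals).
    """
    return (math.pi / 4.0) * (amp_k - amp_k1)


# Hypothetical test signal: a unit cosine exactly halfway between bins 40 and 41.
N = 256
TRUE_BIN = 40.5
samples = [math.cos(2.0 * math.pi * TRUE_BIN * j / N) for j in range(N)]

amp_40 = dft_bin(samples, 40)
amp_41 = dft_bin(samples, 41)

# Power a unit cosine would have if it fell exactly on a Fourier bin.
full_power = (N / 2.0) ** 2

raw_fraction = abs(amp_40) ** 2 / full_power
interbin_fraction = abs(interbin_amplitude(amp_40, amp_41)) ** 2 / full_power

print("fraction of power seen in the nearest raw bin:", raw_fraction)
print("fraction of power recovered by interbinning:  ", interbin_fraction)
```

In this worst case the nearest raw bin sees well under half of the signal power, while the interbinned estimate recovers nearly all of it.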
All of the features listed above affect candidate significance and selection, but two in particular are important. Candidate fine-tuning turns accelsearch into a two-step process: the initial search uses one estimate of candidate quality, and the tuning-up process uses another, which is what is reported to the rest of the pipeline.
In the first step, initial detection, the background in each chunk of the FFT is estimated by taking the median of that chunk of the power spectrum. The chunks grow in width logarithmically, from 6 to 200 bins, as one goes to higher frequencies. (How wide are they? Do they overlap? Do they take zaplists into account?) The frequencies used to combine harmonics are harmonically related to within the interbinning resolution (?). Power is quantified essentially as a false positive probability, obtained from a chi-squared distribution that depends on the number of harmonics and on the power in each harmonic divided by its background. The strength of each harmonic is recorded in the accelsearch output file.
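The initial-detection statistics described above can be sketched as follows. This is a simplified illustration, not accelsearch's code. It assumes fixed-width blocks rather than the logarithmically growing ones, and it assumes Gaussian noise, so that raw powers are exponentially distributed, the median equals the mean times ln 2, and twice the sum of n normalized powers follows a chi-squared distribution with 2n degrees of freedom. All function names and the example harmonic powers are hypothetical.

```python
import math
import random
from statistics import NormalDist, median


def normalize_powers(raw_powers, block=64):
    """Divide each block of powers by a median-based estimate of its mean.

    For exponentially distributed noise powers, median = mean * ln(2).
    A fixed block width is a simplification made for this sketch.
    """
    normalized = []
    for start in range(0, len(raw_powers), block):
        chunk = raw_powers[start:start + block]
        local_mean = median(chunk) / math.log(2.0)
        normalized.extend(p / local_mean for p in chunk)
    return normalized


def summed_power_prob(total_power, numharm):
    """Chance that noise alone gives a sum of `numharm` normalized powers
    at least as large as `total_power`.

    Closed form of the chi-squared survival function for 2 * numharm
    degrees of freedom: exp(-P) * sum_{k < numharm} P**k / k!.
    """
    term = 1.0
    series = 1.0
    for k in range(1, numharm):
        term *= total_power / k
        series += term
    return math.exp(-total_power) * series


def equivalent_sigma(prob):
    """Gaussian sigma with the same one-sided tail probability.

    Requires 0 < prob < 1; extremely large powers underflow to 0 in this
    naive version, and a real implementation would work in log space.
    """
    return -NormalDist().inv_cdf(prob)


# Hypothetical noise spectrum with mean power 5, normalized to mean ~1.
rng = random.Random(0)
raw = [rng.expovariate(1.0 / 5.0) for _ in range(1024)]
normalized = normalize_powers(raw)
print("mean normalized power:", sum(normalized) / len(normalized))

# Hypothetical normalized powers of four harmonics of one candidate.
harmonic_powers = [9.0, 6.0, 4.0, 3.0]
prob = summed_power_prob(sum(harmonic_powers), len(harmonic_powers))
print("false positive probability:", prob)
print("equivalent Gaussian sigma: ", equivalent_sigma(prob))
```

Note that this probability is for a single trial. It does not account for the number of independent frequencies and accelerations searched.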
During candidate polishing, each harmonic is treated independently. The background is re-estimated (how?), and the frequency is adjusted (by how much?) using a more sophisticated interpolation scheme (what?) than interbinning. Note that at this stage the harmonics are not constrained to be harmonically related. The resulting harmonic powers are divided by the harmonic background estimates, and the fine-tuned frequency (based on which harmonic?) and a significance level are recorded in the accelsearch output file. It is these values that are used in sifting to determine which candidates to fold.
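As a generic illustration of what "tuning up" a frequency means, the sketch below evaluates the Fourier sum at fractional bin numbers near an initial detection and keeps the frequency with the most power. It is not claimed to be the interpolation or optimization scheme that accelsearch uses, which the text above leaves open. The function names, the brute-force grid, and the test tone are assumptions made for this example.

```python
import cmath
import math


def fourier_amplitude(samples, r):
    """Fourier amplitude at a (possibly fractional) bin number r, by direct sum."""
    n = len(samples)
    return sum(x * cmath.exp(-2j * math.pi * r * j / n)
               for j, x in enumerate(samples))


def refine_frequency(samples, start_bin, steps=100):
    """Grid-search r in [start_bin - 0.5, start_bin + 0.5] for maximum power.

    Returns (best_r, best_power). A brute-force grid is used for clarity;
    it stands in for whatever optimizer a real search code would use.
    """
    best_r, best_power = float(start_bin), -1.0
    for i in range(steps + 1):
        r = start_bin - 0.5 + i / steps
        power = abs(fourier_amplitude(samples, r)) ** 2
        if power > best_power:
            best_r, best_power = r, power
    return best_r, best_power


# Hypothetical signal whose true frequency lies between Fourier bins.
N = 256
TRUE_BIN = 40.3
samples = [math.cos(2.0 * math.pi * TRUE_BIN * j / N) for j in range(N)]

coarse_power = abs(fourier_amplitude(samples, 40.0)) ** 2
best_r, best_power = refine_frequency(samples, start_bin=40)

print("power at the nearest integer bin:", coarse_power)
print("refined bin number and power:    ", best_r, best_power)
```

If each harmonic were refined separately in this way, the refined frequencies would in general not be exact integer multiples of one another, which matches the note above that the harmonics are not constrained to be harmonically related at this stage.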