
Write patch results for RR, DR, etc. when write_patch_results=True #172

Merged
merged 14 commits into main from write_rr_results on Mar 1, 2024

Conversation

@rmjarvis (Owner) commented Mar 1, 2024

@JonLoveday suggested in #168 that when write_patch_results=True is used for an NNCorrelation object that has run calculateXi with rr and possibly dr randoms, TreeCorr should also write out all the patch results for those, so that proper patch-based covariances can be built when reading the file back in. This PR implements that, along with the corresponding behavior for NNNCorrelation.
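
For concreteness, a minimal sketch of the intended workflow; the catalogue files, column names, binning, and number of patches here are illustrative, not part of the PR:

import treecorr

# Illustrative catalogues sharing the same jackknife patches (file and column names are hypothetical).
cat = treecorr.Catalog('data.fits', ra_col='RA', dec_col='DEC',
                       ra_units='deg', dec_units='deg', npatch=10)
rand = treecorr.Catalog('rand.fits', ra_col='RA', dec_col='DEC',
                        ra_units='deg', dec_units='deg',
                        patch_centers=cat.patch_centers)

dd = treecorr.NNCorrelation(min_sep=1., max_sep=100., nbins=20,
                            sep_units='arcmin', var_method='jackknife')
rr = dd.copy()
dr = dd.copy()
dd.process(cat)
rr.process(rand)
dr.process(cat, rand)

# calculateXi stores the rr and dr results on the dd object, and with this PR
# their patch-level results are written to the file along with dd's.
dd.calculateXi(rr=rr, dr=dr)
dd.write('z0.fits', write_patch_results=True)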

While I was at it, I implemented a couple of other I/O-related things I'd wanted to do. The first is to add classmethods called from_file for all the Correlation objects. These let you make the Correlation object directly without having to know the configuration of the object that wrote the file. To enable this, I added a few more parameters to the header, so the reader knows how to build the object.
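
For example (a minimal sketch, assuming z0.fits was written with write_patch_results=True as above):

# Rebuild the object directly from the file; the configuration comes from the header.
dd2 = treecorr.NNCorrelation.from_file('z0.fits')

# Or, if you don't know which correlation class wrote the file, the base class works too.
dd2 = treecorr.Corr2.from_file('z0.fits')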

Additionally, I added a write_cov option to the write functions, which writes the covariance matrix to the output file. The covariance is also read back in when reading the file, so you don't have to remake it if the original object had already calculated it.
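
A minimal sketch of the round trip; the assumption that the stored covariance is exposed as the cov attribute after reading follows the usual in-memory behavior:

# Write the patch results and the already-computed covariance matrix.
dd.write('z0.fits', write_patch_results=True, write_cov=True)

# Reading it back restores the covariance without recomputing it.
dd2 = treecorr.Corr2.from_file('z0.fits')
cov = dd2.cov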

@rmjarvis rmjarvis added this to the Version 5.0 milestone Mar 1, 2024
@rmjarvis rmjarvis linked an issue Mar 1, 2024 that may be closed by this pull request
@rmjarvis rmjarvis mentioned this pull request Mar 1, 2024
@rmjarvis rmjarvis merged commit 01fa3bb into main Mar 1, 2024
11 checks passed
@rmjarvis rmjarvis deleted the write_rr_results branch March 1, 2024 19:44
@JonLoveday

Thanks for making this update, but I must say I am confused about what is now output. I have a simple test case of an NN auto-correlation function, utilising DD, DR, and RR counts, with 10 jackknife patches (same 10 patches for the data and random catalogues). I would expect to see an output file with the overall results in the first HDU and the pair counts excluding each jackknife region in turn in 10 subsequent HDUs. Instead, I see extensions 2-68 with names main_pp_0 - main_pp_17, _rr, _rr_pp_0 - _rr_pp_18, _dr, and _dr_pp_0 - _dr_pp_27. I assume these are the DD, RR, and DR counts respectively, but why are there different numbers of each? Also, looking at the pair counts in each extension, these seem to be the pair counts from each patch, rather than excluding each patch (they are much smaller than the overall counts).

Here is the output file:
z0.fits.zip

@rmjarvis (Owner, Author)

Yes, they are the counts from each pair of patches. That's the underlying data from which all kinds of covariance matrices are calculated (jackknife, bootstrap, sample, etc.). The point is that you can now just let TreeCorr read in this file and compute the jackknife covariance.

corr = treecorr.Corr2.from_file(output_file_name)
cov = corr.estimate_cov('jackknife')

Or you can compute covariances of derived quantities, etc. Everything that you could have done from the original object that wrote the file is available. Are you trying to do something with this that is not enabled by this interface?
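
For instance, a derived-quantity covariance might look like the sketch below; the func argument to estimate_cov is documented, but the lambda here assumes it receives the correlation object itself, so check the exact calling convention in the docs:

import numpy as np

# Jackknife covariance of log(xi) rather than of xi itself.
cov_logxi = corr.estimate_cov('jackknife', func=lambda c: np.log(c.xi))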

@JonLoveday

Thanks for the quick response, it's making more sense now. I've not come across pairwise use of jackknife patches before (traditionally, one computes the pair counts excluding each patch in turn). My patches are disjoint (see attached figure: galaxies on the left, randoms on the right, colour-coded by patch number). Maybe that explains the different numbers of dd, dr, and rr extensions?

What I am trying to do is to use clustering-based redshift inference to constrain the N(z) distribution of a photometric data set cross-correlated with a smaller number of galaxies with redshifts, see e.g. https://academic.oup.com/mnras/article/522/3/3693/7143786. N(z) depends on the ratio of an angular cross-correlation function over the square root of an auto-correlation function. In order to get reliable uncertainties on N(z) I would like to be able to calculate it separately excluding each jackknife region in turn. Does that sound feasible using treecorr? The alternative is to propagate the uncertainties on the two correlation functions, but I think that will be less robust given likely non-Gaussian errors.
[Figure_1: data (left) and random (right) catalogues, colour-coded by patch]

@rmjarvis (Owner, Author)

Sure. You can use build_cov_design_matrix to compute the correlation function for each jackknife selection.
cf. https://rmjarvis.github.io/TreeCorr/_build/html/correlation2.html#treecorr.Corr2.build_cov_design_matrix
That's the design matrix from which the usual jackknife covariance matrix is computed.
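
For the N(z) ratio above, something along these lines might work; w_cross and w_auto are hypothetical names for the cross- and auto-correlation objects, both assumed to use the same patches, and the return value (design matrix plus weights) is assumed from the linked documentation:

import numpy as np

# One row per jackknife realization (i.e. one patch excluded), one column per bin.
A_cross, w_c = w_cross.build_cov_design_matrix('jackknife')
A_auto, w_a = w_auto.build_cov_design_matrix('jackknife')

# The ratio evaluated separately for each jackknife selection.
ratio_jk = A_cross / np.sqrt(A_auto)

# Standard jackknife covariance of the ratio: (n-1)/n times the sum of outer products of deviations.
n = ratio_jk.shape[0]
cov_ratio = (n - 1) * np.cov(ratio_jk, rowvar=False, bias=True)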

@JonLoveday

Brilliant, thanks Mike!
