
Write patch results for RR, DR, etc. when write_patch_results=True #172

Merged
merged 14 commits into main from write_rr_results on Mar 1, 2024

Conversation

@rmjarvis (Owner) commented Mar 1, 2024

@JonLoveday suggested in #168 that when write_patch_results=True is used for an NNCorrelation object that has run calculateXi with rr and possibly dr randoms, TreeCorr should also write out all the patch results for those, so that proper patch-based covariances can be built when reading the file back in. This PR implements that, along with the corresponding behavior for NNNCorrelation.
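
For concreteness, a minimal sketch of the intended workflow; the catalogue files, column names, binning, and number of patches here are illustrative, not part of the PR:

import treecorr

# Illustrative catalogues sharing the same jackknife patches (file and column names are hypothetical).
cat = treecorr.Catalog('data.fits', ra_col='RA', dec_col='DEC',
                       ra_units='deg', dec_units='deg', npatch=10)
rand = treecorr.Catalog('rand.fits', ra_col='RA', dec_col='DEC',
                        ra_units='deg', dec_units='deg',
                        patch_centers=cat.patch_centers)

dd = treecorr.NNCorrelation(min_sep=1., max_sep=100., nbins=20,
                            sep_units='arcmin', var_method='jackknife')
rr = dd.copy()
dr = dd.copy()
dd.process(cat)
rr.process(rand)
dr.process(cat, rand)

# calculateXi stores the rr and dr results on the dd object, and with this PR
# their patch-level results are written to the file along with dd's.
dd.calculateXi(rr=rr, dr=dr)
dd.write('z0.fits', write_patch_results=True)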

While I was at it, I implemented a couple of other I/O-related things I'd wanted to do. The first is to add classmethods called from_file for all the Correlation objects. These let you make the Correlation object directly without having to know the configuration of the object that wrote the file. To enable this, I added a few more parameters to the header, so the reader knows how to build the object.
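
For example (a minimal sketch, assuming z0.fits was written with write_patch_results=True as above):

# Rebuild the object directly from the file; the configuration comes from the header.
dd2 = treecorr.NNCorrelation.from_file('z0.fits')

# Or, if you don't know which correlation class wrote the file, the base class works too.
dd2 = treecorr.Corr2.from_file('z0.fits')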

Additionally, I added a write_cov option to the write functions, which writes the covariance matrix to the output file. The covariance is also read back in when reading the file, so you don't have to remake it if the original object had already calculated it.
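
A minimal sketch of the round trip; the assumption that the stored covariance is exposed as the cov attribute after reading follows the usual in-memory behavior:

# Write the patch results and the already-computed covariance matrix.
dd.write('z0.fits', write_patch_results=True, write_cov=True)

# Reading it back restores the covariance without recomputing it.
dd2 = treecorr.Corr2.from_file('z0.fits')
cov = dd2.cov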

@rmjarvis rmjarvis added this to the Version 5.0 milestone Mar 1, 2024
@rmjarvis rmjarvis linked an issue Mar 1, 2024 that may be closed by this pull request
@rmjarvis rmjarvis mentioned this pull request Mar 1, 2024
@rmjarvis rmjarvis merged commit 01fa3bb into main Mar 1, 2024
11 checks passed
@rmjarvis rmjarvis deleted the write_rr_results branch March 1, 2024 19:44
@JonLoveday

Thanks for making this update, but I must say I am confused about what is now output. I have a simple test case of an NN auto-correlation function, utilising DD, DR, and RR counts, with 10 jackknife patches (same 10 patches for the data and random catalogues). I would expect to see an output file with the overall results in the first HDU and the pair counts excluding each jackknife region in turn in 10 subsequent HDUs. Instead, I see extensions 2-68 with names main_pp_0 - main_pp_17, _rr, _rr_pp_0 - _rr_pp_18, _dr, and _dr_pp_0 - _dr_pp_27. I assume these are the DD, RR, and DR counts respectively, but why are there different numbers of each? Also, looking at the pair counts in each extension, these seem to be the pair counts from each patch, rather than excluding each patch (they are much smaller than the overall counts).

Here is the output file:
z0.fits.zip

@rmjarvis (Owner, Author)

Yes, they are the counts from each pair of patches. That's the underlying data from which all kinds of covariance matrices are calculated (jackknife, bootstrap, sample, etc.). The point is that you can now just let TreeCorr read in this file and compute the jackknife covariance.

corr = treecorr.Corr2.from_file(output_file_name)
cov = corr.estimate_cov('jackknife')

Or you can compute covariances of derived quantities, etc. Everything that you could have done from the original object that wrote the file is available. Are you trying to do something with this that is not enabled by this interface?
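
For instance, a derived-quantity covariance might look like the sketch below; the func argument to estimate_cov is documented, but the lambda here assumes it receives the correlation object itself, so check the exact calling convention in the docs:

import numpy as np

# Jackknife covariance of log(xi) rather than of xi itself.
cov_logxi = corr.estimate_cov('jackknife', func=lambda c: np.log(c.xi))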

@JonLoveday

Thanks for the quick response, it's making more sense now. I've not come across pairwise use of jackknife patches before (traditionally, one computes the pair counts excluding each patch in turn). My patches are disjoint (see attached figure: galaxies on the left, randoms on the right, colour-coded by patch number). Maybe that explains the different numbers of dd, dr, and rr extensions?

What I am trying to do is to use clustering-based redshift inference to constrain the N(z) distribution of a photometric data set cross-correlated with a smaller number of galaxies with redshifts, see e.g. https://academic.oup.com/mnras/article/522/3/3693/7143786. N(z) depends on the ratio of an angular cross-correlation function over the square root of an auto-correlation function. In order to get reliable uncertainties on N(z) I would like to be able to calculate it separately excluding each jackknife region in turn. Does that sound feasible using treecorr? The alternative is to propagate the uncertainties on the two correlation functions, but I think that will be less robust given likely non-Gaussian errors.
[Figure_1: data (left) and random (right) catalogues, colour-coded by patch]

@rmjarvis (Owner, Author)

Sure. You can use build_cov_design_matrix to compute the correlation function for each jackknife selection.
cf. https://rmjarvis.github.io/TreeCorr/_build/html/correlation2.html#treecorr.Corr2.build_cov_design_matrix
That's the design matrix from which the usual jackknife covariance matrix is computed.
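
For the N(z) ratio above, something along these lines might work; w_cross and w_auto are hypothetical names for the cross- and auto-correlation objects, both assumed to use the same patches, and the return value (design matrix plus weights) is assumed from the linked documentation:

import numpy as np

# One row per jackknife realization (i.e. one patch excluded), one column per bin.
A_cross, w_c = w_cross.build_cov_design_matrix('jackknife')
A_auto, w_a = w_auto.build_cov_design_matrix('jackknife')

# The ratio evaluated separately for each jackknife selection.
ratio_jk = A_cross / np.sqrt(A_auto)

# Standard jackknife covariance of the ratio: (n-1)/n times the sum of outer products of deviations.
n = ratio_jk.shape[0]
cov_ratio = (n - 1) * np.cov(ratio_jk, rowvar=False, bias=True)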

@JonLoveday

Brilliant, thanks Mike!
