Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix spectra coaddition and cosmic masking #2377

Merged
merged 59 commits into from
Oct 7, 2024
Merged

Fix spectra coaddition and cosmic masking #2377

merged 59 commits into from
Oct 7, 2024

Conversation

segasai
Copy link
Contributor

@segasai segasai commented Sep 25, 2024

This is an a actual PR fixing #2372

@coveralls
Copy link

coveralls commented Sep 25, 2024

Coverage Status

coverage: 30.293% (+0.1%) from 30.164%
when pulling f3bc12b on resol_fix
into 48c1641 on main.

@sbailey
Copy link
Contributor

sbailey commented Sep 27, 2024

Thanks. I confirm that the original code is a bug, i.e. we agree on the math of what should be done and that the current code does not do that. Since this implementation is a little different than the #2372 I'd like to study it a bit more and run some test tiles to check for corner cases (e.g. entirely masked spectra).

Ideally I'd like to also add some unit tests that would have caught the original bug, e.g. that a "coaddition" of a single resolution matrix should be the same as its input, and verifying what we expect the impact to be of a masked pixel on 1 out of N exposures, and the same pixel masked on N out of N exposures.

In issue #2372 you documented the impact on chi2; I'd also like to study the impact on final redshifts to decide what to do about Kibo (e.g. a coadds+redshifts-only supplemental run). But that doesn't have to be a blocking factor for wrapping this up.

@segasai
Copy link
Contributor Author

segasai commented Sep 27, 2024

Thanks for looking into that.

As far as I tested, this implementation should give the same result as the previous one. I just felt it was a bit clearer. but for sure it's worth double checking.
I agree this bug also worth a regression test.
The simplest one is indeed with one masked pixel and one exposure.

FWIW these are my runs https://data.desi.lbl.gov/desi/users/koposov/coadd_test/output/ on random hpx in kibo using this patch. So the redshfits can be directly compared to kibo. (this used earlier version of the patch, but AFAICS it should be exactly equal to what was before).

@segasai
Copy link
Contributor Author

segasai commented Sep 27, 2024

I've added the test that trips the current desispec but passes after the PR. (coaddition of one spectrum with one masked pixel)

extracted spectrum S and true spectrum S = R * M  where R is
resolution matrix is preserved exactly after the coadd
@segasai
Copy link
Contributor Author

segasai commented Sep 27, 2024

I've added a final (IMO) test that verifies if one has 'true 'spectrum M, Observations with 'random' resolution matrices R_i, random covariance matrices and extracted spectra S_i (obeying S_i= R_i M), then after running coadd the relationship
S_coadded = R_coadded M still holds exactly.
This test also obviously fails with existing desispec.
With this test I am actually very confident that the new code is correct.

@sbailey
Copy link
Contributor

sbailey commented Sep 28, 2024

Thanks, this all looks good. In the spirit of not making a major change just before the weekend, I will wait until Monday to merge this, but I don't see any problems. In the meantime, two more tests to request to really drive this home:

  • coaddition with one of the exposures entirely masked
  • coaddition with a single edge pixel masked (instead of an interior pixel), i.e. test_coadd_single_masked with mpix=0.

I'm not anticipating problems with those.

Let's discuss at the Tuesday data telecon about implications for Kibo. I'm not worried from a quick spot check, but I'd like to quantify that and study some rrdetails chi2 vs. z curves.

and fix the bug with masked pixels uncovered by the test
@segasai
Copy link
Contributor Author

segasai commented Sep 28, 2024

I've added the tests. Actually one of the tests did uncover the issue with the code. See b2f71bb

While adding those, I thought I'd add a test of cosmic ray masking, but an intuitive test that I wrote completely fails currently. I.e. here I put a large cosmic in i-th pixel in i-th spectrum.
And I was expecting that after stacking I would get the same result as if I had one less spectrum (with all those 'cosmic-pixels removed), but this fails and the cosmic masking fails to identify the cosmics. I don't quite understand the logic of cosmic masking in coadd(), so I don't know if I'm missing something obvious or not. The cosmic test is below:

    def test_coadd_cosmic(self):
        """                                                                                                                                                               
        Test coadd with cosmic rays                                                                                                                                       
        """
        nspec, nwave = 1000, 30
        s1 = self._random_spectra(nspec, nwave)
        rng = np.random.default_rng(432)
        model0 = rng.uniform(0, 1, size=nwave)
        ivar0 = rng.uniform(size=(nspec, nwave))
        flux0 = model0 + 1./np.sqrt(ivar0) * rng.normal(size=(nspec,nwave))
        resol0 = rng.uniform(0, 1, size=s1.resolution_data['b'].shape)
        s1.flux['b'] = flux0 * 1
        s1.ivar['b'] = ivar0 * 1
        s1.resolution_data['b'] = resol0 * 1
        COSMIC_VAL = 1e9
        xgrid,ygrid =  np.meshgrid(np.arange(nwave), np.arange(nspec))
        mask = ygrid == (xgrid%nwave)
        # make i-th pixel in i-th spectrum a cosmic                                                                                                                       
        s1.flux['b'][mask] = COSMIC_VAL
        s1.fibermap['TARGETID'] = [10] * nspec
        coadd(s1, cosmics_nsig=5)

        # prepare nspec-1 spectra with those cosmic affected pixels excised.
        s2 = self._random_spectra(nspec-1, nwave)
        for i in range(nwave):
            s2.flux['b'][:, i] = flux0[~mask[:,i],i]*1
            s2.ivar['b'][:, i] = ivar0[~mask[:,i],i]*1
            s2.resolution_data['b'][:,:,i] = resol0[~mask[:,i],:,i]*1
        coadd(s2, cosmics_nsig=5)
        print(s1.flux['b'],s2.flux['b'])
        self.assertTrue(np.allclose(s1.flux['b'], s2.flux['b']))
        self.assertTrue(np.allclose(s1.ivar['b'], s2.ivar['b']))
        self.assertTrue(np.allclose(s1.resolution_data['b'],
                                    s2.resolution_data['b']))

@segasai
Copy link
Contributor Author

segasai commented Sep 29, 2024

Just a few more comments after looking at the code.

  • the coadd_cameras has the same issue as coadd() (it can be seen with the existing test, if I replace coadd() by coadd_cameras(). Since the logic there is a bit more complicated, the fix requires more thought. But also coadd_cameras is not really used for much I think, so it's less important.
  • Another thing that is a bit strange is the current behaviour when one pixel in all the spectra in the stack is masked, because the current code will then still compute the ivar, ignoring the mask
    tivar[i][bad] = np.sum(spectra.ivar[b][jj][:,bad],axis=0) # if all masked, keep original ivar
    One situation where that will produce strange results is when the same pixel was masked by cosmic logic in all exposures, then the stack will just ignore that (and it won't be reflected in the mask or ivar). I'm not quite sure it's intended behaviour (but it is also probably very rare occurrence). Also the logic here has implications for what's done with the resolution matrix for these pixels. With the new code the resolution matrix elements that have all zero weights in every exposure will be zero.

@moustakas
Copy link
Member

coadd_cameras is used extensively by fastspecfit to write out coadeed spectra and models and also by the GQP working group. @stephjuneau

@weaverba137
Copy link
Member

And @adambolton I think did a full coadd across cameras for iron to load into SPARCL. This may need to be reconsidered.

There was some divergence there already i.e.
gradivar=1./gradvar vs gradivar = (gradvar>0)/(gradvar+(gradvar==0))
@stephjuneau
Copy link
Contributor

Hi @segasai: confirming what @moustakas and @weaverba137 mentioned above that coadd_across_cameras is very important for GQP and beyond. Any change/streamlining should at least be backward compatible to avoid breaking workflows. Thanks!

@segasai
Copy link
Contributor Author

segasai commented Sep 29, 2024

@stephjuneau I wasn't talking about breaking any interfaces here. I was just pointing that the function seem to suffer from the same issue that is topic of this PR. But indeed I wasn't aware that this f-n is widely used outside the main pipeline.

py/desispec/coaddition.py Outdated Show resolved Hide resolved
When resolution matrices were added, the non-masked ivars were used.
(this is kinda irrelevant with the oncoming fix, but important first step)
@sbailey
Copy link
Contributor

sbailey commented Oct 3, 2024

My latest commits include two tests for issues that @segasai identified that we need to fix:

  • if a pixel is masked in every exposure, that should be propagated to the coadd even if it is masked for different reasons in each exposure (already fixed; @segasai coded the fix faster than I could code the test)
  • if two exposures have unlucky cosmics at the same wavelength, they should both be masked as long as there are N>2 additional exposures that can identify that it really is a cosmic outlier and not a narrow emission line (current code just masks the first one, not the second)

@segasai
Copy link
Contributor Author

segasai commented Oct 3, 2024

I've committed the masker that can deal with more than one cosmic per pixel. Currently it's doing essentially iterative sigma clipping. That passes the tests, but my only worry are variable objects. With the new code we continue clipping till the chi^2 becomes low enough or we have less than 3 pixels in the stack.
One could potentially stop earlier and not do more than 2 iterations of clipping.
Other than that, I think we dealt with every outstanding issue here.

UPDATE on variable objects -- looking at sv1/dark/5059 RRL with targetid=39628511287183022 with 30 observations:
INFO:coaddition.py:571:_mask_cosmics: masking 47 wavelengths in 30 spectra for targetid=39628511287183022
INFO:coaddition.py:574:_mask_cosmics: masking 6 wavelengths with more than 1 mask per pixel for targetid=39628511287183022
INFO:coaddition.py:571:_mask_cosmics: masking 38 wavelengths in 30 spectra for targetid=39628511287183022
INFO:coaddition.py:574:_mask_cosmics: masking 8 wavelengths with more than 1 mask per pixel for targetid=39628511287183022
INFO:coaddition.py:574:_mask_cosmics: masking 14 wavelengths with more than 1 mask per pixel for targetid=39628511287183022
It doesn't seem too bad, so maybe things can be kept as they are now.

add test for coadd_cameras bits
@segasai
Copy link
Contributor Author

segasai commented Oct 3, 2024

The impact of the cosmic ray fixes is particularly strong on the spectra with sharp gradients.
Here's comparison of old and new stacks. The old stacks tended to mark some of the edges as cosmics:
image

I assume the same thing is present with narrow emission lines

one when the comismic maskign would mask completely unrelated
pixels when one of the spectra in the set is fully masked
@segasai
Copy link
Contributor Author

segasai commented Oct 3, 2024

There was one more bug in the original implementation (and the new one) is that
if there was ever a fully blank spectrum in the stack with ivar=0, then the cosmic masking of any spectrum after that was incorrect, because of this skipping:

if np.sum(good)==0 :

When spectra are collected for masking, the ivar=0 ones are skipped, but when doing the masking, that 'off-by-N'
k=np.argmax(gradivar[:,bi]*deltagrad[:,bi]**2)
is ignored leading to masking of wrong pixels.
Encountered when running on sv1/dark/5059
The last commit fixes that and adds a test.

Add camera option to mask_cosmics for printing
Also put the check for the minimum number of spectra needed for cosmic masking
inside the function to avoid having it in two places
(motivated by desire of having all the lines of coadd()/coadd_cameras()
test-covered)
Move the chi2 threshold into a function
also get rid of harcoded value of 3 for min_cosmic rejection
@segasai segasai changed the title Fix coaddition of resolution matrices Fix spectra coaddition and cosmic masking Oct 3, 2024
@segasai
Copy link
Contributor Author

segasai commented Oct 3, 2024

Pixel differences for a Lya-forest object, showing impact of the cosmic masking there.
In basically all the pixels with non-zero difference bertween new/old the masking there was happening in the previous iteration of the code and does not happen now.

image

segasai and others added 9 commits October 3, 2024 23:11
deviate clearly more than prescribed by variance vector.
In that case I ignore the chi^2 stats and just try to
empirically estimate the threshold. It will never work
particularly well, but it is better than doing what we
are doing
make a much cleaner interface for mask_cosmics as it does not need to get
all the spectra + indices + mask
Now it will just accept the spectra that needs masking and it will return
the boolean mask of cosmics.
That will also make it easier to store that information in the future
Copy link
Contributor

@sbailey sbailey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks! So that we don't make major changes going into a weekend, I'll wait until Monday to merge and deploy this.

@sbailey sbailey merged commit 79529f9 into main Oct 7, 2024
26 checks passed
@sbailey sbailey deleted the resol_fix branch October 7, 2024 20:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants