PCA method for telluric corrections #1647

freddavies · 2023-08-08T10:55:47Z

Introduces a new method for computing telluric corrections that uses a PCA decomposition of a large suite of telluric absorption models. The specifics will (probably, someday) be written up in a paper, but long story short, I ran a Latin hypercube of 2000 telluric models with varying atmospheric parameters (pressure, temperature, humidity, airmass, O2, O3, CO2) for each of the MaunaKea, Paranal, MountGraham, LasCampanas, Palomar, and Lick observatories, which span a wide range of latitudes and elevations. I then computed a PCA decomposition of the arcsinh of the atmospheric optical depth¹ from all 12000 models. The resulting "grid" of UV-to-NIR telluric spectra then consists of just 10 PCA components, with a file size up to a thousand times smaller than the old telluric grids and with broad applicability to all observatories (as far as I can tell).
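For readers who want a concrete picture of the decomposition described above, here is a minimal sketch in NumPy. It is not the code in this PR: the function names `build_tell_pca` and `pca_to_transmission`, the toy three-line optical-depth models, and the use of a plain SVD are all assumptions made for illustration.

```python
import numpy as np

def build_tell_pca(tau, ncomp=10):
    """Decompose optical depths (nmodels, npix) in arcsinh(tau) space.

    Hypothetical helper: returns the mean arcsinh(tau) spectrum, the leading
    `ncomp` principal components, and each model's coefficients.
    """
    y = np.arcsinh(tau)
    mean = y.mean(axis=0)
    resid = y - mean
    # Rows of vt are orthonormal principal components, ordered by variance.
    _, _, vt = np.linalg.svd(resid, full_matrices=False)
    comps = vt[:ncomp]
    coeffs = resid @ comps.T
    return mean, comps, coeffs

def pca_to_transmission(mean, comps, coeffs):
    """Map PCA coefficients back to transmission: T = exp(-sinh(y))."""
    tau = np.sinh(mean + coeffs @ comps)
    return np.exp(-tau)

# Toy stand-in for the model suite: 50 models built from 3 absorption lines.
rng = np.random.default_rng(0)
pix = np.arange(200)
lines = np.stack([np.exp(-0.5 * ((pix - c) / 3.0) ** 2) for c in (40, 100, 160)])
tau_models = rng.uniform(0.0, 30.0, size=(50, 3)) @ lines

mean, comps, coeffs = build_tell_pca(tau_models, ncomp=10)
trans_model = pca_to_transmission(mean, comps, coeffs)
# Largest transmission error of the truncated 10-component representation.
max_err = np.max(np.abs(trans_model - np.exp(-tau_models)))
```

In this picture, fitting with fewer components (e.g. 4) simply means using only the first few rows of `comps` and treating their coefficients as free parameters.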

The corresponding "TellPCA" files have been uploaded to the telluric folder on the development suite drive, at varying spectral samplings corresponding to R=10000, 15000, and 25000; the re-sampling was done in a flux-conserving way prior to the PCA decomposition in arcsinh(tau) space. For testing on Keck/HIRES I have also included a beta version of an R=60000 model in the optical which is only derived from MaunaKea and Paranal models.
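The ordering matters here: the resampling happens in transmission (flux) space, and only afterwards is the result mapped to arcsinh(tau). A rough sketch of that ordering follows, assuming a simple box average onto coarser bins. The actual resampling used to build the TellPCA files may well differ, and `rebin_transmission`, `to_arcsinh_tau`, and the `floor` value are hypothetical.

```python
import numpy as np

def rebin_transmission(wave, trans, edges):
    """Average transmission within each coarse bin (uniform fine pixels assumed)."""
    total, _ = np.histogram(wave, bins=edges, weights=trans)
    count, _ = np.histogram(wave, bins=edges)
    return total / np.maximum(count, 1)

def to_arcsinh_tau(trans, floor=1e-10):
    """Convert transmission to arcsinh(optical depth).

    The floor (an assumed value) avoids log(0) for fully saturated pixels.
    """
    tau = -np.log(np.clip(trans, floor, 1.0))
    return np.arcsinh(tau)

# Fine grid of pixel centres with a single deep absorption line.
wave_fine = 1.0 + (np.arange(1000) + 0.5) * 1e-3
trans_fine = np.exp(-5.0 * np.exp(-0.5 * ((wave_fine - 1.5) / 0.01) ** 2))

# 1) Resample in flux space, 2) only then move to arcsinh(tau) for the PCA.
edges = np.linspace(1.0, 2.0, 11)
trans_coarse = rebin_transmission(wave_fine, trans_fine, edges)
y_coarse = to_arcsinh_tau(trans_coarse)
```

Because every coarse bin here contains the same number of fine pixels, the mean transmission is unchanged by the rebinning, which is the sense in which the step conserves flux.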

My testing thus far has shown that, with the default of 4 PCA components used in the fit, this method provides telluric corrections of comparable quality to the old approach, with a similar runtime. The user can increase this number, given by a new parameter ntell, up to 10 if they desire a more accurate fit, though it usually makes little difference in practice (note that I have not tested this carefully at arbitrarily high S/N).

Due to its relative ease of use, this new PCA method has been set as the default telluric method. The original grid-based method can still be used by setting teltype = grid in the telluric parameter file, both to maintain reproducibility and to allow users to still use "actual" telluric absorption spectra if they are not comfortable with the pseudo-telluric spectra generated by the PCA method.

While I have done some testing on data from various instruments, before merging I think we need to do a more comprehensive comparison between the two methods to make sure that we are not compromising the quality of the telluric corrections. We could also have a meta-discussion as to whether there are unforeseen risks to using telluric corrections that are not strictly physical models with a particular set of parameters...

¹This may seem needlessly convoluted, but I swear there is a good reason! A decomposition in transmission space inevitably involves trimming at 0 and 1, where the former gives rise to nasty infinities in the correction. Optical depth space would otherwise work, except that the PCA decomposition puts too much weight in the highest optical depth pixels which have transmissions of ~0. Taking the log of the optical depth, conversely, puts way too much emphasis on regions with extremely low opacity where the transmission is ~1. It turns out that the arcsinh of the optical depth is a good middle ground, reducing the dynamic range at high values while maintaining linearity at small values.
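The "middle ground" property in the footnote can be checked numerically: arcsinh(x) = ln(x + sqrt(x² + 1)), which approaches x for small x and ln(2x) for large x. The optical depth values below are arbitrary illustrative numbers, not taken from the model suite.

```python
import numpy as np

# Optical depths spanning eight orders of magnitude.
tau = np.array([1e-4, 1e-2, 1.0, 1e2, 1e4])
y = np.arcsinh(tau)

# Small tau: arcsinh(tau) stays essentially linear in tau.
linear_ok = np.allclose(y[:2], tau[:2], rtol=1e-3)

# Large tau: arcsinh(tau) grows only logarithmically, like ln(2 * tau).
log_ok = np.allclose(y[-2:], np.log(2.0 * tau[-2:]), rtol=1e-3)

# The transformed values span a far smaller range than tau itself.
compressed_range = y.max() - y.min()
```

So nearly transparent pixels keep their linear weighting, while saturated pixels are compressed instead of dominating the variance.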

rcooke-ast

Thanks @freddavies for the comprehensive overhaul of the telluric correction! I have mostly minor comments, but something that is unclear to me is why you need several different resolution grids? Can't we just have a single very high resolution version and then convolve it to lower resolutions specific to the instrument (and/or the resolution of the data at a given wavelength)?

Since my comments are relatively minor, I'll approve for now. It would be nice to see some plots of the telluric correction applied to the data you've tested (for both the grid and PCA method for comparison, and to ensure that both options still work). Also, it would be a good idea to double check that the tests pass.

rcooke-ast · 2023-09-26T08:17:51Z

CHANGES.rst

@@ -52,7 +52,7 @@
    - Added a function ``check_spectrograph()`` (currently only defined for LRIS),
      that checks (during ``pypeit_setup``) if the selected spectrograph is the
      corrected one for the data used. 
-
+- Introduced PCA method for telluric corrections


Subsequent to this PR, the use of CHANGES has been deprecated, so I'll make a quick note of that here...

rcooke-ast · 2023-09-26T08:49:34Z

pypeit/core/telluric.py

+            self.tell_dict = read_telluric_grid(self.telgrid, wave_min=self.wave_in_arr[wv_gpm].min(),
+                                                wave_max=self.wave_in_arr[wv_gpm].max())
+        else:
+            msgs.error('Invalid teltype -- must be `pca` or `grid`')


Same comment as before.

rcooke-ast · 2023-09-26T08:49:55Z

pypeit/core/telluric.py

@@ -2263,8 +2447,23 @@ def __init__(self, wave, flux, ivar, gpm, telgridfile, obj_params, init_obj_mode
                wave, flux, ivar, gpm)
        # 3) Read the telluric grid and initalize associated parameters
        wv_gpm = self.wave_in_arr > 1.0


Shouldn't this be handled in the pypeitpar?

rcooke-ast · 2023-09-26T08:51:43Z

pypeit/core/telluric.py

+                           self.tell_dict['h2o_grid'].max()))
+            bounds.append((self.tell_dict['airmass_grid'].min(),
+                           self.tell_dict['airmass_grid'].max()))
+        else:


Add a comment saying that this is the PCA approach within the else statement?

freddavies · 2023-12-12T11:02:35Z

Thanks for the comments, @rcooke-ast ! I'm finally getting around to pushing this through...

First, a couple of updates: I have added new PCA models to the Google Drive, one with R=120000 in the optical for HIRES, and one with R=60000 covering 0.9-5.5 µm for the high-resolution and long-wavelength modes of NIRSPEC.

> Thanks @freddavies for the comprehensive overhaul of the telluric correction! I have mostly minor comments, but something that is unclear to me is why you need several different resolution grids? Can't we just have a single very high resolution version and then convolve it to lower resolutions specific to the instrument (and/or the resolution of the data at a given wavelength)?

The question is whether you want to do that convolution before or after the transformation into flux space. The problem with the former is the non-linear nature of the PCA vectors -- while one could arbitrarily convolve down a high resolution version, that convolution would not conserve flux, so it becomes much less likely that you could still get a reasonable representation of a telluric spectrum.

One can always convolve down after the transformation to flux, but that makes it run much more slowly, since you have to perform a bunch of operations on the high resolution grid. So I opted to make a handful of grids of different sizes to optimize the runtime (at least a little bit). It could probably be consolidated better, though, maybe into a single file which has all the different-sized grids.
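To illustrate the point about ordering, here is a toy sketch (not from the PR) in which a single saturated line is smoothed with a boxcar either in arcsinh(tau) space or in transmission space. The kernel, line strength, and helper name `boxcar` are assumptions chosen only to make the effect visible.

```python
import numpy as np

def boxcar(x, width):
    """Smooth with a normalized boxcar kernel (stand-in for an instrument LSF)."""
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="same")

# One saturated line on an otherwise transparent spectrum.
tau = np.zeros(101)
tau[50] = 20.0
y = np.arcsinh(tau)

# (a) Smooth the arcsinh(tau) vector first, then transform to transmission.
trans_a = np.exp(-np.sinh(boxcar(y, 5)))
# (b) Transform to transmission first, then smooth.
trans_b = boxcar(np.exp(-tau), 5)

# Absorbed flux (equivalent width in pixels), away from the array edges.
core = slice(10, -10)
ew_true = np.sum(1.0 - np.exp(-tau)[core])
ew_a = np.sum(1.0 - trans_a[core])
ew_b = np.sum(1.0 - trans_b[core])
```

In this toy case option (b) preserves the absorbed flux while option (a) more than doubles it, which is why a high-resolution PCA grid cannot simply be convolved down before the transformation to flux.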

> Since my comments are relatively minor, I'll approve for now. It would be nice to see some plots of the telluric correction applied to the data you've tested (for both the grid and PCA method for comparison, and to ensure that both options still work). Also, it would be a good idea to double check that the tests pass.

Indeed, I'm working on this comparison, but I haven't had enough time to make a comprehensive and intelligible set of plots. I will post some figures in the developer Slack at a later date.

jhennawi

This is good to go. Thanks for the amazing work here @freddavies

profxj · 2023-12-16T21:11:57Z

Test Summary

--- PYTEST PYPEIT UNIT TESTS PASSED 242 passed, 70 warnings in 358.78s (0:05:58) ---
--- PYTEST UNIT TESTS FAILED 2 failed, 131 passed, 171 warnings in 919.02s (0:15:19) ---
--- PYTEST VET TESTS PASSED 61 passed, 70 warnings in 1495.55s (0:24:55) ---
--- PYPEIT DEVELOPMENT SUITE PASSED 237/237 TESTS ---
Testing Started at 2023-12-13T23:39:00.775595
Testing Completed at 2023-12-14T09:24:33.737452
Total Time: 9:45:32.961857

lowercase pca

Frederick Davies and others added 30 commits November 5, 2021 13:39

in prep for telluric pca, make qso pca more explicit

a5cac28

adjusted qso pca redshift implementation to smooth out derivative

e58ab6f

reworking for telluric pca

2150bb3

reworking for telluric pca

3481727

minor fixes to get things running

8927feb

adding par for ntell

b3a4241

making ntell par actually go through

de16a0c

minor fix to initialization

6f48d74

par fix

422efcd

finished renaming pca_dict to be more specific

1b598ae

finished renaming pca_dict to be more specific

4f11a6d

reverting because interp was just way too slow...

3e98306

use ntell when computing sensfunc

b78d74e

Merge branch 'develop' into telpca

dfa7f0d

# Conflicts: # pypeit/core/telluric.py

Merge branch 'develop' into telpca

2ed292c

merging yet again

fa234e9

cleanup

010ff22

fixing merge conflicts

d9b4237

os not imported anymore

6b64d59

merge conflicts

315132d

merging again!

4cb3bb1

merged in develop

04ecc92

minor things

23136a4

back to asinh grid

02f61ee

documentation updates

ec8f65a

fixed wavelength truncation

4b6980b

fixed wavelength truncation

3c8457d

playing around with convex hull prior

2bda22a

merging

26600e8

merging

ced2a44

rcooke-ast self-requested a review September 26, 2023 05:50

rcooke-ast approved these changes Sep 26, 2023

added pix_shift_bounds to tellfit calls

f3c5de6

rcooke-ast mentioned this pull request Oct 8, 2023

Datacube core #1669

Merged

Frederick Davies added 5 commits October 17, 2023 08:31

CHANGES deprecated

a7f13fe

Merge branch 'develop' into telpca

f8ec915

Merge branch 'hires_redux' into telpca

6348d9d

ryan PR comments

c2a29fe

resln bounds edits

514b76c

Frederick Davies added 3 commits December 12, 2023 14:02

added to release doc

560e9de

one last xshooter optimization

b1372f1

much faster, and sort_telluric works again

cbaaa9f