Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PCA method for telluric corrections #1647

Merged
merged 91 commits into from
Dec 18, 2023
Merged

PCA method for telluric corrections #1647

merged 91 commits into from
Dec 18, 2023

Conversation

freddavies
Copy link
Collaborator

@freddavies freddavies commented Aug 8, 2023

Introduces a new method for computing telluric corrections that uses a PCA decomposition of a large suite of telluric absorption models. The specifics will (probably, someday) be written up in a paper, but long story short, I ran a latin hypercube of 2000 telluric models with varying atmospheric parameters (pressure, temperature, humidity, airmass, O2, O3, CO2) for the MaunaKea, Paranal, MountGraham, LasCampanas, Palomar, and Lick observatories, which span a wide range of latitudes and elevations. I then computed a PCA decomposition of the arcsinh of the atmospheric optical depth¹ from all 12000 models. The resulting "grid" of UV-to-NIR telluric spectra then consists of just 10 PCA components, with a filesize up to a thousand times smaller than the old telluric grids and with broad applicability to all observatories (as far as I can tell).

The corresponding "TellPCA" files have been uploaded to the telluric folder on the development suite drive, at varying spectral samplings corresponding to R=10000, 15000, and 25000; the re-sampling was done in a flux-conserving way prior to the PCA decomposition in arcsinh(tau) space. For testing on Keck/HIRES I have also included a beta version of an R=60000 model in the optical which is only derived from MaunaKea and Paranal models.

My testing thus far has shown that this method provides comparably good telluric corrections to the old approach with a default of 4 PCA components used in the fit, with a similar runtime. The user can increase this number, given by a new parameter ntell, up to 10 if they desire a more accurate fit, though it usually makes little difference in practice (note that I have not tested this carefully at arbitrarily high S/N).

Due to its relative ease of use, this new PCA method has been set as the default telluric method. The original grid-based method can still be used by setting teltype = grid in the telluric parameter file, both to maintain reproducibility and to allow users to still use "actual" telluric absorption spectra if they are not comfortable with the pseudo-telluric spectra generated by the PCA method.

While I have done some testing on data from various instruments, before merging I think we need to do a more comprehensive comparison between the two methods to make sure that we are not compromising the quality of the telluric corrections. We could also have a meta-discussion as to whether there are unforeseen risks to using telluric corrections that are not strictly physical models with a particular set of parameters...



¹This may seem needlessly convoluted, but I swear there is a good reason! A decomposition in transmission space inevitably involves trimming at 0 and 1, where the former gives rise to nasty infinities in the correction. Optical depth space would otherwise work, except that the PCA decomposition puts too much weight in the highest optical depth pixels which have transmissions of ~0. Taking the log of the optical depth, conversely, puts way too much emphasis on regions with extremely low opacity where the transmission is ~1. It turns out that the arcsinh of the optical depth is a good middle ground, reducing the dynamic range at high values while maintaining linearity at small values.

@rcooke-ast rcooke-ast self-requested a review September 26, 2023 05:50
Copy link
Collaborator

@rcooke-ast rcooke-ast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @freddavies for the comprehensive overhaul of the telluric correction! I have mostly minor comments, but something that is unclear to me is why you need several different resolution grids? Can't we just have a single very high resolution version and then convolve it to lower resolutions specific to the instrument (and/or the resolution of the data at a given wavelength)?

Since my comments are relatively minor, I'll approve for now. It would be nice to see some plots of the telluric correction applied to the data you've tested (for both the grid and PCA method for comparison, and to ensure that both options still work). Also, it would be a good idea to double check that the tests pass.

CHANGES.rst Outdated
@@ -52,7 +52,7 @@
- Added a function ``check_spectrograph()`` (currently only defined for LRIS),
that checks (during ``pypeit_setup``) if the selected spectrograph is the
corrected one for the data used.

- Introduced PCA method for telluric corrections
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Subsequent to this PR, the use of CHANGES has been deprecated, so I'll make a quick note of that here...

pypeit/par/pypeitpar.py Show resolved Hide resolved
pypeit/par/pypeitpar.py Outdated Show resolved Hide resolved
pypeit/core/telluric.py Outdated Show resolved Hide resolved
pypeit/core/telluric.py Outdated Show resolved Hide resolved
self.tell_dict = read_telluric_grid(self.telgrid, wave_min=self.wave_in_arr[wv_gpm].min(),
wave_max=self.wave_in_arr[wv_gpm].max())
else:
msgs.error('Invalid teltype -- must be `pca` or `grid`')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as before.

@@ -2263,8 +2447,23 @@ def __init__(self, wave, flux, ivar, gpm, telgridfile, obj_params, init_obj_mode
wave, flux, ivar, gpm)
# 3) Read the telluric grid and initalize associated parameters
wv_gpm = self.wave_in_arr > 1.0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be handled in the pypeitpar?

pypeit/core/telluric.py Show resolved Hide resolved
pypeit/core/telluric.py Show resolved Hide resolved
self.tell_dict['h2o_grid'].max()))
bounds.append((self.tell_dict['airmass_grid'].min(),
self.tell_dict['airmass_grid'].max()))
else:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment saying that this is the PCA approach within the else statement?

@rcooke-ast rcooke-ast mentioned this pull request Oct 8, 2023
@freddavies
Copy link
Collaborator Author

Thanks for the comments, @rcooke-ast ! I'm finally getting around to pushing this through...

First, a couple of updates: I have added new PCA models to the google drive with R=120000 in the optical for HIRES, and one with R=60000 for 0.9-5.5 µm for high-resolution and long-wavelength modes of NIRSPEC.

Thanks @freddavies for the comprehensive overhaul of the telluric correction! I have mostly minor comments, but something that is unclear to me is why you need several different resolution grids? Can't we just have a single very high resolution version and then convolve it to lower resolutions specific to the instrument (and/or the resolution of the data at a given wavelength)?

The question is whether you want to do that convolution before or after the transformation into flux space. The problem with the former is the non-linear nature of the PCA vectors -- while one could arbitrarily convolve down a high resolution version, that convolution would not conserve flux, so it becomes much less likely that you could still get a reasonable representation of a telluric spectrum.

One can always convolve down after the transformation to flux, but that makes it run much more slowly, since you have to perform a bunch of operations on the high resolution grid. So I opted to make a handful of grids of different sizes to optimize the runtime (at least a little bit). It could probably be consolidated better, though, maybe into a single file which has all the different-sized grids.

Since my comments are relatively minor, I'll approve for now. It would be nice to see some plots of the telluric correction applied to the data you've tested (for both the grid and PCA method for comparison, and to ensure that both options still work). Also, it would be a good idea to double check that the tests pass.

Indeed, I'm working on this comparison, but I haven't had enough time to make a comprehensive and intelligible set of plots. I will post some figures in the developer Slack at a later date.

pypeit/par/pypeitpar.py Outdated Show resolved Hide resolved
pypeit/par/pypeitpar.py Outdated Show resolved Hide resolved
pypeit/par/pypeitpar.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@jhennawi jhennawi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good to go. Thanks for the amazing work here @freddavies

@profxj
Copy link
Collaborator

profxj commented Dec 16, 2023

Test Summary

--- PYTEST PYPEIT UNIT TESTS PASSED 242 passed, 70 warnings in 358.78s (0:05:58) ---
--- PYTEST UNIT TESTS FAILED 2 failed, 131 passed, 171 warnings in 919.02s (0:15:19) ---
--- PYTEST VET TESTS PASSED 61 passed, 70 warnings in 1495.55s (0:24:55) ---
--- PYPEIT DEVELOPMENT SUITE PASSED 237/237 TESTS ---
Testing Started at 2023-12-13T23:39:00.775595
Testing Completed at 2023-12-14T09:24:33.737452
Total Time: 9:45:32.961857

@profxj profxj merged commit 972735a into develop Dec 18, 2023
23 checks passed
@rcooke-ast rcooke-ast deleted the telpca branch December 19, 2023 11:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants