Merge pull request #1207 from cta-observatory/moralejo-datacheck-docs
Update docs related to data checks
rlopezcoto authored Jan 30, 2024
2 parents cd5f92f + e05ffe5 commit fac56f2
Showing 4 changed files with 188 additions and 22 deletions.
67 changes: 55 additions & 12 deletions docs/lst_analysis_workflow.rst
@@ -51,6 +51,17 @@ Files and configuration
The DL1 files to use obviously depend on the source you want to analyse.
Unless you have a good reason, the latest version of the DL1 files should be used.

Selection of the real-data DL1 files
------------------------------------

The selection of the DL1 files (run-wise, i.e. those produced by lstosa by merging the subrun-wise DL1 files) for a
specific analysis (i.e., a given source and time period) can be performed using the notebook
``cta_lstchain/notebooks/data_quality.ipynb``. The selection also takes into account the quality of the data, mostly in
terms of atmospheric conditions, evaluated using the rate of cosmic-ray showers as a proxy. Data taken under poor
conditions will not be included in the list of selected runs. Instructions and further details can be found inside the
notebook.
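
As a rough illustration of such a rate-based selection, the sketch below applies a simple cut around the sample median.
It is an illustration only, not the notebook's actual logic; the table key (``runsummary``) and the column names
(``runnumber``, ``cosmics_rate``) are assumptions made for the example.

.. code-block:: python

    import pandas as pd

    # Night-wise datacheck file (see the datachecks documentation);
    # "runsummary", "cosmics_rate" and "runnumber" are hypothetical names:
    runs = pd.read_hdf("DL1_datacheck_20230920.h5", key="runsummary")

    # Keep runs whose cosmic-ray rate is within ~30% of the sample median,
    # as a rough proxy for stable atmospheric conditions:
    median_rate = runs["cosmics_rate"].median()
    good = runs[runs["cosmics_rate"].between(0.7 * median_rate,
                                             1.3 * median_rate)]
    print(sorted(good["runnumber"]))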


RF models
---------

@@ -62,26 +73,58 @@ The RF models are stored in the following directory:
``/fefs/aswg/data/models/...``


Tuning of MC DL1 files and RF models
------------------------------------

The default MC production is generated with a level of noise in the images which corresponds to the level of diffuse
night-sky background ("NSB") in a "dark" field of view (i.e. for observations with the moon below the horizon, at
not-too-low galactic latitudes, and not affected by other sources of noise, like the zodiacal light). In general,
observations of **extragalactic** sources in dark conditions can be properly analyzed with the default MC (i.e. with
the standard RF models).
The median of the standard deviation of the pixel charges recorded in interleaved pedestal events (in which a camera
image is recorded in the absence of a physics trigger) is a good measure of the NSB level in a given data run. This is
computed by the data selection notebook ``cta_lstchain/notebooks/data_quality.ipynb`` (see above). For data with an NSB
level significantly higher than the "dark field" one, it is possible to tune (increase) the noise in the MC files, and
produce from them RF models (and "test MC" for computing instrument response functions) which improve the performance
of the analysis (relative to using the default, low-NSB MC).
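
Below is a minimal sketch of how this quantity can be computed for a single subrun DL1 file. It reads the same
monitoring tables as :py:obj:`~lstchain.image.modifier.calculate_noise_parameters` (whose diff appears further down);
for simplicity it applies the last available calibration to all interleaved pedestal sets, instead of matching them
via ``calibration_id``.

.. code-block:: python

    import numpy as np
    from ctapipe.io import read_table

    dl1_file = "dl1_LST-1.Run10032.0069.h5"
    pedestal = read_table(dl1_file, '/dl1/event/telescope/monitoring/pedestal')
    calibration = read_table(dl1_file,
                             '/dl1/event/telescope/monitoring/calibration')

    # High-gain pedestal charge std per pixel, from interleaved pedestal
    # events (index 0 holds the original calibration, hence the [1:]),
    # converted from ADC counts to p.e. with the last calibration:
    ped_std_pe = (pedestal['charge_std'][1:, 0, :] *
                  calibration['dc_to_pe'][-1, 0, :])

    # Median across pixels (averaged over pedestal sets) is the NSB proxy:
    print(f"NSB proxy: {np.nanmedian(np.mean(ped_std_pe, axis=0)):.3f} p.e.")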

This is done by changing the configuration file for the MC processing, producing new DL1(b) files, and training new RF
models. To produce a config tuned to the data you want to analyse, you first have to obtain the standard analysis
configuration file (for MC) for the desired version of lstchain (i.e. the version with which the real DL1 files you
will use were produced). This can be done with the script :py:obj:`~lstchain.scripts.lstchain_dump_config`:

.. code-block::

    lstchain_dump_config --mc --output-file standard_lstchain_config.json

Now you have to update the file with the parameters needed to increase the NSB level. For this you need a simtel.gz MC
file from the desired production (any will do: it can be either a gamma or a proton file), and a "typical" subrun DL1
file from your **real data** sample. "Typical" means one in which the NSB level is close to the median value for the
sample to be analyzed. The data selection notebook ``cta_lstchain/notebooks/data_quality.ipynb`` (see above) provides a
list of a few such subruns for your selected sample - you can use any of them. To update the config file you have to
use the script :py:obj:`~lstchain.scripts.lstchain_tune_nsb`, e.g.:

.. code-block::

    lstchain_tune_nsb.py --config standard_lstchain_config.json \
        --input-mc .../simtel_corsika_theta_6.000_az_180.000_run10.simtel.gz \
        --input-data .../dl1_LST-1.Run10032.0069.h5 \
        -o tuned_nsb_lstchain_config.json

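If you want to double-check the result, the tuned file is plain JSON. The sketch below inspects the ``image_modifier``
section that ``lstchain_tune_nsb`` fills in; the two keys shown (``increase_nsb``, ``extra_noise_in_dim_pixels``) are
those visible in the script's diff further down, and the section contains additional entries not listed here.

.. code-block:: python

    import json

    with open("tuned_nsb_lstchain_config.json") as f:
        cfg = json.load(f)

    modifier = cfg["image_modifier"]
    assert modifier["increase_nsb"] is True
    print("Extra noise in dim pixels:",
          modifier["extra_noise_in_dim_pixels"], "p.e.")
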
To request a **new production of RF models**, you can open a pull-request on the lstmcpipe repository, providing
the .json configuration file produced following the steps above.


Keeping track of lstchain configurations
----------------------------------------

The lstchain configuration file used to process the simulations and produce the RF models of a given MC production is
provided in the lstmcpipe repository, as well as in the models directory.

It is important that the software version and the configuration used for processing real data and MC are the same. For
a given lstchain version, this should be guaranteed by following the procedure above, which makes use of
:py:obj:`~lstchain.scripts.lstchain_dump_config`.


DL3/IRF config files
--------------------
88 changes: 87 additions & 1 deletion docs/lstchain_api/datachecks/index.rst
@@ -9,7 +9,93 @@ Datachecks (`datachecks`)
Introduction
============

Module containing functions for checking the quality of the LST data.

DL1 data checks
===============

Currently the checks are done at the DL1 level.
The DL1 datacheck files are produced by running the following scripts sequentially:

* :py:obj:`lstchain.scripts.lstchain_check_dl1`

This takes as input a DL1 file (including DL1a information, i.e. camera images & times) from a data subrun, e.g.:

.. code-block:: bash

    lstchain_check_dl1 --input-file dl1_LST-1.1.Run14619.0000.h5 --output-dir OUTPUT_DIR --omit-pdf

The script produces a data check file for the subrun, ``datacheck_dl1_LST-1.Run14619.0000.h5``, which contains many
quantities that can be used to judge the quality of the data (see class
:py:obj:`~lstchain.datachecks.containers.DL1DataCheckContainer`).
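
To get a quick overview of what such a file contains, one can simply walk its HDF5 tree, e.g. with PyTables (no
assumptions about the table names are needed for this):

.. code-block:: python

    import tables

    with tables.open_file("datacheck_dl1_LST-1.Run14619.0000.h5") as f:
        # List every table in the file, with its number of rows:
        for node in f.walk_nodes("/", classname="Table"):
            print(node._v_pathname, "-", node.nrows, "rows")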

|
* :py:obj:`lstchain.scripts.lstchain_check_dl1`

The same script is run again, but now providing as input the **subrun-wise datacheck files** produced earlier (all
subruns of a given run must be provided). It also needs to know the directory ("MUONS_DIR") where the subrun-wise
``muons_LST-1.*.fits`` files (produced in the R0 to DL1 analysis step), which contain the muon ring information, are
stored:

.. code-block:: bash

    lstchain_check_dl1 --input-file "datacheck_dl1_LST-1.Run14619.*.h5" --output-dir OUTPUT_DIR --muons-dir MUONS_DIR

The output is now a data check file for the whole run, ``datacheck_dl1_LST-1.Run14619.h5``, which contains all the
information from the subrun-wise files. The script also produces a .pdf file ``datacheck_dl1_LST-1.Run14619.pdf`` with
various plots of the quantities stored in the DL1DataCheckContainer objects, plus others obtained from the muon ring
analysis. Note that the muon ring information is not propagated to the run-wise datacheck files; it is just used for
the plotting.

|
* :py:obj:`lstchain.scripts.lstchain_longterm_dl1_check`

This script merges the run-wise datacheck files of (typically) one night, stored in INPUT_DIR, and produces **a
single .h5 file for the whole night** as output (e.g. ``DL1_datacheck_20230920.h5``). The "longterm" in the script
name is a bit of an overstatement - in principle it can be run over data from many nights, but the output .html file
(see below) becomes too heavy and some of the interactive features work poorly.

.. code-block:: bash

    lstchain_longterm_dl1_check --input-dir INPUT_DIR --muons-dir MUONS_DIR --output-file DL1_datacheck_20230920.h5 --batch

The output .h5 file contains a (run-wise) summarized version of the information in the input files, including that of
the muon ring .fits files. It also creates an .html file (e.g. ``DL1_datacheck_20230920.html``) which can be opened
with any web browser and which contains various interactive plots allowing a quick check of a night's data. See an
example of the .html file `here <https://www.lst1.iac.es/datacheck/dl1/v0.10/20230920/DL1_datacheck_20230920.html>`_
(password protected).

|
.. Heading below to be replaced by :py:obj:`lstchain.scripts.lstchain_cherenkov_transparency` once the script is merged!
* ``lstchain_cherenkov_transparency``


This script analyzes the image intensity histograms (one per subrun) stored in the **run-wise** datacheck files (which
must exist in INPUT_DIR):

.. code-block:: bash

    lstchain_cherenkov_transparency --update-datacheck-file DL1_datacheck_20230920.h5 --input-dir INPUT_DIR

The script updates the **night-wise** datacheck .h5 file ``DL1_datacheck_20230920.h5`` with a new table (with one entry
per subrun) containing parameters related to the image intensity spectra for cosmic-ray events (i.e., a
Cherenkov-transparency-like approach; see e.g. https://arxiv.org/abs/1310.1639).
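
For illustration, a minimal sketch of such an analysis is shown below; the thresholds, binning and function name are
assumptions made for the example, not the script's actual implementation.

.. code-block:: python

    import numpy as np

    def transparency_proxy(counts, bin_centers, i_min=100., i_ref=200.):
        """Fit dN/dI ~ A * I**-gamma above i_min (intensities in p.e.);
        return the index gamma and the fitted rate at I = i_ref."""
        mask = (bin_centers > i_min) & (counts > 0)
        # A power-law fit is a linear fit in log-log space:
        slope, intercept = np.polyfit(np.log10(bin_centers[mask]),
                                      np.log10(counts[mask]), 1)
        return -slope, 10 ** (intercept + slope * np.log10(i_ref))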




Using the datacheck files for selecting good-quality data
=========================================================

The night-wise datacheck .h5 files, ``DL1_datacheck_YYYYMMDD.h5``, can be used to select a subsample of good-quality
data from a large sample of observations. The files are relatively light, about 6 MB per night on average. A large
sample of them can be processed with the notebook ``cta_lstchain/notebooks/data_quality.ipynb`` (instructions can be
found inside the notebook).
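
For example, a season of night-wise files can be concatenated in a few lines before running the notebook (the
``runsummary`` table key is an assumption made for the example; adapt it to the actual file layout):

.. code-block:: python

    from glob import glob

    import pandas as pd

    frames = [pd.read_hdf(f, key="runsummary")  # hypothetical table key
              for f in sorted(glob("DL1_datacheck_*.h5"))]
    all_runs = pd.concat(frames, ignore_index=True)
    print(f"{len(all_runs)} runs loaded")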


Reference/API
=============
46 changes: 38 additions & 8 deletions lstchain/image/modifier.py
@@ -194,6 +194,8 @@ def calculate_noise_parameters(simtel_filename, data_dl1_filename,
                                       '/dl1/event/telescope/monitoring/calibration')
    data_dl1_pedestal = read_table(data_dl1_filename,
                                   '/dl1/event/telescope/monitoring/pedestal')
    data_dl1_flatfield = read_table(data_dl1_filename,
                                    '/dl1/event/telescope/monitoring/flatfield')
    data_dl1_parameters = read_table(data_dl1_filename,
                                     '/dl1/event/telescope/parameters/LST_LSTCam')
    data_dl1_image = read_table(data_dl1_filename,
@@ -203,8 +205,9 @@ def calculate_noise_parameters(simtel_filename, data_dl1_filename,
    # Locate pixels with HG declared unusable either in original calibration or
    # in interleaved events:
    bad_pixels = unusable[0][0]  # original calibration
    if unusable.shape[0] > 1:
        for tf in unusable[1:][0]:  # calibrations with interleaveds
            bad_pixels = np.logical_or(bad_pixels, tf)
    good_pixels = ~bad_pixels

    # First index: 1,2,... = values from interleaveds (0 is for original
@@ -214,6 +217,29 @@ def calculate_noise_parameters(simtel_filename, data_dl1_filename,

    # HG adc to pe conversion factors from interleaved calibrations:
    data_HG_dc_to_pe = data_dl1_calibration['dc_to_pe'][:, 0, :]

    if data_dl1_flatfield['charge_mean'].shape[0] < 2:
        logging.error('\nCould not find interleaved FF calibrations in '
                      'monitoring table!')
        return None, None, None

    if data_dl1_pedestal['charge_std'].shape[0] < 2:
        logging.error('\nCould not find interleaved pedestal info in '
                      'monitoring table!')
        return None, None, None

    # Mean HG charge in interleaved FF events, to spot possible issues:
    data_HG_FF_mean = data_dl1_flatfield['charge_mean'][1:, 0, :]
    dummy = []
    # indices which connect each FF calculation to a given calibration:
    calibration_id = data_dl1_flatfield['calibration_id'][1:]

    for i, x in enumerate(data_HG_FF_mean[:, ]):
        dummy.append(x * data_HG_dc_to_pe[calibration_id[i],])
    dummy = np.array(dummy)
    # Average for all interleaved calibrations (in case there are more than one)
    data_HG_FF_mean_pe = np.mean(dummy, axis=0)  # one value per pixel

    # Pixel-wise pedestal standard deviation (for an unbiased extractor),
    # in adc counts:
    data_HG_ped_std = data_dl1_pedestal['charge_std'][1:, 0, :]
@@ -224,20 +250,24 @@ def calculate_noise_parameters(simtel_filename, data_dl1_filename,
    for i, x in enumerate(data_HG_ped_std[:, ]):
        dummy.append(x * data_HG_dc_to_pe[calibration_id[i],])
    dummy = np.array(dummy)

    # Average for all interleaved calibrations (in case there are more than one)
    data_HG_ped_std_pe = np.mean(dummy, axis=0)  # one value per pixel

    # Identify noisy pixels, likely containing stars - we want to adjust MC to
    # the average diffuse NSB across the camera
    data_median_std_ped_pe = np.nanmedian(data_HG_ped_std_pe[good_pixels])
    data_std_std_ped_pe = np.nanstd(data_HG_ped_std_pe[good_pixels])
    log.info('\nReal data:')
    log.info(f'  Number of bad pixels (from calibration): {bad_pixels.sum()}')
    log.info(f'  Median of FF pixel charge: '
             f'{np.nanmedian(data_HG_FF_mean_pe[good_pixels]):.3f} p.e.')
    log.info(f'  Median across camera of good pixels\' pedestal std '
             f'{data_median_std_ped_pe:.3f} p.e.')
    brightness_limit = data_median_std_ped_pe + 3 * data_std_std_ped_pe
    too_bright_pixels = (data_HG_ped_std_pe > brightness_limit)
    log.info(f'  Number of pixels beyond 3 std dev of median: '
             f'{too_bright_pixels.sum()} (above {brightness_limit:.2f} p.e.)')

    ped_mask = data_dl1_parameters['event_type'] == 2
    # The charges in the images below are obtained with the extractor for
9 changes: 8 additions & 1 deletion lstchain/scripts/lstchain_tune_nsb.py
@@ -79,6 +79,9 @@ def main():

    a, b, c = calculate_noise_parameters(args.input_mc, args.input_data,
                                         args.config)
    if a is None:
        logging.error('Could not compute NSB tuning parameters. Exiting!')
        sys.exit(1)

    dict_nsb = {"increase_nsb": True,
                "extra_noise_in_dim_pixels": round(a, 3),
@@ -93,7 +96,11 @@ def main():

    if args.output_file:
        cfg = read_configuration_file(args.config)
        if 'image_modifier' in cfg:
            cfg['image_modifier'].update(dict_nsb)
        else:
            cfg['image_modifier'] = dict_nsb

        dump_config(cfg, args.output_file, overwrite=args.overwrite)


