HISTORY Samplers

JulesKouatchou edited this page Jan 15, 2025 · 25 revisions

$\textcolor{red}{\textbf{Introduction}}$

An observing system simulation experiment (OSSE) is a modeling experiment used to evaluate the value of a new observing system when actual observational data are not available. An OSSE system includes a nature run (Atlas, 1997), a data assimilation system (Atlas et al., 2015), and software to simulate “observations” from the nature run and to add realistic observation errors. OSSEs are designed to assess the impact of instruments that do not yet exist on numerical weather prediction (NWP) (Boukabara et al., 2016) and analysis; to make design decisions for a new observing system or network; and to investigate the behavior of data assimilation systems and thereby optimally tune these systems in an environment where the “truth”, and hence the system’s behavior, is known.

Any OSSE activity starts with a realistic representation of nature, typically by means of a high-resolution simulation by a comprehensive Earth system model without assimilation, the so-called Nature Run (NR). These models are run for a period long enough to capture the relevant natural variability, such as the seasonal cycle, and to spin up to a well-equilibrated state. Any OSSE needs a procedure to extract synthetic observations that mimic the distribution of real observations, and the impacts of synthetic data should be equivalent to the corresponding impacts of real observations. The process of simulating the observations amounts to sampling the NR at the appropriate times and locations.

NWP models used in OSSEs generate outputs on a grid, essentially providing forecast information at specific points in space and time. To obtain data at locations of interest (as seen by instruments), we can use offline techniques such as Model Output Statistics (MOS) to statistically interpolate the model data to those locations. It is more attractive for models to be able to produce fields at any location and any frequency at runtime, instead of doing it offline.

In recent years, ESMF has incorporated robust parallel and scalable functionality for interpolation and regridding. This has allowed the GEOS model to perform these tasks (interpolation and regridding) on the fly during the model integration. Initially, the GEOS model implemented the ability to read input data files and produce output files of different grid types and resolutions (horizontal and vertical). This work was extended with Sampler, a tool to generate data files at the user's prescribed locations (fixed or dynamic). Sampler is a HISTORY subcomponent that maps gridded model geophysical variables onto observation locations, be they fixed ground stations, aircraft trajectories, or satellite swaths.

With Sampler, we can configure the entire HISTORY pipeline to directly generate any desired GEOS quantity at any static or time-dependent location (or group of locations) of interest (stations, moving object trajectory, satellite swath, etc.).

In this document, we describe the different options for Sampler and explain how to use each of them while running the GEOS model.

$\textcolor{red}{\textbf{Types of samplers}}$

$\textcolor{blue}{\textbf{Station sampler}}$

The station sampler is used to produce geophysical variables at a set of time-independent geospatial coordinates corresponding to fixed ground stations (for instance NASA AERONET or NOAA GHCNd land surface stations).

$\textcolor{green}{\textbf{Station sampler: list of stations}}$

The user needs to create a CSV file listing all the stations of interest. Each row should contain at least the following information:

  • station name
  • station latitude
  • station longitude

The user may specify other parameters (such as the station ID) to add more description of a station as long as all the lines have the same number of columns. Currently, the code supports files with any of the following line contents:

station_id, station_name, station_longitude, station_latitude
station_name, station_id, station_longitude, station_latitude
station_name, station_longitude, station_latitude
station_name, station_latitude, station_longitude

Note

Since the most important parameters are the station name and its position, the source code will be refactored in the future so that the station file could include any number of columns as long as the key parameters are present in a consistent order.

Here is a sample station file:

List of stations from AERONET
name,lon,lat                                                                                                
Anchorage,-149.9,61.2
Atlanta,-84.4,33.7
Greenbelt,-76.9,39.1
Bismarck,-100.8,46.8

It obeys the line formatting:

station_name, station_longitude, station_latitude

$\textcolor{green}{\textbf{Station sampler: settings in HISTORY.rc}}$

The HISTORY.rc file settings for the station sampler follow the same syntax as described in the MAPL History Component document. However, specific parameters are required to be able to exercise the station sampler:

  • sampler_spec: A string that needs to be set to 'station' to select a station sampler collection.
  • station_id_file: Full path to the file containing the list of stations and their locations (latitude and longitude in degrees).
  • station_skip_line: An integer specifying the number of lines to skip at the top of the station file.
  • regrid_method: A string specifying the regridding method (for instance 'BILINEAR', 'CONSERVATIVE') to be used to interpolate the model fields at the different stations.

Here is a sample station sampler collection:

  COLLECTIONS:
  Aeronet                                 
  ::                                                                                                                   
                                          
  Aeronet.sampler_spec: 'station'         
  Aeronet.station_id_file:   FULL_PATH/my_station_file.csv
  Aeronet.station_skip_line:  2           
  Aeronet.template: %y4%m2%d2_%h2%n2.nc4
  Aeronet.format: 'CFIO'                  
  Aeronet.frequency: 001000,  
  Aeronet.duration:  240000,   
  Aeronet.regrid_method:     'BILINEAR' ,
  Aeronet.fields: 'PHIS'       , 'AGCM'       , 'phis'       ,
                  'TROPT'      , 'AGCM'       ,    
                  'TS'         , 'SURFACE'    , 'ts'         , 
                  'TSOIL1'     , 'SURFACE'    ,   
                  'PS'         , 'DYN'        , 'ps'         ,    
                  'Q'          , 'MOIST'      , 'sphu'       ,
::
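
The `frequency` and `duration` values in the Aeronet collection above are easier to read once converted to seconds. The Python sketch below does that conversion, assuming both settings use the same hhmmss encoding that this page states for `Epoch`, and assuming `duration` is the time span covered by each output file. The helper name `hhmmss_to_seconds` is hypothetical.

```python
def hhmmss_to_seconds(value):
    """Convert an hhmmss value (integer or string) to a number of seconds."""
    digits = str(value).strip().rstrip(",").strip().zfill(6)
    if not digits.isdigit():
        raise ValueError(f"not an hhmmss value: {value!r}")
    hours = int(digits[:-4])
    minutes = int(digits[-4:-2])
    seconds = int(digits[-2:])
    if minutes >= 60 or seconds >= 60:
        raise ValueError(f"minutes and seconds must be below 60: {value!r}")
    return hours * 3600 + minutes * 60 + seconds


frequency = hhmmss_to_seconds("001000")   # 600 seconds, i.e. every 10 minutes
duration = hhmmss_to_seconds("240000")    # 86400 seconds, i.e. 24 hours
records_per_file = duration // frequency  # 144 under the assumptions above
print(frequency, duration, records_per_file)
```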

$\textcolor{blue}{\textbf{Trajectory sampler}}$

The trajectory sampler is used to produce geophysical variables at time-dependent geospatial points along a defined path or trajectory through the atmosphere (corresponding to the tracks of aircraft, balloons, ships, or nadir-viewing spaceborne assets). The goal is to provide a snapshot of atmospheric conditions as an object would experience them while moving along that path.

To exercise the trajectory sampler, the user needs to provide at least the following information in the HISTORY.rc file:

  • A list of names of the trajectories to be considered for outputs.
  • The date/time range to produce outputs along trajectories.
    • The range is specified through two parameters (beginning and end) in the format YYYY-MM-DDThh:mm:ss.
    • The experiment needs to start within that range, otherwise the code will abort.
    • The outputs along a trajectory will only be written within the range, though the simulation may proceed outside it.
  • The frequency of the outputs.
  • For each trajectory:
    • The full path to a netCDF file template.
      • The code will use the template to point to the actual netCDF file.
      • The netCDF file contains a list of specific geolocated points that the code will use for the trajectory sampler.
    • The list of fields to produce along the trajectory. The list is unique to the trajectory.

$\textcolor{green}{\textbf{Trajectory sampler: settings in HISTORY.rc}}$

To be able to use the trajectory sampler, it is important to set the following parameters in the HISTORY.rc file in the appropriate collection:

  • sampler_spec: A string that needs to be set to 'trajectory' to select a trajectory sampler collection.
  • ObsPlatforms: list of names (two consecutive names separated by a comma) of the different observation trajectories we want to produce outputs along.
  • obs_file_begin: date/time (in the format YYYY-MM-DDThh:mm:ss) for the beginning of the observation file. If not provided, the code will use the current date/time and will verify that a trajectory file exists on that specific date/time.
  • obs_file_interval: required parameter (in the format: yymmdd hhmmss) providing the date/time interval between two consecutive observation files.
  • obs_file_end: date/time (in the format YYYY-MM-DDThh:mm:ss) for the end of the observation file. If not provided, the code will use the current date/time plus 14 days.
  • Epoch: integer determining the output frequency in hours/minutes/seconds (in the format: hhmmss).
  • regrid_method: A string specifying the regridding method (for instance 'BILINEAR', 'CONSERVATIVE') to be used to interpolate the model fields to the trajectory locations.
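
As a rough illustration of how `obs_file_begin`, `obs_file_interval`, and `obs_file_end` relate to each other, the Python sketch below lists the nominal times of the observation files they imply. It is not the MAPL implementation: the function names are hypothetical, the end time is assumed to be inclusive, and only day and sub-day intervals are handled because year and month steps depend on the calendar.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"  # YYYY-MM-DDThh:mm:ss


def parse_interval(text):
    """Convert a 'yymmdd hhmmss' interval into a timedelta."""
    date_part, time_part = text.split()
    years, months, days = (int(date_part[i:i + 2]) for i in (0, 2, 4))
    hours, minutes, seconds = (int(time_part[i:i + 2]) for i in (0, 2, 4))
    if years or months:
        raise ValueError("this sketch only handles day and sub-day intervals")
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def obs_file_times(begin, end, interval):
    """List file times from begin to end (inclusive, by assumption)."""
    step = parse_interval(interval)
    if step <= timedelta(0):
        raise ValueError("the interval must be positive")
    current = datetime.strptime(begin, TIME_FORMAT)
    stop = datetime.strptime(end, TIME_FORMAT)
    times = []
    while current <= stop:
        times.append(current)
        current += step
    return times


# One day of 6-hourly observation files.
times = obs_file_times("2019-07-31T21:00:00", "2019-08-01T21:00:00", "000000 060000")
for when in times:
    print(when.strftime(TIME_FORMAT))
```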

The fields to write out are not listed at the level of the trajectory collection. For each trajectory listed in the ObsPlatforms parameter, we need to provide additional settings that define at least the observation trajectory file template and the list of fields to produce along the defined trajectory. Assuming that obs_traj is one of the values included in ObsPlatforms, here is a template setting for the corresponding trajectory:

PLATFORM.obs_traj::
  IODA_SCHEMA::
    index_name_x:     Location
    var_name_lon:     MetaData/longitude
    var_name_lat:     MetaData/latitude
    var_name_time:    MetaData/dateTime
    file_name_template:  FULL_PATH/obs_traj.%y4%m2%d2T%h2%n2%S2Z.nc4
  :: 
  GEOVALS_SCHEMA::
    geovals_fields::
      'PHIS'       , 'AGCM'       , 'phis'       ,
      'TROPT'      , 'AGCM'       ,    
      'TS'         , 'SURFACE'    , 'ts'         , 
      'TSOIL1'     , 'SURFACE'    ,   
      'PS'         , 'DYN'        , 'ps'         ,    
      'Q'          , 'MOIST'      , 'sphu'       ,
    ::
  ::
::
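
The `file_name_template` entry is resolved into an actual file name for a given date/time. The Python sketch below shows one plausible reading of that substitution. The token meanings (`%y4` year, `%m2` month, `%d2` day, `%h2` hour, `%n2` minute, `%S2` second) are assumptions inferred from the templates on this page, and `expand_template` is a hypothetical helper, not a MAPL routine.

```python
from datetime import datetime


def expand_template(template, when):
    """Replace the date/time tokens of a file name template (assumed meanings)."""
    replacements = {
        "%y4": f"{when.year:04d}",
        "%m2": f"{when.month:02d}",
        "%d2": f"{when.day:02d}",
        "%h2": f"{when.hour:02d}",
        "%n2": f"{when.minute:02d}",
        "%S2": f"{when.second:02d}",
    }
    for token, value in replacements.items():
        template = template.replace(token, value)
    return template


file_name = expand_template(
    "FULL_PATH/obs_traj.%y4%m2%d2T%h2%n2%S2Z.nc4",
    datetime(2019, 7, 31, 21, 0, 0),
)
print(file_name)  # FULL_PATH/obs_traj.20190731T210000Z.nc4
```

Expanding the template for the first few expected file times and checking that the resulting files exist can help diagnose a path problem before the model is run.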

Here is a sample HISTORY.rc file:

  COLLECTIONS:                            
  'jedi'                                 
  ::                                                                                                                   
 
  jedi.sampler_spec:        trajectory                                                           
  jedi.ObsPlatforms:         aircraft atms_npp
  jedi.template:             '%y4%m2%d2_%h2%n2z.nc4',
  jedi.format:               'CFIO',
  jedi.obs_file_begin:       2019-07-31T21:00:00
  jedi.obs_file_interval:    '000000 060000'   
  jedi.obs_file_end:         2019-11-01T00:00:00
  jedi.Epoch:                060000          
  jedi.regrid_method:        'BILINEAR' ,
::  

#______ Format below does not obey the normal HISTORY.rc settings ____

DEFINE_OBS_PLATFORM::                                        

PLATFORM.aircraft::
  IODA_SCHEMA::
    index_name_x:     Location
    var_name_lon:     MetaData/longitude
    var_name_lat:     MetaData/latitude
    var_name_time:    MetaData/dateTime
    file_name_template:  /discover/nobackup/projects/gmao/aist-nr/data/ioda_reshuffle/%y4%m2%d2/geos_atmosphere/aircraft.%y4%m2%d2T%h2%n2%S2Z.nc4
  ::      
  GEOVALS_SCHEMA::
    geovals_fields::
      'PHIS'       , 'AGCM'       , 'phis'       ,
      'TROPT'      , 'AGCM'       ,    
      'TS'         , 'SURFACE'    , 'ts'         , 
    ::    
  ::      
:: 

PLATFORM.atms_npp::
  IODA_SCHEMA::
    index_name_x:     Location
    var_name_lon:     MetaData/longitude
    var_name_lat:     MetaData/latitude
    var_name_time:    MetaData/dateTime
    file_name_template:  /discover/nobackup/projects/gmao/aist-nr/data/ioda_reshuffle/%y4%m2%d2/geos_atmosphere/atms_npp.%y4%m2%d2T%h2%n2%S2Z.nc4
  ::      
  GEOVALS_SCHEMA::
    geovals_fields::
      'TSOIL1'     , 'SURFACE'    ,   
      'PS'         , 'DYN'        , 'ps'         ,    
      'Q'          , 'MOIST'      , 'sphu'       ,
    ::    
  ::      
:: 

$\textcolor{blue}{\textbf{Swath sampler}}$

The swath sampler is used to produce geophysical variables at time-dependent geospatial coordinates corresponding to the two-dimensional swath of an orbiting instrument. Swaths are typically represented by logically rectangular curvilinear grids that may have higher or lower resolution than the NR. When the swath has lower resolution than the NR, conservative regridding is performed. However, when the observing system has a much higher resolution than the NR, it may be more advantageous to use masked samplers and perform any necessary interpolation offline.

$\textcolor{blue}{\textbf{Masked sampler}}$

The masked sampler is used when the observing system has a much higher resolution than the NR. In this case, gridded geophysical variables are masked in such a way that values are preserved at the grid points that have been visited by the satellite, possibly with the addition of a “halo” to aid offline interpolation, while all other grid points receive a constant undefined value. These gridded fields can be output efficiently using the internal compression algorithms available in most modern formats (e.g., NetCDF-4, HDF-5), or alternatively using a sparse storage scheme.
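
The masking idea can be sketched in a few lines of Python. This is a toy illustration on a small rectangular grid, not the GEOS implementation: `mask_field`, the `halo` width argument, and the `UNDEF` value are all hypothetical, and the sketch ignores longitude periodicity and the actual model grid.

```python
UNDEF = 1.0e15  # placeholder undefined value (assumption for this sketch)


def mask_field(field, visited, halo=0, undef=UNDEF):
    """Keep values at visited (row, col) cells plus a halo; set the rest to undef."""
    nrows, ncols = len(field), len(field[0])
    keep = set()
    for row, col in visited:
        # Add the visited cell and its neighbors within `halo` cells,
        # clipped at the grid edges.
        for r in range(max(0, row - halo), min(nrows, row + halo + 1)):
            for c in range(max(0, col - halo), min(ncols, col + halo + 1)):
                keep.add((r, c))
    return [
        [field[r][c] if (r, c) in keep else undef for c in range(ncols)]
        for r in range(nrows)
    ]


# A 4 x 5 grid whose values encode their own position (row * 10 + col).
field = [[float(r * 10 + c) for c in range(5)] for r in range(4)]
masked = mask_field(field, visited={(1, 1)}, halo=1)
kept = sum(value != UNDEF for row in masked for value in row)
print(kept)  # 9: the visited cell plus its one-cell halo
```

Because most of the masked field holds a single constant value, it compresses well, which is the property the output formats mentioned above rely on.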

$\textcolor{red}{\textbf{References}}$

  • Atlas, R., 1997: Atmospheric observations and experiments to assess their usefulness in data assimilation. J. Meteor. Soc. Japan, 75, 111–130, https://doi.org/10.2151/jmsj1965.75.1B_111.
  • Atlas, R., L. Bucci, B. Annane, R. Hoffman, and S. Murillo, 2015: Observing system simulation experiments to assess the potential impact of new observing systems on hurricane forecasting. Mar. Technol. Soc. J., 49, 140–148, https://doi.org/10.4031/MTSJ.49.6.3.
  • Boukabara, S. A., and Coauthors, 2016: Community Global Observing System Simulation Experiment (OSSE) Package (CGOP): Description and usage. J. Atmos. Oceanic Technol., 33, 1759–1777, https://doi.org/10.1175/JTECH-D-16-0012.1.