Slocum Python Glider Processing Toolbox (PGPT)

This is a minimal Python toolbox for processing Slocum glider data. It converts raw glider data files into self-describing *.nc files that pass compliance checking with the US Integrated Ocean Observing System (US IOOS) Glider Data Assembly Center (GDAC). The *.nc files from this toolbox should also meet the requirements for ingestion into the Global Telecommunication System (GTS) for further use in models. We follow the US IOOS guidelines for the file format and structure of glider data.

1. Requirements

  1. bash, GNU parallel
  2. Python3
  3. pip packages:
    • PyYAML
    • Cerberus
    • pandas
    • numpy
    • xarray
    • gsw
    • dbdreader

For Mac users: this code uses the GNU (Linux) "date" command. Install the coreutils package with Homebrew and alias date to gdate: brew install coreutils ; echo "alias date=gdate" >> ~/.bash_profile

2. Description

The intent of this toolbox is to produce a clean data set from raw glider data, preserving the original data resolution and associated metadata, for sharing with data centres and for further careful scientific post-processing (expert processing). This toolbox does not do enhanced checks for data Quality Control (QC), but it performs some data flagging following the guidelines for Quality Assurance / Quality Control of Real-Time Oceanographic Data (QARTOD).

This toolbox supports realtime mode (while the glider is deployed) and delayed mode (after the glider is recovered) separately. The user tells the toolbox which mode to use. The processing levels in both modes are the same, but delayed mode will contain the complete dataset while realtime may not.

Features

  • Delayed mode: filter and convert .[d|e]bd files into *.nc profiles and merge into a .nc timeseries file
  • Realtime mode: convert .[s|t]bd files into .nc profiles and merge into a .nc timeseries file
  • Processing features:
    • Preserve glider variables with original names under dimension [timestamp]
    • Calculate salinity
    • Apply salinity compensation to raw oxygen data ("sci_oxy4_oxygen" variable), if salinity and oxygen are present in the data
    • Convert NMEA to decimal degrees
    • Convert variables from radians to degrees
    • Apply a correction for longitude/latitude dead reckoning
    • Calculate the profile number for easy splitting of glider dives into profiles from the timeseries plot
  • During processing, add metadata from a prepared YAML file to provide a complete record of the glider data following US IOOS standards
  • Use the naming convention of the IOOS decoder for certain required variable names (time, lon, lat, temperature, salinity, oxygen, optical channels ...) so that data files can be discovered in GTS and ERDDAP
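
As an illustration of the two unit conversions listed above, here is a minimal sketch. It assumes the NMEA-style positions are stored as DDMM.mmmm (degrees followed by decimal minutes), with the sign carrying the hemisphere. The function names are hypothetical and are not taken from the toolbox source.

```python
import math


def nmea_to_decimal_degrees(nmea: float) -> float:
    """Convert a DDMM.mmmm (or DDDMM.mmmm) value to decimal degrees.

    Assumption: the sign of the input carries the hemisphere.
    """
    sign = -1.0 if nmea < 0 else 1.0
    magnitude = abs(nmea)
    degrees = math.floor(magnitude / 100.0)  # leading digits are whole degrees
    minutes = magnitude - degrees * 100.0    # remainder is decimal minutes
    return sign * (degrees + minutes / 60.0)


def radians_to_degrees(value: float) -> float:
    """Convert an angle (e.g. heading, pitch or roll) from radians to degrees."""
    return math.degrees(value)
```

For example, an NMEA value of 4430.0 corresponds to 44 degrees and 30 minutes, i.e. 44.5 decimal degrees.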

3. How to Run

Clone this repository to your desired location on your machine. Follow the steps below and run the provided shell script to process your glider data. Modify the toolbox as needed. A working example is provided.

Directory Structure

Create a new directory called realtime or delayed, and inside it create another directory called raw. Put all your binary .TBD|.SBD files (realtime data) or .DBD|.EBD files (delayed data) there. You can also put compressed Slocum binary files (.?CD) in the raw directory; the script will uncompress them automatically.

Create a new directory called cache in the same path as the realtime or delayed directory and put the necessary cache files in there.

Create a YAML file containing all the metadata needed for the netCDF files, in the same path as the realtime or delayed and cache directories. You can use the metadata.yml file included in the example directory as a template and modify it as needed.

At the end, your directory structure should look like this:

.
├── cache
│   ├── 00CDA96E.CAC
│   ├── 02A6E8E6.CAC
│   ├── 1A2BF75A.CAC
│   ├── 1BD4CF69.CAC
│   └── ...
├── delayed
│   └── raw
│      ├── 02150054.DBD
│      ├── 02150054.EBD
│      ├── 02150055.DBD
│      ├── 02150055.EBD
│      ├── 02150056.DBD
│      ├── 02150056.EBD
│      └── ...
└── metadata.yml
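
Before launching the processing, it can help to verify this layout. The following is a small sketch, not part of the toolbox: the helper name check_mission_dir and its messages are hypothetical, and the file extensions are taken from the description above.

```python
from pathlib import Path

# Raw file extensions per processing mode, as described in this README.
RAW_EXTENSIONS = {
    "realtime": {".sbd", ".tbd"},
    "delayed": {".dbd", ".ebd"},
}


def _is_raw_file(path, mode):
    """True for a raw file of the given mode or a compressed .?CD file."""
    suffix = path.suffix.lower()
    compressed = len(suffix) == 4 and suffix.endswith("cd")
    return suffix in RAW_EXTENSIONS[mode] or compressed


def check_mission_dir(mission_dir, mode, metadata_name="metadata.yml"):
    """Return a list of problems; an empty list means the layout looks right."""
    if mode not in RAW_EXTENSIONS:
        raise ValueError(f"mode must be one of {sorted(RAW_EXTENSIONS)}")
    root = Path(mission_dir)
    problems = []
    if not (root / "cache").is_dir():
        problems.append("missing cache directory")
    if not (root / metadata_name).is_file():
        problems.append(f"missing {metadata_name}")
    raw = root / mode / "raw"
    if not raw.is_dir():
        problems.append(f"missing {mode}/raw directory")
    elif not any(_is_raw_file(p, mode) for p in raw.iterdir()):
        problems.append(f"no raw files for mode '{mode}' in {raw}")
    return problems
```

The check only looks at names and locations; it does not confirm that the cache files match the raw files.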

Processing Files

Once all files and directories are in place, execute the following command (specify absolute path to the run directory):

run.sh -g glider_name -d absolute_path_to_mission_directory -m metadata_yaml_filename -p realtime_or_delayed

For the included example:

run.sh -g unit_334 -d /home/User/Github/PGPT/example -m metadata.yml -p delayed
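
If you prefer to launch the script from Python (for example from a scheduler), the command line can be assembled as in the sketch below. The flags are the ones shown above; the helper name build_run_command is hypothetical, and the location of run.sh is assumed to be on your PATH or passed in explicitly.

```python
from pathlib import PurePosixPath

VALID_MODES = ("realtime", "delayed")


def build_run_command(glider, mission_dir, metadata, mode, script="run.sh"):
    """Build the argument list for run.sh using the -g, -d, -m and -p flags."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}")
    # run.sh expects an absolute path to the mission directory.
    if not PurePosixPath(str(mission_dir)).is_absolute():
        raise ValueError("mission_dir must be an absolute path")
    return [script, "-g", glider, "-d", str(mission_dir), "-m", metadata, "-p", mode]


if __name__ == "__main__":
    cmd = build_run_command(
        "unit_334", "/home/User/Github/PGPT/example", "metadata.yml", "delayed"
    )
    print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) to start the processing.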

Once run.sh has finished, the mission directory will contain one new directory, nc. It holds a netCDF file for each profile, named gliderName-fileNameXXXX_processingMode.nc, and a trajectory file that combines all the profiles, named gliderName_processingMode_trajectory.nc.
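
To pick up the results in a follow-up script, the two naming patterns can be used to separate profile files from the trajectory file. This is a sketch under the assumption that the names follow the patterns exactly as written above; the helper name split_outputs is hypothetical.

```python
from pathlib import Path


def split_outputs(nc_dir, glider, mode):
    """Return (sorted profile files, trajectory file or None) found in nc_dir."""
    nc_dir = Path(nc_dir)
    # Profile files: gliderName-fileNameXXXX_processingMode.nc
    profiles = sorted(nc_dir.glob(f"{glider}-*_{mode}.nc"))
    # Trajectory file: gliderName_processingMode_trajectory.nc
    trajectory = nc_dir / f"{glider}_{mode}_trajectory.nc"
    return profiles, (trajectory if trajectory.is_file() else None)
```

The profile list could then be handed to an upload step, while the trajectory file is the one to open for a quick look at the whole mission.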

Once the toolbox has run, the user can set up a shell script to upload the data to an FTP server or a GDAC. Similarly, the user can set up a script to sync the directory with the glider remote server (e.g. SFMC) and re-run the toolbox whenever there is new data to be processed.

4. Docker Image

A Docker image of the Glider Processing Toolbox is available on Docker Hub for convenience:

docker pull taimaz/pgpt:1.0.2

and then run:

docker run -it taimaz/pgpt:1.0.2