Add specificity to operation doc

nsidc · Jul 22, 2024 · beec098
1 parent e4cb83e
Showing 1 changed file with 88 additions and 45 deletions.
diff --git a/doc/operation.md b/doc/operation.md
@@ -1,5 +1,3 @@
-# Operation
-
 > [!WARNING]
 > This software requires a large amount of memory (min. 8GB), increasingly more as time
 > goes on, because the entire climatology is sometimes read in to memory. An error may
@@ -11,9 +9,21 @@
 > We should move to a tool like SQLite or XArray for managing this data without needing
 > so much memory.
 
+# Operation
+
 The one set of files that is required but not generated by this code are the 37h
 threshold binaries provided by Tom Mote. These are checked in to this repository.
 
+<details><summary>🛠️ _TODO_</summary>
+
+- [ ] Simpler commands/CLI, shouldn't have to know to set PYTHONPATH.
+- [ ] Simplify steps; either consolidate or express step order in code, e.g. in CLI
+      with clear order.
+- [ ] Convert from storage via picklefiles to NetCDF. See issue:
+      https://github.com/nsidc/Antarctica_Today/issues/19
+
+</details>
+
 > [!NOTE]
 > The path commands are for users working in a bash environment! For csh users, 
 > the commands are:
@@ -22,81 +32,106 @@ threshold binaries provided by Tom Mote. These are checked in to this repository
 > setenv PYTHONPATH /path/to/repository/Antarctica_Today
 > python antarctica_today/program.py
 > ```
->
-> 🛠️ _TODO_
->
-> - [ ] Simpler commands/CLI, shouldn't have to know to set PYTHONPATH.
-> - [ ] Simplify steps; either consolidate or express step order in code, e.g. in CLI
->       with clear order.
-> - [ ] Convert from storage via picklefiles to NetCDF. See issue:
->       https://github.com/nsidc/Antarctica_Today/issues/19
 
 > [!IMPORTANT]
 > :bangbang: Steps must be performed in order :bangbang:
 
 
 ## 1. Download NSIDC-0080
 
-Download all NSIDC-0080 granules:
+Download NSIDC-0080 granules into the `Tb/` directory:
 
 ```bash
 PYTHONPATH=.
 python antarctica_today download-tb
 ```
 
-> [!NOTE]
-> For data before 2022-01-10, `NSIDC-0001` and `NSIDC-0007` are used. These previous
-> data have already been processed and are available as binary ".bin" files in the
-> `/data/daily_melt_bin_files` directory. All data newer than that date are calculated
-> from the `NSIDC-0080` v2 product (https://nsidc.org/data/nsidc-0080/versions/2), which
-> is in NetCDF format.
+> [!IMPORTANT]
+> Data before 2022-01-10 have already been processed through step 2 and are
+> available as binary ".bin" files in this repo's `data/daily_melt_bin_files/`
+> directory. These data were generated from the `NSIDC-0001` and `NSIDC-0007` datasets.
+>
+> Data newer than that date are calculated freshly from the `NSIDC-0080` v2 product
+> (https://nsidc.org/data/nsidc-0080/versions/2), which is in NetCDF format. This step
+> downloads that raw data.
 
 
 ## 2. Generate all the daily melt binary files
 
+Generate new data in `data/daily_melt_bin_files/`:
+
 ```bash
 PYTHONPATH=.
 python antarctica_today generate-daily-melt
 ```
+> [!IMPORTANT]
+> Binaries with pre-2016 dates in this repo's `data/daily_melt_bin_files/` directory
+> are already calibrated by Tom Mote and don't need to be generated. As noted in the
+> previous step, pre-generated data goes through to 2022-01-10.
 
-> [!WARNING]
-> I receive a large number of warnings like:
->
-> ```
-> UserWarning: Warning: At least one NSIDC Tb file on date '20230909' is missing. Skipping
-> that date.
-> ```
->
-> Why?
 
-> [!NOTE]
-> Binaries provided in `data/daily_melt_bin_files` are already calibrated by Tom Mote
-> for pre-2016 dates. This generates new daily melt files from the NSIDC-0080 data
-> downloaded in the previous step.
+<details><summary>🛠️ _TODO_</summary>
+
+I receive a large number of warnings like:
+
+```
+UserWarning: Warning: At least one NSIDC Tb file on date '20230909' is missing. Skipping
+that date.
+```
+
+Why?
+
+</details>
 
 
 ## 3. Generate the database
 
-This software manages a database covering the full climatology in the form of a pickle
-file.
+This step creates, primarily, four pickle files:
+
+* `daily_cumulative_melt_averages.pickle`
+* `daily_melt_pixel_averages.pickle`
+* `database/v3_1979-present_gap_filled.pickle`
+* `database/v3_1979-present_raw.pickle`
+
+Additionally:
+
+* `.csv` files will be created in the `database/` directory
+* `.tif` files will be created in the `data/mean_climatology/` directory
+* `.tif` files will be created in the `data/annual_*_geotifs` directories
 
-> [!NOTE]
-> This command may take up to tens of minutes.
->
-> 🛠️ _TODO_
-> 
-> - [ ] What does this command do? Create the pickle?
-> - [ ] Why is the next section called "Initializing"? Are there multiple pickle files?
->       Does each command initialize one? Can we combine them all into one command?
 
 ```bash
 PYTHONPATH=.
 python antarctica_today preprocess
 ```
 
+> [!NOTE]
+> This command may take up to tens of minutes.
+
+<details><summary>🛠️ _TODO_</summary>
+
+- [ ] Why is the next section called "Initializing"? Are there multiple pickle files?
+      Does each command initialize one? Can we combine them all into one command?
+- [ ] After this command, `git status` shows untracked files. Which should be ignored?
+      Which should be committed?
+
+      Untracked:
+
+      ```
+      database/baseline_percentiles_1990-2020.csv
+      database/baseline_percentiles_1990-2020_gap_filled.csv
+      database/daily_melt_totals.csv
+      database/daily_melt_totals_gap_filled.csv
+      ```
+</details>
+
 
 ### Database initialization (?)
 
+
+<details><summary>🛠️ _TODO_</summary>
+
+Is this step necessary? It seems like new files aren't being created when this step is
+run.
+
+</details>
+
 Create the melt array picklefile, a file containing a 2d grid for each day:
 
 ```bash
@@ -118,16 +153,16 @@ python antarctica_today gap-filled-melt-picklefile
 
 ### Daily updates
 
+> [!WARNING]
+> All initialization steps above must be completed first.
+
 This step will download any new Tb data files from NSIDC since its last run, and generate new plots from the last day's data (for all of Antarctica and for each individual region), including:
 1) A "daily melt" map of the most recent day's melt extent
 2) A "sum" map of that season's total melt days
 3) An "anomaly" map of that season's total melt days in comparison to baseline average values up to that day of year
 4) A line plot of melt extent up to that date, compared to historical baseline averages.
 It will copy these plots into a sub-directory /plots/daily_plots_gathered/[date]/ for easy collection.
 
-> [!WARNING]
-> All initialization steps above must be completed first.
-
 ```bash
 PYTHONPATH=.
 python antarctica_today/update_data.py
@@ -136,10 +171,18 @@ python antarctica_today/update_data.py
 
 ## 4. Generate outputs (optional)
 
-This will go through the entire database and produce summary maps and plots for every year on record.
-Run the main CLI's `process` command.
+> [!NOTE]
+> This command may take up to tens of minutes.
+
+This will go through the entire database and produce summary maps and plots for every year on record in the `plots/` directory.
 
 ```bash
 PYTHONPATH=.
 python antarctica_today process
 ```
+
+<details><summary>🛠️ _TODO_</summary>
+
+- [ ] After this step, `git status` shows changed files. Should they be committed?
+
+</details>
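
The TODO about expressing step order in code could be approached with a small wrapper like the one below. This is only a sketch, not part of the commit: `STEPS`, `build_commands`, and `run_all` are hypothetical names, and it covers only the four numbered steps with the sub-commands the doc names (`download-tb`, `generate-daily-melt`, `preprocess`, `process`). The database-initialization sub-steps and the daily update script are left out because the diff does not show them in full. It assumes it is run from the repository root and passes `PYTHONPATH` to each child process explicitly instead of relying on the calling shell.

```python
import os
import subprocess
import sys

# Ordered steps as listed in doc/operation.md. The sub-command names come from
# the doc; everything else in this file is a hypothetical illustration.
STEPS = [
    ("1. Download NSIDC-0080", ["antarctica_today", "download-tb"]),
    ("2. Generate daily melt binaries", ["antarctica_today", "generate-daily-melt"]),
    ("3. Generate the database", ["antarctica_today", "preprocess"]),
    ("4. Generate outputs (optional)", ["antarctica_today", "process"]),
]


def build_commands(python=sys.executable, include_optional=True):
    """Return (step name, argv) pairs in the order they must run."""
    steps = STEPS if include_optional else STEPS[:-1]
    return [(name, [python, *args]) for name, args in steps]


def run_all(repo_root=".", dry_run=True, include_optional=True):
    """Run every step in order, stopping at the first failure.

    With dry_run=True (the default) the commands are only printed.
    """
    root = os.path.abspath(repo_root)
    # Put the repository root on PYTHONPATH for the child processes only.
    env = dict(os.environ, PYTHONPATH=root)
    executed = []
    for name, cmd in build_commands(include_optional=include_optional):
        print(f"==> {name}: {' '.join(cmd)}")
        if not dry_run:
            # check=True raises CalledProcessError, so later steps never run
            # after an earlier one fails.
            subprocess.run(cmd, cwd=root, env=env, check=True)
        executed.append(cmd)
    return executed
```

Calling `run_all()` prints the commands without running them; `run_all(dry_run=False)` would execute them in sequence, and `include_optional=False` skips step 4.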