From f70420ecb7ad2c7c30d9136145b89f0086274346 Mon Sep 17 00:00:00 2001
From: Max Jones <14077947+maxrjones@users.noreply.github.com>
Date: Mon, 11 Nov 2024 10:02:40 -0700
Subject: [PATCH] Remove Pangeo-Forge recipes notebook (#70)

---
 _toc.yml                                  |   1 -
 environment.yml                           |   5 +-
 notebooks/advanced/Pangeo_Forge.ipynb     | 230 ----------------------
 notebooks/using_references/Datatree.ipynb |   5 +-
 4 files changed, 3 insertions(+), 238 deletions(-)
 delete mode 100644 notebooks/advanced/Pangeo_Forge.ipynb

diff --git a/_toc.yml b/_toc.yml
index ea322130..f9b541d5 100644
--- a/_toc.yml
+++ b/_toc.yml
@@ -14,7 +14,6 @@ parts:
   - caption: Advanced
     chapters:
       - file: notebooks/advanced/Parquet_Reference_Storage
-      - file: notebooks/advanced/Pangeo_Forge
       - file: notebooks/advanced/appending
 
   - caption: Generating Reference Files
diff --git a/environment.yml b/environment.yml
index 57124f83..34e2d2b5 100644
--- a/environment.yml
+++ b/environment.yml
@@ -34,12 +34,9 @@ dependencies:
   - scipy
   - tifffile
   - ujson
-  - xarray
-  - xarray-datatree
+  - xarray>=2024.10.0
   - zarr
   - sphinx-pythia-theme
   - pip:
-      - "apache-beam[interactive, dataframe]"
-      - git+https://github.com/pangeo-forge/pangeo-forge-recipes
       - git+https://github.com/carbonplan/xrefcoord.git
       - git+https://github.com/fsspec/kerchunk
diff --git a/notebooks/advanced/Pangeo_Forge.ipynb b/notebooks/advanced/Pangeo_Forge.ipynb
deleted file mode 100644
index 4aafcbe7..00000000
--- a/notebooks/advanced/Pangeo_Forge.ipynb
+++ /dev/null
@@ -1,230 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Kerchunk and Pangeo-Forge\n",
-    "\n"
-   ]
-  },
-  {
-   "attachments": {},
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Overview\n",
-    "\n",
-    "In this tutorial we are going to use the open-source ETL pipeline `pangeo-forge-recipes` to generate Kerchunk references.\n",
-    "\n",
-    "Pangeo-Forge is a community project to build reproducible, cloud-native, ARCO (Analysis-Ready, Cloud-Optimized) datasets. The Python library `pangeo-forge-recipes` is the ETL pipeline used to process these datasets, or \"recipes\". While most recipes convert a legacy format such as NetCDF to Zarr stores, `pangeo-forge-recipes` can also use Kerchunk under the hood to create reference recipes.\n",
-    "\n",
-    "It is important to note that `Kerchunk` can be used independently of `pangeo-forge-recipes`; in this example, `pangeo-forge-recipes` acts as the runner for `Kerchunk`.\n",
-    "\n",
-    "## Prerequisites\n",
-    "| Concepts | Importance | Notes |\n",
-    "| --- | --- | --- |\n",
-    "| [Kerchunk Basics](../foundations/kerchunk_basics) | Required | Core |\n",
-    "| [Multiple Files and Kerchunk](../foundations/kerchunk_multi_file) | Required | Core |\n",
-    "| [Multi-File Datasets with Kerchunk](../case_studies/ARG_Weather.ipynb) | Required | IO/Visualization |\n",
-    "\n",
-    "- **Time to learn**: 45 minutes\n",
-    "---"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Getting to Know The Data\n",
-    "\n",
-    "`gridMET` is a high-resolution daily meteorological dataset covering CONUS from 1979-2023. It is produced by the Climatology Lab at UC Merced. In this example, we are going to create a virtual Zarr dataset of a derived variable, Burn Index.\n"
" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Examine a Single File" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "import xarray as xr\n", - "\n", - "ds = xr.open_dataset(\n", - " \"http://thredds.northwestknowledge.net:8080/thredds/dodsC/MET/bi/bi_2021.nc\"\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Plot the Dataset" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "ds.sel(day=\"2021-08-01\").burning_index_g.plot()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create a File Pattern\n", - "\n", - "To build our `pangeo-forge` pipeline, we need to create a `FilePattern` object, which is composed of all of our input urls. This dataset ranges from 1979 through 2023 and is composed of one year per file. \n", - " \n", - "To speed up our example, we will `prune` our recipe to select the first two entries in the `FilePattern`" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "from pangeo_forge_recipes.patterns import ConcatDim, FilePattern\n", - "\n", - "years = list(range(1979, 2022 + 1))\n", - "\n", - "\n", - "time_dim = ConcatDim(\"time\", keys=years)\n", - "\n", - "\n", - "def format_function(time):\n", - " return f\"http://www.northwestknowledge.net/metdata/data/bi_{time}.nc\"\n", - "\n", - "\n", - "pattern = FilePattern(format_function, time_dim, file_type=\"netcdf4\")\n", - "\n", - "\n", - "pattern = pattern.prune()\n", - "\n", - "pattern" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create a Location For Output\n", - "We write to local storage for this example, but the reference file could also be shared via cloud storage. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "target_root = \"references\"\n", - "store_name = \"Pangeo_Forge\"" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Build the Pangeo-Forge Beam Pipeline\n", - "\n", - "Next, we will chain together a bunch of methods to create a Pangeo-Forge - Apache Beam pipeline. \n", - "Processing steps are chained together with the pipe operator (`|`). Once the pipeline is built, it can be ran in the following cell. \n", - "\n", - "The steps are as follows:\n", - "1. Creates a starting collection of our input file patterns.\n", - "2. Passes those file_patterns to `OpenWithKerchunk`, which creates references of each file.\n", - "3. Combines the references files into a single reference file and write them with `WriteCombineReferences`\n", - "\n", - "Just like Kerchunk, you can specify the reference file type as either `.json` or `.parquet`.\n", - "\n", - "Note: You can add additional processing steps in this pipeline. 
\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "import apache_beam as beam\n", - "from pangeo_forge_recipes.transforms import OpenWithKerchunk, WriteCombinedReference\n", - "\n", - "transforms = (\n", - " # Create a beam PCollection from our input file pattern\n", - " beam.Create(pattern.items())\n", - " # Open with Kerchunk and create references for each file\n", - " | OpenWithKerchunk(file_type=pattern.file_type)\n", - " # Use Kerchunk's `MultiZarrToZarr` functionality to combine and\n", - " # then write references. Note: Setting the correct contact_dims\n", - " # and identical_dims is important.\n", - " | WriteCombinedReference(\n", - " target_root=target_root,\n", - " store_name=store_name,\n", - " output_file_name=\"reference.json\",\n", - " concat_dims=[\"day\"],\n", - " identical_dims=[\"lat\", \"lon\", \"crs\"],\n", - " )\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "%%time\n", - "\n", - "with beam.Pipeline() as p:\n", - " p | transforms" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.11" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/notebooks/using_references/Datatree.ipynb b/notebooks/using_references/Datatree.ipynb index ded31caa..67cec16b 100644 --- a/notebooks/using_references/Datatree.ipynb +++ b/notebooks/using_references/Datatree.ipynb @@ -15,8 +15,7 @@ "source": [ "## Overview\n", "\n", - "In this tutorial we are going to use a large collection of pre-generated `Kerchunk` reference files and open them with [xarray-datatree](https://xarray-datatree.readthedocs.io/en/latest/). This chapter is heavily inspired by [this blog post](https://medium.com/pangeo/easy-ipcc-part-1-multi-model-datatree-469b87cf9114).\n", - "\n", + "In this tutorial we are going to use a large collection of pre-generated `Kerchunk` reference files and open them with Xarray's new [DataTree](https://docs.xarray.dev/en/stable/generated/xarray.DataTree.html) functionality. This chapter is heavily inspired by [this blog post](https://medium.com/pangeo/easy-ipcc-part-1-multi-model-datatree-469b87cf9114).\n", "\n", "\n", "### About the Dataset\n", @@ -58,7 +57,7 @@ "import hvplot.xarray # noqa\n", "import pandas as pd\n", "import xarray as xr\n", - "from datatree import DataTree\n", + "from xarray import DataTree\n", "from distributed import Client\n", "from fsspec.implementations.reference import ReferenceFileSystem" ]