Commit

Merge branch 'release/2024.1.0' into main

mmcfarland committed Jan 18, 2024
2 parents 5393f84 + 1b81783 commit f1cb764
Showing 9 changed files with 58 additions and 65 deletions.
2 changes: 1 addition & 1 deletion api/requirements.txt
@@ -3,4 +3,4 @@
# Manually managing azure-functions-worker may cause unexpected issues

azure-functions==1.11.2
requests==2.28.1
requests==2.31.0
66 changes: 10 additions & 56 deletions docs/concepts/computing.md
@@ -2,11 +2,13 @@

The core components of the Planetary Computer are the datasets and APIs for querying them. This document provides an overview of the various ways you can compute on data hosted by the Planetary Computer.

Regardless of how you compute on the data, to ensure maximum efficiency you should locate your compute as close to the data as possible. The Planetary Computer Data Catalog is hosted in Azure's **West Europe** region, so your compute should be there too.
Regardless of how you compute on the data, to ensure maximum efficiency you should locate your compute as close to the data as possible. Most of the Planetary Computer Data Catalog is hosted in Azure's **West Europe** region, so your compute should be there too.

## Use our JupyterHub

The [Planetary Computer Hub](https://planetarycomputer.microsoft.com/compute) is a [JupyterHub](https://jupyterhub.readthedocs.io/en/stable/) deployment in the West Europe Azure region. This is the easiest way to get started with computing on the Planetary Computer.
The [Planetary Computer Hub](https://planetarycomputer.microsoft.com/compute) is a [JupyterHub](https://jupyterhub.readthedocs.io/en/stable/) deployment in the West Europe Azure region. This is the easiest way to get started with computing on the Planetary Computer.
That said, the Planetary Computer Hub is focused mainly on convenience. We recommend it for prototypes and exploration; production workloads should run on your own compute, using one of the options detailed below.


```{note} You'll need to [request access](https://planetarycomputer.microsoft.com/account/request) to use the Planetary Computer Hub.
```
@@ -35,62 +37,14 @@ See [Scaling with Dask](../quickstarts/scale-with-dask.md) for an introduction t

See [Using VS Code](../overview/ui-vscode) for how to use Visual Studio Code as a user interface for the Planetary Computer's Compute.

## Use GitHub Codespaces

See [Use GitHub Codespaces](../overview/ui-codespaces) for how to use [GitHub Codespaces][codespaces] as a user interface and execution environment for data from the Planetary Computer catalog.

## Use our Dask Gateway

In this setup, you only use the Planetary Computer's scalable compute. You don't log into JupyterHub. Instead, your local machine drives the computation.
We recommend this approach for users who value, and are comfortable with, managing a local development environment. This setup requires a bit more care on your part: you need to ensure that the versions of libraries in your local environment are compatible with the versions running in Azure. While not required, we recommend using the container images published at [Microsoft/planetary-computer-containers](https://github.com/microsoft/planetary-computer-containers). The example here will create a local JupyterLab session using the `mcr.microsoft.com/planetary-computer/python` image.

### Request a token from JupyterHub

Visit <https://planetarycomputer.microsoft.com/compute/hub/token> to generate a token. You'll be required to authenticate to generate a token.

![JupyterHub Admin page to generate a token.](images/hub-token.png)

Substitute that token anywhere you see `<JUPYTERHUB_API_TOKEN>` below.

### Connect to the Gateway

Similar to before, we'll use `dask_gateway` to connect. Only now we need to provide the URLs explicitly. You can specify them in code, or as environment variables.

This next snippet starts up JupyterLab at `localhost:8888` using the `mcr.microsoft.com/planetary-computer/python` container. It also mounts the current
working directory as a volume (into the container's home directory, assumed here to be `/home/jovyan`), so you can access your local files.

```console
$ export JUPYTERHUB_API_TOKEN=<JUPYTERHUB_API_TOKEN>  # the token generated above
$ docker run -it --rm \
-p 8888:8888 \
-v "$PWD:/home/jovyan" \
-e JUPYTERHUB_API_TOKEN=$JUPYTERHUB_API_TOKEN \
-e DASK_GATEWAY__AUTH__TYPE="jupyterhub" \
-e DASK_GATEWAY__CLUSTER__OPTIONS__IMAGE="mcr.microsoft.com/planetary-computer/python:latest" \
-e DASK_GATEWAY__ADDRESS="https://pccompute.westeurope.cloudapp.azure.com/compute/services/dask-gateway" \
-e DASK_GATEWAY__PROXY_ADDRESS="gateway://pccompute-dask.westeurope.cloudapp.azure.com:80" \
mcr.microsoft.com/planetary-computer/python:latest \
jupyter lab --no-browser --ip="0.0.0.0"
```

That will print out a URL you can follow to access your local JupyterLab. From there, you can connect to the Gateway:

```python
>>> import dask_gateway
>>> gateway = dask_gateway.Gateway()
>>> cluster = gateway.new_cluster()
>>> client = cluster.get_client()
```
## Use your own compute

From here on, computations using Dask will take place on the cluster. When you `.compute()` a result and bring it back locally,
it will come to the Python process running on your local machine. Ideally, the results returned locally are small enough that the
lower bandwidth between Azure and your local machine isn't a bottleneck.
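
As a hypothetical illustration, a large array reduction runs entirely on the cluster's workers, and only the final scalar travels back over the network:

```python
import dask.array as da

# ~800 MB of random data, split into chunks across the cluster's workers
x = da.random.random((10_000, 10_000), chunks=(1_000, 1_000))

# The reduction happens on the cluster; only a single float returns locally
result = x.mean().compute()
print(result)
```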
The previous methods relied on compute provided by the Planetary Computer, which is a great way to get started with the Planetary Computer's APIs and data.
For production workloads, we recommend deploying your own compute, which gives you more control over the hardware and software environment.

![Diagram showing Compute on Azure without JupyterHub](images/gateway-diagram.png)
### Using GitHub Codespaces

## Use your own compute

The previous two methods relied on compute provided by the Planetary Computer. If you have your own Azure resources, you can use those to access the Planetary Computer's datasets.
That said, make sure your resources are in the **West Europe** Azure region: putting your compute in the same region as the data is the most efficient way to do your computation.
See [Use GitHub Codespaces](../overview/ui-codespaces) for how to use [GitHub Codespaces][codespaces] as a user interface and execution environment for data from the Planetary Computer catalog.

### Using Azure Machine Learning

@@ -139,4 +93,4 @@ Like the previous setup, the Dask scheduler and workers are running in Azure nea

![Diagram showing compute with self-managed Dask cluster](images/cloudprovider-diagram.png)

[codespaces]: https://github.com/features/codespaces
[codespaces]: https://github.com/features/codespaces
6 changes: 4 additions & 2 deletions docs/concepts/hub-deployment.md
Expand Up @@ -215,11 +215,13 @@ $ az group create --name pangeo --location westeurope

**Create an app registration**

To authenticate users, we'll create an Azure AD app registration in the Azure Portal following [these instructions](https://docs.microsoft.com/en-us/azure/active-directory/develop/quickstart-register-app).
In this example, the *sign-in audience* will be **accounts in this organizational directory only**. This is appropriate when you are administering a Hub for other users within your Azure AD tenant. By default, all users with a directory will be able to log into your Hub. You can manage access using [Azure Active Directory groups](https://docs.microsoft.com/en-us/azure/active-directory/fundamentals/active-directory-manage-groups).
To authenticate users, we'll create an app registration for the Microsoft Identity Platform in the Azure Portal following [these instructions](https://docs.microsoft.com/en-us/azure/active-directory/develop/quickstart-register-app).
In this example, the *sign-in audience* will be **accounts in this organizational directory only**. This is appropriate when you are administering a Hub for other users within your tenant. By default, all users in the directory will be able to log into your Hub. You can manage access using [Azure Active Directory groups](https://docs.microsoft.com/en-us/azure/active-directory/fundamentals/active-directory-manage-groups).

When creating a new app registration, you'll be asked for a redirect URI. This URI should match where your users will access the Hub. If your organization already has a DNS provider, use that. Alternatively, you can have Azure handle the DNS for your Hub service automatically, which is what we'll use in this guide. We're calling our cluster ``pangeo-hub`` and deploying it in West Europe, so the callback URL is ``https://pangeo-hub.westeurope.cloudapp.azure.com/hub/oauth_callback``. In general the pattern is ``https://<hub-name>.<azure-region>.cloudapp.azure.com/hub/oauth_callback``.

If you need to further customize the platform settings, do so under the "Web" platform. The JupyterHub server will be the web server in this context.

Finally, create a client secret to pass to JupyterHub: Under the *Manage* section, select *Certificates and Secrets* then *New client secret*. We'll use the ``Value`` later on.
You will also need the app registration's ``Client ID`` and ``Tenant ID``, which are available on the app registration's main page, under *Essentials*.
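
As a sketch of where these values end up, assuming JupyterHub is configured through the `oauthenticator` package's Azure AD authenticator (adapt to Helm values if you deploy with zero-to-jupyterhub):

```python
# jupyterhub_config.py: a minimal sketch; substitute the values collected above
c.JupyterHub.authenticator_class = "azuread"
c.AzureAdOAuthenticator.client_id = "<Client ID>"
c.AzureAdOAuthenticator.client_secret = "<client secret Value>"
c.AzureAdOAuthenticator.tenant_id = "<Tenant ID>"
c.AzureAdOAuthenticator.oauth_callback_url = (
    "https://pangeo-hub.westeurope.cloudapp.azure.com/hub/oauth_callback"
)
```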

2 changes: 1 addition & 1 deletion docs/overview/changelog.md
@@ -26,7 +26,7 @@ The <a href="dataset/group/noaa-cdr">NOAA Climate Data Records</a> (CDRs) datase

### Sentinel 3

The <a href="">Sentinel-3 Collections</a> offers comprehensive datasets delivering insights into land, ocean, and atmospheric changes over time. These datasets support various applications, such as environmental monitoring, disaster management, climate change, and policy development. The collection is generated using modern data analysis methods applied to historical and ongoing satellite data, enabling the identification of climate trends.
The <a href="dataset/group/sentinel-3">Sentinel-3 Collections</a> offers comprehensive datasets delivering insights into land, ocean, and atmospheric changes over time. These datasets support various applications, such as environmental monitoring, disaster management, climate change, and policy development. The collection is generated using modern data analysis methods applied to historical and ongoing satellite data, enabling the identification of climate trends.

```{image} images/changelog-dataset-sentinel-3.png
:height: 500
2 changes: 1 addition & 1 deletion etl/requirements.txt
@@ -11,4 +11,4 @@ nbformat==5.1.3
numpydoc==1.1.0
pyyaml==5.4.1
requests
sphinx==3.5.4
sphinx==5.*
2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
{
"name": "pc-datacatalog",
"version": "2023.2.0",
"version": "2024.1.0",
"private": true,
"proxy": "http://api:7071/",
"dependencies": {
27 changes: 25 additions & 2 deletions public/index.html
@@ -33,7 +33,7 @@
<div id="cookie-banner"></div>
<div id="root"></div>

<script src="https://js.monitor.azure.com/scripts/c/ms.analytics-web-3.min.js"></script>
<script src="https://js.monitor.azure.com/scripts/c/ms.analytics-web-4.min.js"></script>
<script src="https://consentdeliveryfd.azurefd.net/mscc/lib/v2/wcp-consent.js"></script>
<script type="text/javascript">
/*
@@ -49,6 +49,22 @@
Cookie descriptions: https://osgwiki.com/wiki/JSLLv4#Cookies_Set.2FRead_by_JSLL
*/

function checkThirdPartyAdsOptOutCookie() {
try {
const ThirdPartyAdsOptOutCookieName = '3PAdsOptOut';
var cookieValue = getCookie(ThirdPartyAdsOptOutCookieName);

// For unauthenticated users: opt in to data sharing unless the opt-out cookie is set to 1
return cookieValue != 1;
} catch {
return true;
}
}

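// Look up a cookie by name in document.cookie; returns its value, or '' when not set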
function getCookie(cookieName) {
var cookieValue = document.cookie.match('(^|;)\\s*' + cookieName + '\\s*=\\s*([^;]+)');
return (cookieValue) ? cookieValue[2] : '';
}
// WCP initialization
WcpConsent.init("en-US", "cookie-banner", function (err, siteConsent) {
if (err != undefined) {
@@ -60,11 +76,18 @@
}
});

// Presence of the GPC signal indicates an opt-out of data sharing; mark the 1DS Analytics config as such
var globalPrivacyControlEnabled = navigator.globalPrivacyControl;
// Set data sharing opt-in to false when GPC or AMC controls detected
var GPC_DataSharingOptIn = (globalPrivacyControlEnabled) ? false : checkThirdPartyAdsOptOutCookie();

// 1DS initialization
const analytics = new oneDS.ApplicationInsights();
var config = {
instrumentationKey: "%REACT_APP_ONEDS_TENANT_KEY%",
propertyConfiguration: {
// From https://msasg.visualstudio.com/Shared%20Data/_git/1DS.JavaScript?path=%2Fextensions%2Fproperties%2FEUCC.md&_a=preview&anchor=npm-setup-analytics-web
gpcDataSharingOptIn: GPC_DataSharingOptIn,
callback: {
userConsentDetails: window.siteConsent
? window.siteConsent.getConsent
@@ -91,7 +114,7 @@
};

//Initialize SDK
if ("%NODE_ENV%" !== "development") {
if ("%NODE_ENV%" !== "development") {
analytics.initialize(config, []);
}
</script>
14 changes: 14 additions & 0 deletions src/components/Footer.js
@@ -123,6 +123,20 @@ const Footer = ({ onGrid = true }) => {
{manageConsent}
</li>
)}
<li>
<a
href="https://aka.ms/yourcaliforniaprivacychoices"
style={{ position: "relative", top: 4 }}
>
<div style={{ display: "flex", alignItems: "center" }}>
<img
src="data:image/svg+xml,%3Csvg%20role%3D%22img%22%20xmlns%3D%22http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%22%20viewBox%3D%220%200%2030%2014%22%20height%3D%2216%22%20width%3D%2243%22%3E%3Ctitle%3ECalifornia%20Consumer%20Privacy%20Act%20(CCPA)%20Opt-Out%20Icon%3C%2Ftitle%3E%3Cpath%20d%3D%22M7.4%2012.8h6.8l3.1-11.6H7.4C4.2%201.2%201.6%203.8%201.6%207s2.6%205.8%205.8%205.8z%22%20style%3D%22fill-rule%3Aevenodd%3Bclip-rule%3Aevenodd%3Bfill%3A%23fff%22%3E%3C%2Fpath%3E%3Cpath%20d%3D%22M22.6%200H7.4c-3.9%200-7%203.1-7%207s3.1%207%207%207h15.2c3.9%200%207-3.1%207-7s-3.2-7-7-7zm-21%207c0-3.2%202.6-5.8%205.8-5.8h9.9l-3.1%2011.6H7.4c-3.2%200-5.8-2.6-5.8-5.8z%22%20style%3D%22fill-rule%3Aevenodd%3Bclip-rule%3Aevenodd%3Bfill%3A%2306f%22%3E%3C%2Fpath%3E%3Cpath%20d%3D%22M24.6%204c.2.2.2.6%200%20.8L22.5%207l2.2%202.2c.2.2.2.6%200%20.8-.2.2-.6.2-.8%200l-2.2-2.2-2.2%202.2c-.2.2-.6.2-.8%200-.2-.2-.2-.6%200-.8L20.8%207l-2.2-2.2c-.2-.2-.2-.6%200-.8.2-.2.6-.2.8%200l2.2%202.2L23.8%204c.2-.2.6-.2.8%200z%22%20style%3D%22fill%3A%23fff%22%3E%3C%2Fpath%3E%3Cpath%20d%3D%22M12.7%204.1c.2.2.3.6.1.8L8.6%209.8c-.1.1-.2.2-.3.2-.2.1-.5.1-.7-.1L5.4%207.7c-.2-.2-.2-.6%200-.8.2-.2.6-.2.8%200L8%208.6l3.8-4.5c.2-.2.6-.2.9%200z%22%20style%3D%22fill%3A%2306f%22%3E%3C%2Fpath%3E%3C%2Fsvg%3E"
alt="privacy icon"
/>
<span>Your Privacy Choices</span>
</div>
</a>
</li>
<li className="x-hidden-focus">
{" "}
© Microsoft {new Date().getFullYear()}
2 changes: 1 addition & 1 deletion src/config/datasetGroups.yml
@@ -346,7 +346,7 @@ cil-gdpcir:
GDPCIR data can be accessed on the Microsoft Planetary Computer. The dataset is
made of of three collections, distinguished by data license. More information
made up of two collections, distinguished by data license. More information
about the dataset is available in these collections, including access
instructions and examples, data formats, available models, methods, and
citation/licensing requirements, on each dataset's homepage linked below.
