Merge branch 'master' into eurostat_refresh
HarishC727 authored Feb 26, 2024
2 parents 93973ba + b249c31 commit 544773a
Showing 126 changed files with 2,164 additions and 23,778 deletions.
9 changes: 1 addition & 8 deletions import-automation/README.md
@@ -3,9 +3,6 @@
The import automation system has three components:
1. [Cloud Build configuration file](cloudbuild/README.md)
2. [Executor](executor/README.md)
3. Import Progress Dashboard
- [Import Progress Dashboard API](import-progress-dashboard-api/README.md)
- [Import Progress Dashboard Frontend](import-progress-dashboard-frontend/README.md)

## User Manual

@@ -120,7 +117,6 @@ directories:
specify the import targets in the commit message
(see [Specifying Import Targets](#specifying-import-targets)). If no tag
is found, no imports will be executed.
5. Check the [Import Progress Dashboard](https://dashboard-frontend-dot-datcom-data.uc.r.appspot.com/)
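The step above keys off a tag in the commit message. As a rough sketch of how such a tag can be pulled out (the `IMPORTS=` tag name here is an assumption for illustration; the real format is defined in the Specifying Import Targets section of the executor README):

```shell
# Sketch: pull an import-targets tag out of a commit message.
# The "IMPORTS=" tag name is an assumption for illustration; the
# real format is defined in "Specifying Import Targets".
msg='Merge branch "master" into eurostat_refresh

IMPORTS=scripts/us_usda/quickstats:UsdaAgSurvey'
targets=$(printf '%s\n' "$msg" | sed -n 's/^IMPORTS=//p')
# If no tag is found, no imports are executed.
echo "${targets:-no imports will be executed}"
```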

### Scheduling Updates

@@ -134,8 +130,6 @@ directories:

1. Check in the `import-automation` directory to the repository.
2. [Configure](executor/README.md#configuring-the-executor) and [deploy](executor/README.md#deploying-on-app-engine) the executor
3. [Deploy the import progress dashboard API](import-progress-dashboard-api/README.md#deploying-to-app-engine)
4. [Deploy the import progress dashboard frontend](import-progress-dashboard-frontend/README.md#deploying-to-app-engine)
5. [Create a Cloud Tasks queue](#creating-cloud-task-queue)
6. [Connect the repository to Cloud Build and set up Cloud Build triggers](#setting-up-cloud-build)

@@ -162,7 +156,6 @@ directories:
- **File type**: `Cloud Build configuration file (yaml or json)`
- **Cloud Build configuration file location**: `/import-automation/cloudbuild/cloudbuild.yaml`
- **Substitution variables**
- **_DASHBOARD_OAUTH_CLIENT_ID**: `<OAuth client ID used to authenticate with the import progress dashboard>` (This can be found by going to the Identity-Aware Proxy of the Google Cloud project that hosts the dashboard and clicking 'Edit OAuth Client'.)
- **_EMAIL_ACCOUNT**: `<email account used for sending notifications>`
- **_EMAIL_TOKEN**: `<password, app password, or access token of the email account>`
- **_GITHUB_AUTH_USERNAME**: `<GitHub username to authenticate with GitHub API>`
@@ -171,7 +164,7 @@ directories:
- **_GITHUB_REPO_OWNER_USERNAME**: `<username of the owner of the repository, e.g., datacommonsorg>`
- **_HANDLER_SERVICE**: `<service the executor is deployed to, e.g., default>`
- **_HANDLER_URI**: `<URI of the executor's endpoint that imports to dev, e.g., />`
- **_IMPORTER_OAUTH_CLIENT_ID**: `<OAuth client ID used to authenticate with the proxy for the importer>` (This can be found similarly as **_DASHBOARD_OAUTH_CLIENT_ID**)
- **_IMPORTER_OAUTH_CLIENT_ID**: `<OAuth client ID used to authenticate with the proxy for the importer>`
- **_TASK_LOCATION_ID**: `<location ID of the Cloud Tasks queue, e.g., us-central1>` (This can be found by going to the Cloud Tasks control panel and looking at the "Location" column.)
- **_TASK_PROJECT_ID**: `<ID of the Google Cloud project that hosts the task queue, e.g., google.com:datcom-data>`
- **_TASK_QUEUE_NAME**: `<Name of the task queue>`
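The underscore prefix on these names is not cosmetic: Cloud Build requires user-defined substitution variables to begin with an underscore. A quick sanity check over the names used above (a sketch, covering only the variables visible in this diff):

```shell
# Cloud Build user-defined substitutions must start with "_".
# Quick check over the trigger's substitution variable names.
for var in _EMAIL_ACCOUNT _EMAIL_TOKEN _GITHUB_AUTH_USERNAME \
           _GITHUB_REPO_OWNER_USERNAME _HANDLER_SERVICE _HANDLER_URI \
           _IMPORTER_OAUTH_CLIENT_ID _TASK_LOCATION_ID \
           _TASK_PROJECT_ID _TASK_QUEUE_NAME; do
  case "$var" in
    _*) echo "ok: $var" ;;
    *)  echo "invalid (must start with _): $var"; exit 1 ;;
  esac
done
```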
1 change: 0 additions & 1 deletion import-automation/cloudbuild/cloudbuild.gke.yaml
@@ -24,7 +24,6 @@ steps:
"PR_NUMBER": "$_PR_NUMBER",
"configs": {
"gcp_project_id": "$_GCP_PROJECT_ID",
"dashboard_oauth_client_id": "$_DASHBOARD_OAUTH_CLIENT_ID",
"importer_oauth_client_id": "$_IMPORTER_OAUTH_CLIENT_ID",
"github_auth_username": "$_GITHUB_AUTH_USERNAME",
"github_auth_access_token": "$_GITHUB_AUTH_ACCESS_TOKEN",
1 change: 0 additions & 1 deletion import-automation/cloudbuild/cloudbuild.yaml
@@ -20,7 +20,6 @@ steps:
"BASE_BRANCH": "$_BASE_BRANCH",
"PR_NUMBER": "$_PR_NUMBER",
"configs": {
"dashboard_oauth_client_id": "$_DASHBOARD_OAUTH_CLIENT_ID",
"importer_oauth_client_id": "$_IMPORTER_OAUTH_CLIENT_ID",
"github_auth_username": "$_GITHUB_AUTH_USERNAME",
"github_auth_access_token": "$_GITHUB_AUTH_ACCESS_TOKEN",
2 changes: 1 addition & 1 deletion import-automation/executor/Dockerfile
@@ -27,4 +27,4 @@ RUN pip install -r /workspace/requirements.txt

COPY app/. /workspace/app/

CMD gunicorn --timeout 1800 --workers 5 -b :$PORT app.main:FLASK_APP
CMD gunicorn --timeout 0 --workers 5 -b :$PORT app.main:FLASK_APP
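The `--timeout` change is worth a note: gunicorn kills a sync worker that stays silent longer than `--timeout` seconds, and `0` disables the check entirely, so long-running import jobs are no longer cut off at the old 1800-second (30-minute) cap. A minimal sketch of the resulting command line (`PORT` is supplied by the runtime; `8080` is a placeholder):

```shell
# Sketch of the container CMD with $PORT expanded.
# --timeout 0 disables gunicorn's worker timeout so long-running
# import jobs are not killed mid-run (1800 s = 30 min was the old cap).
PORT="${PORT:-8080}"
CMD="gunicorn --timeout 0 --workers 5 -b :$PORT app.main:FLASK_APP"
echo "$CMD"
```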
19 changes: 6 additions & 13 deletions import-automation/executor/README.md
@@ -12,14 +12,9 @@ Commons knowledge graph using the importer.
- Required Arguments
- `COMMIT_SHA`: Commit SHA of the commit that specifies the targets
in its commit message
- Optional Arguments (Used only for logging to the Import Progress Dashboard)
- `REPO_NAME`: Name of the repository from which the pull request is sent
- `BRANCH_NAME`: Name of the branch from which the pull request is sent
- `PR_NUMBER`: Number of the pull request
- Required Configurations (See [app/configs.py](app/configs.py) for
descriptions and [Configuring the Executor](#configuring-the-executor) for
how to pass these configurations)
- `dashboard_oauth_client_id`
- `importer_oauth_client_id`
- `github_auth_username`
- `github_auth_access_token`
@@ -30,7 +25,6 @@ Commons knowledge graph using the importer.
- Required Arguments
- `absolute_import_name`: Absolute import name of the import to update
- Required Configurations
- `dashboard_oauth_client_id`
- `github_auth_username`
- `github_auth_access_token`
- `email_account`
@@ -41,7 +35,6 @@ Commons knowledge graph using the importer.
- `COMMIT_SHA`: Commit SHA of the commit that specifies the targets
in its commit message
- Required Configurations
- `dashboard_oauth_client_id`
- `github_auth_username`
- `github_auth_access_token`

@@ -81,16 +74,16 @@ Run `./schedule_update_import.sh --help` for usage.
To schedule an import to run as a cron job on the GCP Cloud Scheduler, do the following:

```
Run `./schedule_update_import.sh -s <config_project_id> <path_to_import>`
Run `./schedule_update_import.sh -s <gke_project_id> <path_to_import>`
```

`<config_project_id>` is the GCP project ID where the config file is stored, e.g. `datcom-import-automation`.
`<gke_project_id>` is the GCP project ID where the import executor is run from, e.g. `datcom-import-automation-prod`.
`<path_to_import>` is the path to the import (relative to the root directory of the `data` repo), with the name of the import provided with a colon, e.g. `scripts/us_usda/quickstats:UsdaAgSurvey`.

Example invocation:

```
Run `./schedule_update_import.sh -s datcom-import-automation scripts/us_usda/quickstats:UsdaAgSurvey`
Run `./schedule_update_import.sh -s datcom-import-automation-prod scripts/us_usda/quickstats:UsdaAgSurvey`
```

The script will log the name of the Cloud Scheduler job and a URL for all the jobs on the scheduler. Please verify that all the job metadata was updated as expected.
@@ -106,16 +99,16 @@ Once the script runs to completion, the data directory's latest update is printed
To execute an update locally, do the following:

```
Run `./schedule_update_import.sh -u <config_project_id> <path_to_import>`
Run `./schedule_update_import.sh -u <gke_project_id> <path_to_import>`
```

`<config_project_id>` is the GCP project ID where the config file is stored, e.g. `datcom-import-automation`.
`<gke_project_id>` is the GCP project ID where the import executor is run from, e.g. `datcom-import-automation-prod`.
`<path_to_import>` is the path to the import (relative to the root directory of the `data` repo), with the name of the import provided with a colon, e.g. `scripts/us_usda/quickstats:UsdaAgSurvey`.

Example invocation:

```
Run `./schedule_update_import.sh -u datcom-import-automation scripts/us_usda/quickstats:UsdaAgSurvey`
Run `./schedule_update_import.sh -u datcom-import-automation-prod scripts/us_usda/quickstats:UsdaAgSurvey`
```


2 changes: 1 addition & 1 deletion import-automation/executor/app.yaml
@@ -1,5 +1,5 @@
runtime: python37
entrypoint: gunicorn --timeout 1800 -b :$PORT app.main:FLASK_APP
entrypoint: gunicorn --timeout 0 -b :$PORT app.main:FLASK_APP
env_variables:
EXECUTOR_PRODUCTION: "True"
TMPDIR: "/tmp"
1 change: 1 addition & 0 deletions import-automation/executor/app/configs.py
@@ -73,6 +73,7 @@ class ExecutorConfig:
# Types of inputs accepted by the Data Commons importer. These are
# also the accepted fields of an import_inputs value in the manifest.
import_input_types: List[str] = ('template_mcf', 'cleaned_csv', 'node_mcf')
# DEPRECATED
# OAuth client ID used to authenticate with the import progress dashboard,
# which is protected by Identity-Aware Proxy. This can be found by going
# to the Identity-Aware Proxy of the Google Cloud project that hosts

0 comments on commit 544773a
