Clone the dr-podaac-swot repo from https://git.earthdata.nasa.gov/scm/pocumulus/dr-podaac-swot.git:
git clone https://git.earthdata.nasa.gov/scm/pocumulus/dr-podaac-swot.git disaster-recovery
Before you can deploy this infrastructure, the lambda function source code must be built.
./bin/build_tasks.sh "<version_number>"
will crawl the tasks directory and build a .zip file (currently by just zipping all Python files and dependencies) in each of its sub-directories. Each .zip is then referenced in the main.tf lambda definitions.
<version_number> just has to be consistent with the dr_version variable passed into the disaster_recovery module. This value has to be updated for changes to lambdas to be deployed. For example:
cd disaster-recovery
./bin/build_tasks.sh "0.1.1"
The Disaster Recovery deployment is done with the Terraform root module, main.tf.
The following instructions will walk you through installing Terraform, configuring Terraform, and deploying the root module.
If you are using a Mac and Homebrew, installing Terraform is as simple as:
brew update
brew install terraform
For other cases, visit the Terraform documentation for installation instructions.
Verify that the version of Terraform installed is at least v0.12.0.
$ terraform --version
Terraform v0.12.2
The state of the Terraform deployment is stored in S3. In the following examples, it will be assumed that state is being stored in a bucket called dr-tf-state. You can also use an existing bucket, if desired.
Create the state bucket:
aws s3api create-bucket --bucket dr-tf-state --create-bucket-configuration LocationConstraint=us-west-2
Note: The --create-bucket-configuration option is only necessary if you are creating your bucket outside of us-east-1.
In order to help prevent loss of state information, it is recommended that versioning be enabled on the state bucket:
$ aws s3api put-bucket-versioning \
--bucket dr-tf-state \
--versioning-configuration Status=Enabled
Terraform uses a lock stored in DynamoDB in order to prevent multiple simultaneous updates. In the following examples, that table will be called dr-tf-locks.
Create the locks table:
Note: The --billing-mode option was recently added to the AWS CLI. You may need to upgrade your version of the AWS CLI if you get an error about provisioned throughput when creating the table.
$ aws dynamodb create-table \
--table-name dr-tf-locks \
--attribute-definitions AttributeName=LockID,AttributeType=S \
--key-schema AttributeName=LockID,KeyType=HASH \
--billing-mode PAY_PER_REQUEST
These steps should be executed in the disaster-recovery directory.
Create a terraform.tf file, substituting the appropriate values for bucket and dynamodb_table. This tells Terraform where to store its remote state.
terraform.tf
terraform {
backend "s3" {
region = "us-west-2"
bucket = "dr-tf-state"
key = "terraform.tfstate"
dynamodb_table = "dr-tf-locks"
}
}
First, run mv terraform.tfvars.example terraform.tfvars to get a template terraform.tfvars in your working directory. This is where you will place input variables to Terraform; a sketch with placeholder values follows the two lists below.
Necessary:
ngap_subnets - NGAP Subnets (array)
vpc_id - ID of VPC to place resources in - recommended that this be a private VPC (or at least one with restricted access).
glacier_bucket - Bucket with Glacier policy
public_bucket - Bucket with public permissions (Cumulus public bucket)
private_bucket - Bucket with private permissions (Cumulus private bucket)
internal_bucket - Analogous to the Cumulus internal bucket
protected_bucket - Analogous to the Cumulus protected bucket
permissions_boundary_arn - Permission Boundary ARN (Policy) for NGAP compliance
postgres_user_pw - password for the postgres user
database_name - disaster_recovery
database_app_user - druser
database_app_user_pw - the password for the application user
Optional:
prefix - Prefix that will be prepended to resource names created by terraform. Defaults to dr.
profile - AWS CLI profile (configured via aws configure) to use. Defaults to default.
region - Your AWS region. Defaults to us-west-2.
restore_expire_days - How many days to restore a file for. Defaults to 5.
restore_request_retries - How many times to retry a restore request to Glacier. Defaults to 3.
restore_retry_sleep_secs - How many seconds to wait between retry calls to restore_object. Defaults to 3.
restore_retrieval_type - The Tier for the restore request. Valid values are 'Standard'|'Bulk'|'Expedited'. Defaults to Standard. Understand the costs associated with the tiers before modifying.
copy_retries - How many times to retry a copy request from the restore location to the archive location. Defaults to 3.
copy_retry_sleep_secs - How many seconds to wait between retry calls to copy_object. Defaults to 0.
ddl_dir - The location of the DDL directory that contains the SQL to create the application database. Defaults to 'ddl/'.
drop_database - Whether or not to drop the database if it exists (True), or keep it (False). Defaults to False.
database_port - The port for the postgres database. Defaults to '5432'.
platform - Indicates if running locally (onprem) or in AWS (AWS). Defaults to 'AWS'.
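As a rough illustration only, a filled-in terraform.tfvars for the required variables might look like the sketch below. Every value is a placeholder invented for this example (ngap_subnets is assumed here to be a list of subnet IDs, and the VPC, buckets, ARN, and passwords must come from your own environment):
terraform.tfvars
ngap_subnets = ["subnet-0123456789abcdef0"]
vpc_id = "vpc-0123456789abcdef0"
glacier_bucket = "my-dr-glacier"
public_bucket = "my-cumulus-public"
private_bucket = "my-cumulus-private"
internal_bucket = "my-cumulus-internal"
protected_bucket = "my-cumulus-protected"
permissions_boundary_arn = "arn:aws:iam::012345678912:policy/<permissions-boundary-policy>"
postgres_user_pw = "<postgres user password>"
database_name = "disaster_recovery"
database_app_user = "druser"
database_app_user_pw = "<application user password>"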
Run terraform init.
Run terraform plan (optional, but this lets you preview the deploy).
Run terraform apply.
This will deploy Disaster Recovery.
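For reference, the full sequence, run from the disaster-recovery directory, looks like this (terraform apply will prompt for confirmation before making changes):
$ terraform init
$ terraform plan
$ terraform apply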
If you want to remove it:
terraform destroy
Integrate Disaster Recovery with Cumulus to be able to recover a granule from the Cumulus Dashboard.
Copy the workflow from workflows/workflows.yml.dr into your Cumulus workflow.
Set the values of these environment variables to the ARN for the {prefix}-extract-filepaths-for-granule and {prefix}-request-files lambdas, respectively:
DR_EXTRACT_LAMBDA_ARN=arn:aws:lambda:us-west-2:012345678912:function:dr_extract_filepaths_for_granule
DR_REQUEST_LAMBDA_ARN=arn:aws:lambda:us-west-2:012345678912:function:dr_request_files
Add an aws provider to main.tf:
provider "aws" {
version = "~> 2.13"
region = var.region
profile = var.profile
}
We will be adding a disaster-recovery module to cumulus-tf/main.tf. First, since there isn't a distributed version of the disaster-recovery module at the time of writing this documentation, you'll have to clone this repository locally: https://github.com/podaac/cumulus-disaster-recovery.git.
In the disaster-recovery repo, build the lambda tasks with ./bin/build_tasks.sh "<version_number>", as sketched below.
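A minimal sketch of those steps, assuming you clone into the default cumulus-disaster-recovery directory and pick a version number that matches the dr_version you will set in the module block below:
$ git clone https://github.com/podaac/cumulus-disaster-recovery.git
$ cd cumulus-disaster-recovery
$ ./bin/build_tasks.sh "0.1.0"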
Once that is done, navigate to cumulus-tf/main.tf
within your Cumulus deployment directory and add the following module:
module "disaster-recovery" {
source = "<location of dr module locally>"
prefix = var.prefix
vpc_id = var.vpc_id
dr_version = "0.1.0"
ngap_subnets = var.ngap_db_subnets
public_bucket = var.buckets["public"]["name"]
glacier_bucket = var.buckets["glacier"]["name"]
private_bucket = var.buckets["private"]["name"]
internal_bucket = var.buckets["internal"]["name"]
protected_bucket = var.buckets["protected"]["name"]
permissions_boundary_arn = var.permissions_boundary_arn
postgres_user_pw = var.postgres_user_pw
database_name = var.dr_database_name
database_app_user = var.dr_database_app_user
database_app_user_pw = var.dr_database_app_user_pw
}
Note: The above snippet assumes that you've configured your Cumulus deployment. More information on that process can be found in their documentation.
To support this module, you'll have to add the following values to your cumulus-tf/variables.tf
file:
# Variables specific to DR
variable "ngap_db_subnets" {
type = list(string)
}
variable "postgres_user_pw" {
type = string
}
variable "dr_database_name" {
type = string
}
variable "dr_database_app_user" {
type = string
}
variable "dr_database_app_user_pw" {
type = string
}
The values corresponding to these variables must be set in your cumulus-tf/terraform.tfvars
file.
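For illustration only, those entries in cumulus-tf/terraform.tfvars might look like the sketch below; the subnet ID and passwords are placeholders, and the database name and application user follow the defaults listed earlier (disaster_recovery and druser):
# DR-specific values (placeholders - substitute your own)
ngap_db_subnets = ["subnet-0123456789abcdef0"]
postgres_user_pw = "<postgres user password>"
dr_database_name = "disaster_recovery"
dr_database_app_user = "druser"
dr_database_app_user_pw = "<application user password>"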
Copy the workflow from workflows/dr_recovery_workflow.tf into your cumulus-tf/ directory. In the workflow file you may need to change the disaster-recovery portion of the following two strings to the name of your disaster-recovery module (defined in cumulus-tf/main.tf):
module.disaster-recovery.extract_filepaths_lambda_arn
module.disaster-recovery.request_files_lambda_arn
To configure a collection to enable Disaster Recovery, add the line
"granuleRecoveryWorkflow": "DrRecoveryWorkflow"
to the collection configuration:
{
"queriedAt": "2019-11-07T22:49:46.842Z",
"name": "L0A_HR_RAW",
"version": "1",
"sampleFileName": "L0A_HR_RAW_product_0001-of-0420.h5",
"dataType": "L0A_HR_RAW",
"granuleIdExtraction": "^(.*)((\\.cmr\\.json)|(\\.iso\\.xml)|(\\.tar\\.gz)|(\\.h5)|(\\.h5\\.mp))$",
"reportToEms": true,
"createdAt": 1561749178492,
"granuleId": "^.*$",
"provider_path": "L0A_HR_RAW/",
"meta": {
"response-endpoint": "arn:aws:sns:us-west-2:012345678912:providerResponseSNS",
"glacier-bucket": "podaac-sndbx-cumulus-glacier",
"granuleRecoveryWorkflow": DrRecoveryWorkflow"
},
"files": [
{
To enable the Recover Granule button on the Cumulus Dashboard (available at github.com/nasa/cumulus-dashboard), set the environment variable ENABLE_RECOVERY=true.
Here is an example command to run the Cumulus Dashboard locally:
APIROOT=https://uttm5y1jcj.execute-api.us-west-2.amazonaws.com:8000/dev ENABLE_RECOVERY=true npm run serve