Releases: nasa/cumulus-orca
v6.0.1
v6.0.0
Release v6.0.0
Changed
- ORCA-290 Renamed `excludeFileTypes`, `orcaDefaultBucketOverride`, `orcaDefaultRecoveryTypeOverride`, and `orcaDefaultStorageClassOverride` to `excludedFileExtensions`, `defaultBucketOverride`, `defaultRecoveryTypeOverride`, and `defaultStorageClassOverride` respectively. In addition, ORCA configuration variables `excludedFileExtensions`, `defaultBucketOverride`, `defaultRecoveryTypeOverride`, and `defaultStorageClassOverride` are now under `collection.meta.orca`.
- ORCA-290 Adjusted workflows/step functions for `OrcaRecoveryWorkflow`. `excludeFileTypes`, `orcaDefaultBucketOverride` and `orcaDefaultStorageClassOverride` arguments in `task_config` are now `excludedFileExtensions`, `defaultBucketOverride` and `defaultStorageClassOverride` respectively. `excludedFileExtensions`, `defaultBucketOverride` and `defaultStorageClassOverride` keys are now under `collection.meta.orca`. See the example below under Migration Notes.
- ORCA-519 Enforced schema checks in `request_status_for_granule` and `request_status_for_job`. Both lambdas will return proper HTTP error codes for bad inputs or internal server errors. Additionally, corrected an error in the API Reference where the `error` status for these lambdas was incorrectly listed as `failed`.
- ORCA-437 Requests to API Gateway now use IAM permissions, restricting anonymous access.
- ORCA-496 Mitigated SQS security issue. All SQS queues now use default encryption.
- Updated from Python 3.7 to 3.9
Migration Notes
- Adjust usage of `copy_to_glacier` in your step functions for new keys. `excludeFileTypes`, `orcaDefaultBucketOverride`, and `orcaDefaultStorageClassOverride` arguments are now `excludedFileExtensions`, `defaultBucketOverride`, and `defaultStorageClassOverride` and are under a new key `orca`. See example below:

      "task_config": {
        "excludedFileExtensions": "{$.meta.collection.meta.orca.excludedFileExtensions}",
        "defaultBucketOverride": "{$.meta.collection.meta.orca.defaultBucketOverride}",
        "defaultStorageClassOverride": "{$.meta.collection.meta.orca.defaultStorageClassOverride}"
      }

- Adjust Cumulus collection configuration integration for new `orca` key paths. `excludeFileTypes`, `orcaDefaultBucketOverride` and `orcaDefaultStorageClassOverride` arguments are now `excludedFileExtensions`, `defaultBucketOverride` and `defaultStorageClassOverride` respectively. `excludedFileExtensions`, `defaultBucketOverride` and `defaultStorageClassOverride` keys are now under a new key `orca`. See example below:

      "collection": {
        "meta": {
          "orca": {
            "defaultStorageClassOverride": "DEEP_ARCHIVE",
            "excludedFileExtensions": [".xml"],
            "defaultBucketOverride": "orca-bucket"
          }
        }
      }

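The key moves described in the v6.0.0 notes can be applied to an existing collection configuration mechanically. The following is a minimal sketch, assuming the collection has been loaded as a plain Python dictionary. `migrate_collection_meta` and `RENAMED_KEYS` are hypothetical names used only for illustration and are not part of ORCA.

```python
from copy import deepcopy

# Old keys directly under collection.meta, mapped to their new names under
# collection.meta.orca (taken from the v6.0.0 rename list).
RENAMED_KEYS = {
    "excludeFileTypes": "excludedFileExtensions",
    "orcaDefaultBucketOverride": "defaultBucketOverride",
    "orcaDefaultRecoveryTypeOverride": "defaultRecoveryTypeOverride",
    "orcaDefaultStorageClassOverride": "defaultStorageClassOverride",
}


def migrate_collection_meta(collection: dict) -> dict:
    """Return a copy of `collection` with pre-v6.0.0 ORCA keys moved under meta.orca.

    Hypothetical helper: only the four renamed keys are touched.
    """
    migrated = deepcopy(collection)
    meta = migrated.get("meta")
    if not isinstance(meta, dict):
        # Nothing to migrate when the collection has no meta block.
        return migrated
    for old_key, new_key in RENAMED_KEYS.items():
        if old_key in meta:
            # Create the "orca" block only when there is something to move.
            meta.setdefault("orca", {})[new_key] = meta.pop(old_key)
    return migrated


if __name__ == "__main__":
    old_collection = {
        "name": "example-collection",  # assumed sample data
        "meta": {
            "excludeFileTypes": [".xml"],
            "orcaDefaultBucketOverride": "orca-bucket",
        },
    }
    print(migrate_collection_meta(old_collection))
```

The helper returns a new dictionary rather than editing in place, so the original configuration can be kept for comparison before the updated one is uploaded.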
v5.1.0
Release v5.1.0
Changed
- ORCA-478 Updated bucket policy documentation for the deep glacier bucket in the DR account so that users can now only upload objects with a storage class of either `GLACIER` or `DEEP_ARCHIVE`.
- ORCA-457 `RequestFiles` will now raise a descriptive error when a user attempts to recover `DEEP_ARCHIVE` files with the `Expedited` recovery method. For more details on `storageClass` see the Orca `storageClass` documentation.
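The ORCA-457 restriction follows from S3 itself: the `Expedited` retrieval tier is not available for objects in the `DEEP_ARCHIVE` storage class. A minimal sketch of such a check is shown here, assuming the storage class and recovery type are already known as strings. `check_recovery_type` is a hypothetical name and does not reflect how `RequestFiles` is implemented.

```python
# Retrieval tiers accepted by S3 restore requests.
VALID_RECOVERY_TYPES = {"Bulk", "Standard", "Expedited"}


def check_recovery_type(storage_class: str, recovery_type: str) -> None:
    """Raise ValueError for a recovery type that cannot be used.

    Hypothetical helper illustrating the rule; not ORCA's own code.
    """
    if recovery_type not in VALID_RECOVERY_TYPES:
        raise ValueError(
            f"Unknown recovery type '{recovery_type}'. "
            f"Expected one of {sorted(VALID_RECOVERY_TYPES)}."
        )
    if storage_class == "DEEP_ARCHIVE" and recovery_type == "Expedited":
        raise ValueError(
            "Files stored as DEEP_ARCHIVE cannot be recovered with the "
            "Expedited recovery type. Use Standard or Bulk instead."
        )


if __name__ == "__main__":
    check_recovery_type("GLACIER", "Expedited")  # accepted
    try:
        check_recovery_type("DEEP_ARCHIVE", "Expedited")
    except ValueError as error:
        print(error)
```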
Added
- ORCA-480 Added `storageClass` to Orca catalog and associated reporting API. Existing entries will be reported as in the `GLACIER` storage class.
- ORCA-479 Added variable `orca_default_storage_class` which denotes the default storage class to use when storing files in Orca. Currently allowed values are `GLACIER` and `DEEP_ARCHIVE`. copy_to_glacier accepts `orcaDefaultStorageClassOverride` which can be used on a per-collection basis. If desired, add `"orcaDefaultStorageClassOverride": "{$.meta.collection.meta.orcaDefaultStorageClassOverride}"` to the workflow's task's task_config.
- ORCA-458 Added `storage_class` to internal reconciliation. See reporting API for retrieval via reporting lambdas.
Migration Notes
- Before upgrading, halt ingest and wait for the `PREFIX_orca_metadata` queue to reach 0 entries.
- The user should update their `orca.tf`, `variables.tf` and `terraform.tfvars` files with new variables. The following optional variables have been added:
  - orca_default_storage_class
- If desired, update collection configurations with the new optional key `orcaDefaultStorageClassOverride` that can be added to override the default storage class as shown below.

      "meta": {
        "orcaDefaultStorageClassOverride": "DEEP_ARCHIVE"
      }

  For more information on storage classes and their impact on available recovery options, see the Orca `storageClass` documentation.
- Add the following rule to the existing glacier archive bucket policy under the `Condition` key:

      "s3:x-amz-storage-class": ["GLACIER", "DEEP_ARCHIVE"]

  See this policy example for details.
- The property `storageClass` returned by the Orphan reporting lambda has been renamed to `s3StorageClass`.
- Update the `orca.tf` file to include all of the updated and new variables as seen below. Note the change to source and the commented out optional variables.

      ## ORCA Module
      ## =============================================================================
      module "orca" {
        source = "https://github.com/nasa/cumulus-orca/releases/download/v6.0.0/cumulus-orca-terraform.zip//modules"
        ## --------------------------
        ## Cumulus Variables
        ## --------------------------
        ## REQUIRED
        buckets = var.buckets
        lambda_subnet_ids = var.lambda_subnet_ids
        permissions_boundary_arn = var.permissions_boundary_arn
        prefix = var.prefix
        system_bucket = var.system_bucket
        vpc_id = var.vpc_id
        workflow_config = module.cumulus.workflow_config

        ## OPTIONAL
        tags = local.tags

        ## --------------------------
        ## ORCA Variables
        ## --------------------------
        ## REQUIRED
        db_admin_password = var.db_admin_password
        db_user_password = var.db_user_password
        db_host_endpoint = var.db_host_endpoint
        dlq_subscription_email = var.dlq_subscription_email
        orca_default_bucket = var.orca_default_bucket
        orca_reports_bucket_name = var.orca_reports_bucket_name
        rds_security_group_id = var.rds_security_group_id
        s3_access_key = var.s3_access_key
        s3_secret_key = var.s3_secret_key

        ## OPTIONAL
        db_admin_username = "postgres"
        default_multipart_chunksize_mb = 250
        internal_report_queue_message_retention_time_seconds = 432000
        orca_default_recovery_type = "Standard"
        orca_default_storage_class = "GLACIER"
        orca_delete_old_reconcile_jobs_frequency_cron = "cron(0 0 ? * SUN *)"
        orca_ingest_lambda_memory_size = 2240
        orca_ingest_lambda_timeout = 720
        orca_internal_reconciliation_expiration_days = 30
        orca_recovery_buckets = []
        orca_recovery_complete_filter_prefix = ""
        orca_recovery_expiration_days = 5
        orca_recovery_lambda_memory_size = 128
        orca_recovery_lambda_timeout = 720
        orca_recovery_retry_limit = 3
        orca_recovery_retry_interval = 1
        orca_recovery_retry_backoff = 2
        s3_inventory_queue_message_retention_time_seconds = 432000
        s3_report_frequency = "Daily"
        sqs_delay_time_seconds = 0
        sqs_maximum_message_size = 262144
        staged_recovery_queue_message_retention_time_seconds = 432000
        status_update_queue_message_retention_time_seconds = 777600
        vpc_endpoint_id = null
      }

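The first migration step above asks you to wait until the `PREFIX_orca_metadata` queue is empty before upgrading. One way to check this is to poll the queue attributes, sketched here under several assumptions: `sqs_client` is a boto3 SQS client created by the caller, `queue_url` is the URL of your metadata queue, and the function names and polling defaults are hypothetical. SQS message counts are approximate, so treat a zero reading as a strong hint rather than a guarantee.

```python
import time

# Queue attributes that together cover visible, in-flight, and delayed messages.
COUNT_ATTRIBUTES = [
    "ApproximateNumberOfMessages",
    "ApproximateNumberOfMessagesNotVisible",
    "ApproximateNumberOfMessagesDelayed",
]


def queue_is_empty(sqs_client, queue_url: str) -> bool:
    """Return True when all approximate message counts for the queue are 0."""
    response = sqs_client.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=COUNT_ATTRIBUTES,
    )
    attributes = response["Attributes"]
    # SQS returns attribute values as strings.
    return sum(int(attributes[name]) for name in COUNT_ATTRIBUTES) == 0


def wait_until_empty(sqs_client, queue_url: str,
                     poll_seconds: float = 30.0, max_polls: int = 60) -> bool:
    """Poll the queue until it is empty; return False if it never drains."""
    for attempt in range(max_polls):
        if queue_is_empty(sqs_client, queue_url):
            return True
        if attempt < max_polls - 1:
            time.sleep(poll_seconds)
    return False
```

A caller would typically build the client with `boto3.client("sqs")` and pass the queue URL shown in the AWS console. Because the client is injected, the functions can be exercised with a stub object that has no AWS access.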
v5.0.0
Release v5.0.0
Added
- ORCA-300 Added `OrcaInternalReconciliation` workflow along with an accompanying input queue and dead-letter queue. Retention time can be changed by setting `internal_report_queue_message_retention_time_seconds` in your `variables.tf` or `orca_variables.tf` file. Defaults to 432000.
- ORCA-161 Added dead letter queue and cloudwatch alarm terraform code to recovery SQS queue.
- ORCA-307 Added lambda get_current_archive_list to pull S3 Inventory reports into Postgres. Adds `orca_reconciliation_lambda_memory_size` and `orca_reconciliation_lambda_timeout` to Terraform variables.
- ORCA-308 Added lambda perform_orca_reconcile to find differences between S3 Inventory reports and Orca catalog.
- ORCA-403 Added lambda post_to_queue_and_trigger_step_function to trigger step function for internal reconciliation.
- ORCA-373 Added input variable for `orca_reports_bucket_name`. Set in your `variables.tf` or `orca_variables.tf` file as shown below. Report frequency defaults to `Daily`, but can be set to `Weekly` through variable `s3_report_frequency`.
- ORCA-309 Added lambda internal_reconcile_report_phantom to report entries present in the catalog, but not s3.
- ORCA-382 Added lambda internal_reconcile_report_orphan to report entries present in S3 bucket, but not in the ORCA catalog.
- ORCA-291 request_files lambda now optionally accepts `orcaDefaultRecoveryTypeOverride` to override the glacier restore type at the workflow level by adding it to task_config.
- ORCA-381 Added lambda internal_reconcile_report_mismatch to report entries present in S3 bucket and catalog, but with conflicting data.
- ORCA-310 Added lambda delete_old_reconcile_jobs for removing old reconciliation reports from the database. Use new optional variable `orca_internal_reconciliation_expiration_days` to set the retention period.
- ORCA-372 Added automatic trigger for inventory events being read in by `post_to_queue_and_trigger_step_function`.
- ORCA-306 Added API gateway resources for internal reconciliation reporting lambdas.
- ORCA-424 Added automatic trigger for delete_old_reconcile_jobs. Will run every Sunday at midnight UTC. Adjust with the new optional variable `orca_delete_old_reconcile_jobs_frequency_cron`.
- ORCA-468 Added `status_update_dlq` to prevent ingest lock-down when theoretical errors occur.
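The three report types introduced above (phantom, orphan, and mismatch) amount to a set comparison between the ORCA catalog and the S3 inventory. The sketch below illustrates the idea only. It assumes both sides are available as dictionaries keyed by object key, and `classify_differences` is a hypothetical name; the actual lambdas work against Postgres tables and S3 Inventory reports.

```python
def classify_differences(catalog: dict, s3_inventory: dict) -> dict:
    """Group object keys into phantom, orphan, and mismatch lists.

    `catalog` and `s3_inventory` map an object key to a dictionary of
    metadata (for example size and etag). Illustrative helper only.
    """
    catalog_keys = set(catalog)
    s3_keys = set(s3_inventory)
    return {
        # In the catalog, but not in S3.
        "phantom": sorted(catalog_keys - s3_keys),
        # In S3, but not in the catalog.
        "orphan": sorted(s3_keys - catalog_keys),
        # In both places, but with conflicting metadata.
        "mismatch": sorted(
            key for key in catalog_keys & s3_keys
            if catalog[key] != s3_inventory[key]
        ),
    }


if __name__ == "__main__":
    catalog = {
        "a.hdf": {"size": 10, "etag": "aaa"},
        "b.hdf": {"size": 20, "etag": "bbb"},
        "c.hdf": {"size": 30, "etag": "ccc"},
    }
    s3_inventory = {
        "b.hdf": {"size": 20, "etag": "bbb"},
        "c.hdf": {"size": 31, "etag": "ccc"},
        "d.hdf": {"size": 40, "etag": "ddd"},
    }
    print(classify_differences(catalog, s3_inventory))
```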
Changed
- ORCA-299 `db_deploy` task has been updated to deploy ORCA internal reconciliation tables and objects.
- ORCA-161 Changed staged recovery SQS queue type from FIFO to standard queue.
- SQS Queue names adjusted to include Orca. For example: `"${var.prefix}-orca-status-update-queue.fifo"`. Queues will be automatically recreated by Terraform.
- ORCA-334 Created IAM role for the extract_filepaths_for_granule lambda function, attached the role to the function
- ORCA-404 Updated shared_db and relevant lambdas to use secrets manager ARN instead of magic strings.
- ORCA-291 Updated request_files lambda and terraform so that the glacier restore type can be set via terraform during deployment. In addition, the glacier retrieval type can now be overridden via a change in the collections configuration using `orcaDefaultRecoveryTypeOverride` key under `meta` tag as shown below.

      "meta": {
        "orcaDefaultRecoveryTypeOverride": "Standard"
      }

- ORCA-426 Performance improvements around json schema validators.
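ORCA-291 describes two levels for the glacier restore type: a deployment-wide default set through Terraform and an optional `orcaDefaultRecoveryTypeOverride` supplied through the collection configuration. The precedence can be sketched as follows. `resolve_recovery_type` is a hypothetical name, and treating a missing, null, or empty override as "not set" is an assumption rather than documented behavior.

```python
OVERRIDE_KEY = "orcaDefaultRecoveryTypeOverride"


def resolve_recovery_type(task_config: dict, deployment_default: str) -> str:
    """Pick the recovery type: the override wins when present, else the default.

    Illustrative only; not the request_files implementation.
    """
    override = task_config.get(OVERRIDE_KEY)
    # Assumption: None or an empty string means no override was configured.
    return override if override else deployment_default


if __name__ == "__main__":
    print(resolve_recovery_type({OVERRIDE_KEY: "Bulk"}, "Standard"))
    print(resolve_recovery_type({}, "Standard"))
```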
Migration Notes
- Create a new bucket `PREFIX-orca-reports` in the same account and region as your primary orca bucket.
  - Give the bucket a lifecycle configuration with an expiration period of 30 days.
  - Follow instructions in https://nasa.github.io/cumulus-orca/docs/developer/deployment-guide/deployment-s3-bucket/ to set up permission policy.
- Modify the permissions for your primary Orca bucket.
  - Under the `Cross Account Access` policy, add `s3:GetInventoryConfiguration`, `s3:PutInventoryConfiguration`, and `s3:ListBucketVersions` to Actions.
- The user should update their `orca.tf`, `variables.tf` and `terraform.tfvars` files with new variables. The following required variables have been added:
  - dlq_subscription_email
  - orca_reports_bucket_name
  - s3_access_key
  - s3_secret_key
- Update the collection configuration with the new optional key `orcaDefaultRecoveryTypeOverride` that can be added to override the default S3 glacier recovery type as shown below.

      "meta": {
        "orcaDefaultRecoveryTypeOverride": "Standard"
      }

- Add the following ORCA required variable definition to your `variables.tf` or `orca_variables.tf` file.
variable "dlq_subscription_email" {
type = string
description = "The email to notify users when messages are received in dead letter SQS queue due to restore failure. Sends one email until the dead letter queue is emptied."
}
variable "orca_reports_bucket_name" {
type = string
description = "The name of the bucket to store s3 inventory reports."
}
variable "s3_access_key" {
type = string
description = "Access key for communicating with Orca S3 buckets."
}
variable "s3_secret_key" {
type = string
description = "Secret key for communicating with Orca S3 buckets."
}
- Update the `orca.tf` file to include all of the updated and new variables as seen below. Note the change to source and the commented out optional variables.

      ## ORCA Module
      ## =============================================================================
      module "orca" {
        source = "https://github.com/nasa/cumulus-orca/releases/download/v3.0.1/cumulus-orca-terraform.zip//modules"
        ## --------------------------
        ## Cumulus Variables
        ## --------------------------
        ## REQUIRED
        buckets = var.buckets
        lambda_subnet_ids = var.lambda_subnet_ids
        permissions_boundary_arn = var.permissions_boundary_arn
        prefix = var.prefix
        system_bucket = var.system_bucket
        vpc_id = var.vpc_id
        workflow_config = module.cumulus.workflow_config

        ## OPTIONAL
        tags = local.tags

        ## --------------------------
        ## ORCA Variables
        ## --------------------------
        ## REQUIRED
        db_admin_password = var.db_admin_password
        db_user_password = var.db_user_password
        db_host_endpoint = var.db_host_endpoint
        dlq_subscription_email = var.dlq_subscription_email
        orca_default_bucket = var.orca_default_bucket
        orca_reports_bucket_name = var.orca_reports_bucket_name
        rds_security_group_id = var.rds_security_group_id
        s3_access_key = var.s3_access_key
        s3_secret_key = var.s3_secret_key

        ## OPTIONAL
        db_admin_username = "postgres"
        default_multipart_chunksize_mb = 250
        internal_report_queue_message_retention_time_seconds = 432000
        orca_default_recovery_type = "Standard"
        orca_delete_old_reconcile_jobs_frequency_cron = "cron(0 0 ? * SUN *)"
        orca_ingest_lambda_memory_size = 2240
        orca_ingest_lambda_timeout = 720
        orca_internal_reconciliation_expiration_days = 30
        orca_recovery_buckets = []
        orca_recovery_complete_filter_prefix = ""
        orca_recovery_expiration_days = 5
        orca_recovery_lambda_memory_size = 128
        orca_recovery_lambda_timeout = 720
        orca_recovery_retry_limit = 3
        orca_recovery_retry_interval = 1
        orca_recovery_retry_backoff = 2
        s3_inventory_queue_message_retention_time_seconds = 432000
        s3_report_frequency = "Daily"
        sqs_delay_time_seconds = 0
        sqs_maximum_message_size = 262144
        staged_recovery_queue_message_retention_time_seconds = 432000
        status_update_queue_message_retention_time_seconds = 777600
        vpc_endpoint_id = null
      }

Security
- Updated Docusaurus to version 2.0.0.beta-21 to resolve security issues.
v4.0.3
v4.0.2
v4.0.1
v4.0.0
Release v4.0.0
Migration Notes
- Orca is only compatible with versions of Cumulus that use the new Cumulus file format. Any calls to extract_filepaths_for_granule or copy_to_glacier should switch to the new format.
- Ensure that anything calling `copy_to_glacier` only relies on properties currently present in `copy_to_glacier/schemas/output.json`
- Remove any added references in your setup to copy_to_glacier_cumulus_translator. It is no longer necessary as a Cumulus intermediary.
- The user should update their `orca.tf`, `variables.tf` and `terraform.tfvars` files with new variables. The following two variable names have changed:
  - postgres_user_pw -> db_admin_password (new)
  - database_app_user_pw -> db_user_password (new)
- These are the new variables added:
  - db_admin_username (defaults to "postgres")
  - db_host_endpoint (Requires a value. Set in terraform.tfvars to your RDS Database's endpoint, similar to "PREFIX-cumulus-db.cluster-000000000000.us-west-2.rds.amazonaws.com")
  - db_name (Defaults to PREFIX_orca.)
    - Any `-` in `prefix` are replaced with `_` to follow SQL Naming Conventions
    - If preserving a database from a previous version of Orca, set to disaster_recovery.
  - db_user_name (Defaults to PREFIX_orcauser.)
    - Any `-` in `prefix` are replaced with `_` to follow SQL Naming Conventions
    - If preserving a database from a previous version of Orca, set to orcauser.
  - rds_security_group_id (Requires a value. Set in terraform.tfvars to the Security Group ID of your RDS Database's Security Group. Output from Cumulus' RDS module as `security_group_id`)
  - vpc_endpoint_id
- Adjust workflows/step functions for `extract_filepaths`.
  - `file-buckets` argument in `task_config` is now `fileBucketsMaps`.
- Adjust workflows/step functions for `copy_to_glacier`.
  - `multipart_chunksize_mb` argument in `task_config` is now the Cumulus standard of `s3MultipartChunksizeMb`. See example below.
  - `copy_to_glacier` has new requirements for writing to the orca catalog. Required properties are `providerId`, `executionId`, `collectionShortname`, and `collectionVersion`. See example below.
"task_config": {
"s3MultipartChunksizeMb": "{$.meta.collection.meta.s3MultipartChunksizeMb}",
"excludeFileTypes": "{$.meta.collection.meta.excludeFileTypes}",
"providerId": "{$.meta.provider.id}",
"providerName": "{$.meta.provider.name}",
"executionId": "{$.cumulus_meta.execution_name}",
"collectionShortname": "{$.meta.collection.name}",
"collectionVersion": "{$.meta.collection.version}",
"orcaDefaultBucketOverride": "{$.meta.collection.meta.orcaDefaultBucketOverride}"
}
- `request_status_for_granule` input/output and `request_status_for_job` input/output are now fully camel case.
- Add the following ORCA required variables definition to your `variables.tf` or `orca_variables.tf` file.
variable "db_admin_password" {
description = "Password for RDS database administrator authentication"
type = string
}
variable "db_user_password" {
description = "Password for RDS database user authentication"
type = string
}
variable "db_host_endpoint" {
type = string
description = "Database host endpoint to connect to."
}
variable "rds_security_group_id" {
type = string
description = "Cumulus' RDS Security Group's ID."
}
- Update the `orca.tf` file to include all of the updated and new variables as seen below. Note the change to source and the commented out optional variables.

      ## ORCA Module
      ## =============================================================================
      module "orca" {
        source = "https://github.com/nasa/cumulus-orca/releases/download/v3.0.1/cumulus-orca-terraform.zip//modules"
        ## --------------------------
        ## Cumulus Variables
        ## --------------------------
        ## REQUIRED
        buckets = var.buckets
        lambda_subnet_ids = var.lambda_subnet_ids
        permissions_boundary_arn = var.permissions_boundary_arn
        prefix = var.prefix
        system_bucket = var.system_bucket
        vpc_id = var.vpc_id
        workflow_config = module.cumulus.workflow_config

        ## OPTIONAL
        tags = local.tags

        ## --------------------------
        ## ORCA Variables
        ## --------------------------
        ## REQUIRED
        orca_default_bucket = var.orca_default_bucket
        db_admin_password = var.db_admin_password
        db_user_password = var.db_user_password
        db_host_endpoint = var.db_host_endpoint
        rds_security_group_id = var.rds_security_group_id

        ## OPTIONAL
        db_admin_username = "postgres"
        default_multipart_chunksize_mb = 250
        orca_ingest_lambda_memory_size = 2240
        orca_ingest_lambda_timeout = 720
        orca_recovery_buckets = []
        orca_recovery_complete_filter_prefix = ""
        orca_recovery_expiration_days = 5
        orca_recovery_lambda_memory_size = 128
        orca_recovery_lambda_timeout = 720
        orca_recovery_retry_limit = 3
        orca_recovery_retry_interval = 1
        orca_recovery_retry_backoff = 2
        sqs_delay_time_seconds = 0
        sqs_maximum_message_size = 262144
        staged_recovery_queue_message_retention_time_seconds = 432000
        status_update_queue_message_retention_time_seconds = 777600
        vpc_endpoint_id = null
      }

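The `db_name` and `db_user_name` defaults described in the migration notes above are derived from the deployment prefix, with any `-` replaced by `_`. A minimal sketch of that derivation follows. The function names are hypothetical, and the exact logic inside the ORCA Terraform module may differ.

```python
def default_db_name(prefix: str) -> str:
    """Default database name: PREFIX_orca with '-' in the prefix replaced by '_'."""
    return prefix.replace("-", "_") + "_orca"


def default_db_user_name(prefix: str) -> str:
    """Default application user: PREFIX_orcauser with the same replacement."""
    return prefix.replace("-", "_") + "_orcauser"


if __name__ == "__main__":
    # "my-daac" is an assumed example prefix.
    print(default_db_name("my-daac"))
    print(default_db_user_name("my-daac"))
```

When preserving a database from an earlier ORCA version, these defaults do not apply. Set `db_name` to `disaster_recovery` and `db_user_name` to `orcauser` explicitly, as noted above.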
Removed
- The `modules/rds` directory is removed since ORCA will utilize the Cumulus DB.
- ORCA-233 The `disaster_recovery` database, now renamed `PREFIX_orca`, will now be created by db_deploy instead of Terraform.
- ORCA-288 Removed copy_to_glacier_cumulus_translator due to better consistency in Cumulus's file dictionary.
- ORCA-311 `copy_to_glacier` no longer accepts/returns file properties other than `bucket` and `key`. `copied_to_glacier` is similarly no longer passed through, but generated.
Added
- ORCA-256 Added AWS API Gateway in modules/api_gateway/main.tf for the catalog reporting lambda.
- ORCA-227 Added modules/secretsmanager directory that contains terraform code for deploying AWS secretsmanager.
- ORCA-177 Added AWS API Gateway in modules/api_gateway/main.tf for the request_status_for_granule and request_status_for_job lambdas.
- ORCA-257 orca_catalog_reporting lambda now returns data from actual catalog.
- ORCA-151 copy_to_glacier and request_files now optionally accept "orcaDefaultBucketOverride" which can be used on a per-collection basis. If desired, add "orcaDefaultBucketOverride": "{$.meta.collection.meta.orcaDefaultBucketOverride}" to the workflow's task's task_config.
- ORCA-335 request_files now recognizes when a file is already recovered, and posts an error message to status tables.
- ORCA-230 copy_to_glacier now writes metadata to an ORCA catalog for comparisons to cumulus holdings.
Changed
- ORCA-217 Lambda inputs now conform to the Cumulus camel case standard.
- ORCA-297 Default database name is now PREFIX_orca
- ORCA-287 Updated copy_to_glacier and extract_filepaths_for_granule to new Cumulus file format.
- ORCA-245 Updated resource policies related to KMS keys to provide better security.
- ORCA-318 Updated post_to_catalog lambda to match new Cumulus schema changes.
- ORCA-317 Updated the db_deploy task, unit tests, manual tests, research pages and SQL to reflect new inventory layout to better align with Cumulus.
- ORCA-249 Changed `multipart_chunksize_mb` in lambda configs to `s3MultipartChunksizeMb`. Standard workflows now pull from `$.meta.collection.meta.s3MultipartChunksizeMb`
- ORCA-230 Updated lambdas to use Cumulus Message Adapter Python v2.0.0.
- ORCA-132 Updated workflows to use latest Cumulus v10.0.0 workflow code.
v4.0.0-Beta3
Release v4.0.0-Beta3
v4.0.0-Beta2
Release v4.0.0-Beta2