Releases: nasa/cumulus-orca

v6.0.1

12 Oct 19:11

Release v6.0.1

Changed

  • ORCA-566 Shortened S3 inventory report name due to length limitation causing errors when a user's naming schema is long.

v6.0.0

15 Sep 21:04

Release v6.0.0

Changed

  • ORCA-290 Renamed excludeFileTypes, orcaDefaultBucketOverride, orcaDefaultRecoveryTypeOverride, and orcaDefaultStorageClassOverride to excludedFileExtensions, defaultBucketOverride, defaultRecoveryTypeOverride, and defaultStorageClassOverride respectively. In addition, the ORCA configuration variables excludedFileExtensions, defaultBucketOverride, defaultRecoveryTypeOverride, and defaultStorageClassOverride are now under collection.meta.orca.
  • ORCA-290 Adjusted workflows/step functions for OrcaRecoveryWorkflow.
    • excludeFileTypes, orcaDefaultBucketOverride and orcaDefaultStorageClassOverride arguments in task_config are now excludedFileExtensions, defaultBucketOverride and defaultStorageClassOverride respectively.
    • excludedFileExtensions, defaultBucketOverride and defaultStorageClassOverride keys are now under collection.meta.orca. See the example below under Migration Notes.
  • ORCA-519 Enforced schema checks in request_status_for_granule and request_status_for_job.
    Both lambdas now return proper HTTP error codes for bad inputs or internal server errors.
    Additionally, corrected an error in the API Reference
    where the error status for these lambdas was incorrectly listed as failed.
  • ORCA-437 Requests to API Gateway now use IAM permissions, restricting anonymous access.
  • ORCA-496 Mitigated SQS security issue. All SQS queues now use default encryption.
  • Updated from Python 3.7 to 3.9

Migration Notes

  • Adjust usage of copy_to_glacier in your step functions for new keys.
    • excludeFileTypes, orcaDefaultBucketOverride, and orcaDefaultStorageClassOverride arguments are now excludedFileExtensions, defaultBucketOverride, and defaultStorageClassOverride and are under a new key orca.
      See example below:
      "task_config": {
        "excludedFileExtensions": "{$.meta.collection.meta.orca.excludedFileExtensions}",
        "defaultBucketOverride": "{$.meta.collection.meta.orca.defaultBucketOverride}",
        "defaultStorageClassOverride": "{$.meta.collection.meta.orca.defaultStorageClassOverride}"
      }
  • Adjust Cumulus collection configuration integration for new orca key paths.
    • excludeFileTypes, orcaDefaultBucketOverride and orcaDefaultStorageClassOverride arguments are now excludedFileExtensions, defaultBucketOverride and defaultStorageClassOverride respectively.
    • excludedFileExtensions, defaultBucketOverride and defaultStorageClassOverride keys are now under a new key orca. See example below:
        "collection": {
            "meta": {
                "orca": {
                    "defaultStorageClassOverride": "DEEP_ARCHIVE",
                    "excludedFileExtensions": [".xml"],
                    "defaultBucketOverride": "orca-bucket"
                }
            }
        }
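
The key renames in the v6.0.0 notes can be applied to an existing collection document mechanically. The following is a minimal sketch, not part of ORCA: the function name migrate_collection_meta is hypothetical, and it assumes the old keys sit directly under the collection's meta object, as the earlier release notes show.

```python
import copy

# Old key -> new key, as listed in the v6.0.0 notes.
RENAMED_KEYS = {
    "excludeFileTypes": "excludedFileExtensions",
    "orcaDefaultBucketOverride": "defaultBucketOverride",
    "orcaDefaultRecoveryTypeOverride": "defaultRecoveryTypeOverride",
    "orcaDefaultStorageClassOverride": "defaultStorageClassOverride",
}


def migrate_collection_meta(collection: dict) -> dict:
    """Return a copy of `collection` with ORCA keys moved under meta.orca.

    Hypothetical helper for illustration; assumes the old keys live
    directly under collection["meta"].
    """
    migrated = copy.deepcopy(collection)
    meta = migrated.get("meta")
    if not isinstance(meta, dict):
        return migrated  # nothing to migrate
    for old_key, new_key in RENAMED_KEYS.items():
        if old_key in meta:
            # Create meta.orca on demand; a value already present under
            # the new key wins over the old one.
            meta.setdefault("orca", {}).setdefault(new_key, meta.pop(old_key))
    return migrated
```

The input is copied rather than modified in place, so the original document can be kept for comparison before the migrated one is written back to Cumulus.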

v5.1.0

11 Aug 14:00
6b6d73b

Release v5.1.0

Changed

  • ORCA-478 Updated bucket policy documentation for the deep glacier bucket in the DR account so that users can now only upload objects with a storage class of either GLACIER or DEEP_ARCHIVE.
  • ORCA-457 RequestFiles will now raise a descriptive error when a user attempts to recover DEEP_ARCHIVE files with the Expedited recovery method.
    For more details on storageClass see the Orca storageClass documentation.
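
The ORCA-457 check can be pictured as a guard that runs before a restore is requested. This is a sketch under stated assumptions rather than the RequestFiles source: the function name validate_recovery_request and the error messages are hypothetical. The tier names are the retrieval tiers S3 offers for restore requests, and S3 does not offer Expedited retrieval for DEEP_ARCHIVE objects.

```python
# Retrieval tiers offered by S3 restore requests.
RECOVERY_TYPES = ("Standard", "Bulk", "Expedited")


def validate_recovery_request(storage_class: str, recovery_type: str) -> None:
    """Raise ValueError for combinations that S3 cannot serve.

    Hypothetical helper for illustration only.
    """
    if recovery_type not in RECOVERY_TYPES:
        raise ValueError(f"Unknown recovery type '{recovery_type}'.")
    if storage_class == "DEEP_ARCHIVE" and recovery_type == "Expedited":
        raise ValueError(
            "Expedited recovery is not available for DEEP_ARCHIVE files; "
            "use Standard or Bulk instead."
        )
```

Failing early with a clear message is cheaper than letting the restore call reach S3 and surfacing its error afterwards.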

Added

  • ORCA-480 Added storageClass to Orca catalog and associated reporting API. Existing entries will be reported as in the GLACIER storage class.
  • ORCA-479 Added variable orca_default_storage_class which denotes the default storage class to use when storing files in Orca.
    Currently allowed values are GLACIER and DEEP_ARCHIVE.
    copy_to_glacier accepts orcaDefaultStorageClassOverride which can be used on a per-collection basis. If desired, add "orcaDefaultStorageClassOverride": "{$.meta.collection.meta.orcaDefaultStorageClassOverride}" to the workflow's task's task_config.
  • ORCA-458 Added storage_class to internal reconciliation. See reporting API for retrieval via reporting lambdas.

Migration Notes

  • Before upgrading, halt ingest and wait for the PREFIX_orca_metadata queue to reach 0 entries.

  • The user should update their orca.tf, variables.tf and terraform.tfvars files with new variables. The following optional variables have been added:

    • orca_default_storage_class
  • If desired, update collection configurations with the new optional key orcaDefaultStorageClassOverride that can be added to override the default storage class as shown below.

      "meta": {
        "orcaDefaultStorageClassOverride": "DEEP_ARCHIVE"
      }

    For more information on storage classes and their impact on available recovery options, see the Orca storageClass documentation.

  • Add the following rule to the existing glacier archive bucket policy under Condition key:

    "s3:x-amz-storage-class": ["GLACIER", "DEEP_ARCHIVE"]

    See this policy example for details.

  • The property storageClass returned by the Orphan reporting lambda has been renamed to s3StorageClass.

  • Update the orca.tf file to include all of the updated and new variables as seen below. Note the change to source and the commented out optional variables.

    ## ORCA Module
    ## =============================================================================
    module "orca" {
      source = "https://github.com/nasa/cumulus-orca/releases/download/v6.0.0/cumulus-orca-terraform.zip//modules"
    ## --------------------------
    ## Cumulus Variables
    ## --------------------------
    ## REQUIRED
    buckets                  = var.buckets
    lambda_subnet_ids        = var.lambda_subnet_ids
    permissions_boundary_arn = var.permissions_boundary_arn
    prefix                   = var.prefix
    system_bucket            = var.system_bucket
    vpc_id                   = var.vpc_id
    workflow_config          = module.cumulus.workflow_config
    
    ## OPTIONAL
    tags        = local.tags
    
    ## --------------------------
    ## ORCA Variables
    ## --------------------------
    ## REQUIRED
    db_admin_password        = var.db_admin_password
    db_user_password         = var.db_user_password
    db_host_endpoint         = var.db_host_endpoint
    dlq_subscription_email   = var.dlq_subscription_email
    orca_default_bucket      = var.orca_default_bucket
    orca_reports_bucket_name = var.orca_reports_bucket_name
    rds_security_group_id    = var.rds_security_group_id
    s3_access_key            = var.s3_access_key
    s3_secret_key            = var.s3_secret_key
    
    ## OPTIONAL
    db_admin_username                                    = "postgres"
    default_multipart_chunksize_mb                       = 250
    internal_report_queue_message_retention_time_seconds = 432000
    orca_default_recovery_type                           = "Standard"
    orca_default_storage_class                           = "GLACIER"
    orca_delete_old_reconcile_jobs_frequency_cron        = "cron(0 0 ? * SUN *)"
    orca_ingest_lambda_memory_size                       = 2240
    orca_ingest_lambda_timeout                           = 720
    orca_internal_reconciliation_expiration_days         = 30
    orca_recovery_buckets                                = []
    orca_recovery_complete_filter_prefix                 = ""
    orca_recovery_expiration_days                        = 5
    orca_recovery_lambda_memory_size                     = 128
    orca_recovery_lambda_timeout                         = 720
    orca_recovery_retry_limit                            = 3
    orca_recovery_retry_interval                         = 1
    orca_recovery_retry_backoff                          = 2
    s3_inventory_queue_message_retention_time_seconds    = 432000
    s3_report_frequency                                  = "Daily"
    sqs_delay_time_seconds                               = 0
    sqs_maximum_message_size                             = 262144
    staged_recovery_queue_message_retention_time_seconds = 432000
    status_update_queue_message_retention_time_seconds   = 777600
    vpc_endpoint_id                                      = null
    }
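
The relationship between orca_default_storage_class and the per-collection orcaDefaultStorageClassOverride described in these notes amounts to a fallback with validation. The sketch below is illustrative only: resolve_storage_class is a hypothetical name, not an ORCA function, and the "GLACIER" default simply mirrors the example value shown in orca.tf.

```python
# Allowed values per the v5.1.0 notes.
ALLOWED_STORAGE_CLASSES = ("GLACIER", "DEEP_ARCHIVE")


def resolve_storage_class(collection_meta: dict,
                          default_storage_class: str = "GLACIER") -> str:
    """Pick the collection override if present, else the deployment default.

    Hypothetical helper; `collection_meta` stands in for the collection's
    meta object and `default_storage_class` for orca_default_storage_class.
    """
    override = collection_meta.get("orcaDefaultStorageClassOverride")
    storage_class = override if override is not None else default_storage_class
    if storage_class not in ALLOWED_STORAGE_CLASSES:
        raise ValueError(
            f"Storage class '{storage_class}' is not one of {ALLOWED_STORAGE_CLASSES}."
        )
    return storage_class
```

For example, a collection whose meta contains "orcaDefaultStorageClassOverride": "DEEP_ARCHIVE" resolves to DEEP_ARCHIVE, while a collection without the key falls back to the deployment default.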

v5.0.0

17 Jun 19:32
c55ff52

Release v5.0.0

Added

  • ORCA-300 Added OrcaInternalReconciliation workflow along with an accompanying input queue and dead-letter queue.
    Retention time can be changed by setting internal_report_queue_message_retention_time_seconds in your variables.tf or orca_variables.tf file. Defaults to 432000.
  • ORCA-161 Added dead letter queue and cloudwatch alarm terraform code to recovery SQS queue.
  • ORCA-307 Added lambda get_current_archive_list to pull S3 Inventory reports into Postgres.
    Adds orca_reconciliation_lambda_memory_size and orca_reconciliation_lambda_timeout to Terraform variables.
  • ORCA-308 Added lambda perform_orca_reconcile to find differences between S3 Inventory reports and Orca catalog.
  • ORCA-403 Added lambda post_to_queue_and_trigger_step_function to trigger step function for internal reconciliation.
  • ORCA-373 Added input variable for orca_reports_bucket_name. Set in your variables.tf or orca_variables.tf file as shown below.
    Report frequency defaults to Daily, but can be set to Weekly through variable s3_report_frequency.
  • ORCA-309 Added lambda internal_reconcile_report_phantom to report entries present in the catalog, but not in S3.
  • ORCA-382 Added lambda internal_reconcile_report_orphan to report entries present in S3 bucket, but not in the ORCA catalog.
  • ORCA-291 request_files lambda now optionally accepts orcaDefaultRecoveryTypeOverride to override the glacier restore type at the workflow level by adding it to task_config.
  • ORCA-381 Added lambda internal_reconcile_report_mismatch to report entries present in S3 bucket and catalog, but with conflicting data.
  • ORCA-310 Added lambda delete_old_reconcile_jobs for removing old reconciliation reports from the database.
    Use new optional variable orca_internal_reconciliation_expiration_days to set the retention period.
  • ORCA-372 Added automatic trigger for inventory events being read in by post_to_queue_and_trigger_step_function.
  • ORCA-306 Added API gateway resources for internal reconciliation reporting lambdas.
  • ORCA-424 Added automatic trigger for delete_old_reconcile_jobs. Will run every Sunday at midnight UTC.
    Adjust with the new optional variable orca_delete_old_reconcile_jobs_frequency_cron.
  • ORCA-468 Added status_update_dlq to prevent ingest lock-down when theoretical errors occur.
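
The three report types introduced above (phantom, orphan, and mismatch) correspond to simple set comparisons between the catalog and the S3 inventory. The following is a simplified in-memory sketch, not the lambdas' implementation, which works against Postgres. The names Report and reconcile and the record fields in the usage note are hypothetical.

```python
from typing import Dict, NamedTuple, Set


class Report(NamedTuple):
    phantoms: Set[str]    # in the catalog, but not in S3
    orphans: Set[str]     # in S3, but not in the catalog
    mismatches: Set[str]  # in both, but with conflicting data


def reconcile(catalog: Dict[str, dict], inventory: Dict[str, dict]) -> Report:
    """Compare catalog entries with S3 inventory entries, keyed by object key.

    Hypothetical helper: both arguments map an object key to a dict of
    attributes, such as size or ETag.
    """
    catalog_keys = set(catalog)
    inventory_keys = set(inventory)
    phantoms = catalog_keys - inventory_keys
    orphans = inventory_keys - catalog_keys
    mismatches = {
        key for key in catalog_keys & inventory_keys
        if catalog[key] != inventory[key]
    }
    return Report(phantoms, orphans, mismatches)
```

Given a catalog of {"a": {"size": 1}, "b": {"size": 2}} and an inventory of {"b": {"size": 3}, "c": {"size": 4}}, the sketch reports "a" as a phantom, "c" as an orphan, and "b" as a mismatch.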

Changed

  • ORCA-299 db_deploy task has been updated to deploy ORCA internal reconciliation tables and objects.
  • ORCA-161 Changed staged recovery SQS queue type from FIFO to standard queue.
  • SQS Queue names adjusted to include Orca. For example: "${var.prefix}-orca-status-update-queue.fifo". Queues will be automatically recreated by Terraform.
  • ORCA-334 Created IAM role for the extract_filepaths_for_granule lambda function and attached the role to the function.
  • ORCA-404 Updated shared_db and relevant lambdas to use secrets manager ARN instead of magic strings.
  • ORCA-291 Updated request_files lambda and terraform so that the glacier restore type can be set via terraform during deployment. In addition, the glacier retrieval type can now be overridden in the collection configuration using the orcaDefaultRecoveryTypeOverride key under the meta tag as shown below.
    "meta": {
      "orcaDefaultRecoveryTypeOverride": "Standard"
    }
  • ORCA-426 Performance improvements around json schema validators.

Migration Notes

  • Create a new bucket PREFIX-orca-reports in the same account and region as your primary orca bucket.

  • The user should update their orca.tf, variables.tf and terraform.tfvars files with new variables. The following required variables have been added:

    • dlq_subscription_email
    • orca_reports_bucket_name
    • s3_access_key
    • s3_secret_key
  • Update the collection configuration with the new optional key orcaDefaultRecoveryTypeOverride that can be added to override the default S3 glacier recovery type as shown below.

      "meta": {
        "orcaDefaultRecoveryTypeOverride": "Standard"
      }
  • Add the following ORCA required variable definition to your variables.tf or orca_variables.tf file.

variable "dlq_subscription_email" {
  type        = string
  description = "The email to notify users when messages are received in dead letter SQS queue due to restore failure. Sends one email until the dead letter queue is emptied."
}

variable "orca_reports_bucket_name" {
  type        = string
  description = "The name of the bucket to store s3 inventory reports."
}

variable "s3_access_key" {
  type        = string
  description = "Access key for communicating with Orca S3 buckets."
}

variable "s3_secret_key" {
  type        = string
  description = "Secret key for communicating with Orca S3 buckets."
}
  • Update the orca.tf file to include all of the updated and new variables as seen below. Note the change to source and the commented out optional variables.
    ## ORCA Module
    ## =============================================================================
    module "orca" {
      source = "https://github.com/nasa/cumulus-orca/releases/download/v3.0.1/cumulus-orca-terraform.zip//modules"
    ## --------------------------
    ## Cumulus Variables
    ## --------------------------
    ## REQUIRED
    buckets                  = var.buckets
    lambda_subnet_ids        = var.lambda_subnet_ids
    permissions_boundary_arn = var.permissions_boundary_arn
    prefix                   = var.prefix
    system_bucket            = var.system_bucket
    vpc_id                   = var.vpc_id
    workflow_config          = module.cumulus.workflow_config
    
    ## OPTIONAL
    tags        = local.tags
    
    ## --------------------------
    ## ORCA Variables
    ## --------------------------
    ## REQUIRED
    db_admin_password        = var.db_admin_password
    db_user_password         = var.db_user_password
    db_host_endpoint         = var.db_host_endpoint
    dlq_subscription_email   = var.dlq_subscription_email
    orca_default_bucket      = var.orca_default_bucket
    orca_reports_bucket_name = var.orca_reports_bucket_name
    rds_security_group_id    = var.rds_security_group_id
    s3_access_key            = var.s3_access_key
    s3_secret_key            = var.s3_secret_key
    
    ## OPTIONAL
    db_admin_username                                    = "postgres"
    default_multipart_chunksize_mb                       = 250
    internal_report_queue_message_retention_time_seconds = 432000
    orca_default_recovery_type                           = "Standard"
    orca_delete_old_reconcile_jobs_frequency_cron        = "cron(0 0 ? * SUN *)"
    orca_ingest_lambda_memory_size                       = 2240
    orca_ingest_lambda_timeout                           = 720
    orca_internal_reconciliation_expiration_days         = 30
    orca_recovery_buckets                                = []
    orca_recovery_complete_filter_prefix                 = ""
    orca_recovery_expiration_days                        = 5
    orca_recovery_lambda_memory_size                     = 128
    orca_recovery_lambda_timeout                         = 720
    orca_recovery_retry_limit                            = 3
    orca_recovery_retry_interval                         = 1
    orca_recovery_retry_backoff                          = 2
    s3_inventory_queue_message_retention_time_seconds    = 432000
    s3_report_frequency                                  = "Daily"
    sqs_delay_time_seconds                               = 0
    sqs_maximum_message_size                             = 262144
    staged_recovery_queue_message_retention_time_seconds = 432000
    status_update_queue_message_retention_time_seconds   = 777600
    vpc_endpoint_id                                      = null
    }

Security

  • Updated Docusaurus to version 2.0.0-beta.21 to resolve security issues.

v4.0.3

02 Jun 19:30

Release v4.0.3

Fixed

  • Fixed bug where db_admin_username had to be lower-case.

v4.0.2

18 May 14:30

Release v4.0.2

Fixed

  • Fixed bug where db_admin_username was not set as the owner of new databases.

v4.0.1

16 Feb 18:14

Release v4.0.1

Fixed

  • Updated release build script to perform cleanup sooner.
  • Updated terraform deployment with additional depends_on parameters and fixes to prevent db_deploy lambda from firing prematurely.

v4.0.0

16 Feb 01:43

Release v4.0.0

Migration Notes

  • Orca is only compatible with versions of Cumulus that use the new Cumulus file format. Any calls to extract_filepaths_for_granule or copy_to_glacier should switch to the new format.
  • Ensure that anything calling copy_to_glacier only relies on properties currently present in copy_to_glacier/schemas/output.json.
  • Remove any added references in your setup to copy_to_glacier_cumulus_translator. It is no longer necessary as a Cumulus intermediary.
  • The user should update their orca.tf, variables.tf and terraform.tfvars files with new variables. The following two variable names have changed:
    • postgres_user_pw -> db_admin_password (new)
    • database_app_user_pw -> db_user_password (new)
  • These are the new variables added:
    • db_admin_username (defaults to "postgres")
    • db_host_endpoint (Requires a value. Set in terraform.tfvars to your RDS Database's endpoint, similar to "PREFIX-cumulus-db.cluster-000000000000.us-west-2.rds.amazonaws.com")
    • db_name (Defaults to PREFIX_orca.)
      • Any - in prefix are replaced with _ to follow SQL Naming Conventions
      • If preserving a database from a previous version of Orca, set to disaster_recovery.
    • db_user_name (Defaults to PREFIX_orcauser.)
      • Any - in prefix are replaced with _ to follow SQL Naming Conventions
      • If preserving a database from a previous version of Orca, set to orcauser.
    • rds_security_group_id (Requires a value. Set in terraform.tfvars to the Security Group ID of your RDS Database's Security Group. Output from Cumulus' RDS module as security_group_id)
    • vpc_endpoint_id
  • Adjust workflows/step functions for extract_filepaths.
    • file-buckets argument in task_config is now fileBucketsMaps.
  • Adjust workflows/step functions for copy_to_glacier.
    • multipart_chunksize_mb argument in task_config is now the Cumulus standard of s3MultipartChunksizeMb. See example below.
    • copy_to_glacier has new requirements for writing to the orca catalog. Required properties are providerId, executionId, collectionShortname, and collectionVersion. See example below.
"task_config": {
  "s3MultipartChunksizeMb": "{$.meta.collection.meta.s3MultipartChunksizeMb}",
  "excludeFileTypes": "{$.meta.collection.meta.excludeFileTypes}",
  "providerId": "{$.meta.provider.id}",
  "providerName": "{$.meta.provider.name}",
  "executionId": "{$.cumulus_meta.execution_name}",
  "collectionShortname": "{$.meta.collection.name}",
  "collectionVersion": "{$.meta.collection.version}",
  "orcaDefaultBucketOverride": "{$.meta.collection.meta.orcaDefaultBucketOverride}"
}
  • request_status_for_granule input/output and request_status_for_job input/output are now fully camel case.
  • Add the following ORCA required variables definition to your variables.tf or orca_variables.tf file.
variable "db_admin_password" {
  description = "Password for RDS database administrator authentication"
  type        = string
}

variable "db_user_password" {
  description = "Password for RDS database user authentication"
  type        = string
}

variable "db_host_endpoint" {
  type        = string
  description = "Database host endpoint to connect to."
}

variable "rds_security_group_id" {
  type        = string
  description = "Cumulus' RDS Security Group's ID."
}
  • Update the orca.tf file to include all of the updated and new variables as seen below. Note the change to source and the commented out optional variables.
    ## ORCA Module
    ## =============================================================================
    module "orca" {
      source = "https://github.com/nasa/cumulus-orca/releases/download/v3.0.1/cumulus-orca-terraform.zip//modules"
    ## --------------------------
    ## Cumulus Variables
    ## --------------------------
    ## REQUIRED
    buckets                  = var.buckets
    lambda_subnet_ids        = var.lambda_subnet_ids
    permissions_boundary_arn = var.permissions_boundary_arn
    prefix                   = var.prefix
    system_bucket            = var.system_bucket
    vpc_id                   = var.vpc_id
    workflow_config          = module.cumulus.workflow_config
    
    ## OPTIONAL
    tags        = local.tags
    
    ## --------------------------
    ## ORCA Variables
    ## --------------------------
    ## REQUIRED
    orca_default_bucket   = var.orca_default_bucket
    db_admin_password     = var.db_admin_password
    db_user_password      = var.db_user_password
    db_host_endpoint      = var.db_host_endpoint
    rds_security_group_id = var.rds_security_group_id
    ## OPTIONAL
    db_admin_username                                    = "postgres"
    default_multipart_chunksize_mb                       = 250
    orca_ingest_lambda_memory_size                       = 2240
    orca_ingest_lambda_timeout                           = 720
    orca_recovery_buckets                                = []
    orca_recovery_complete_filter_prefix                 = ""
    orca_recovery_expiration_days                        = 5
    orca_recovery_lambda_memory_size                     = 128
    orca_recovery_lambda_timeout                         = 720
    orca_recovery_retry_limit                            = 3
    orca_recovery_retry_interval                         = 1
    orca_recovery_retry_backoff                          = 2
    sqs_delay_time_seconds                               = 0
    sqs_maximum_message_size                             = 262144
    staged_recovery_queue_message_retention_time_seconds = 432000
    status_update_queue_message_retention_time_seconds   = 777600
    vpc_endpoint_id                                      = null
    }
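
The defaults for db_name and db_user_name listed in these migration notes follow one rule: take the deployment prefix, replace any - with _, and append a fixed suffix. A minimal sketch of that rule follows. The function name default_db_names is hypothetical and the logic is inferred from the notes, not copied from the ORCA Terraform code.

```python
from typing import Tuple


def default_db_names(prefix: str) -> Tuple[str, str]:
    """Return (db_name, db_user_name) defaults for a deployment prefix.

    Hypothetical helper; hyphens become underscores to follow SQL
    naming conventions, as described in the v4.0.0 migration notes.
    """
    safe_prefix = prefix.replace("-", "_")
    return f"{safe_prefix}_orca", f"{safe_prefix}_orcauser"
```

When preserving a database from a previous version of Orca, the notes say to set db_name to disaster_recovery and db_user_name to orcauser explicitly, so this derivation does not apply in that case.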

Removed

  • The modules/rds directory is removed since ORCA will utilize the Cumulus DB.
  • ORCA-233 The disaster_recovery database, now renamed PREFIX_orca, will now be created by db_deploy instead of Terraform.
  • ORCA-288 Removed copy_to_glacier_cumulus_translator due to better consistency in Cumulus's file dictionary.
  • ORCA-311 copy_to_glacier no longer accepts/returns file properties other than bucket and key.
    copied_to_glacier is similarly no longer passed through, but generated.

Added

  • ORCA-256 Added AWS API Gateway in modules/api_gateway/main.tf for the catalog reporting lambda.
  • ORCA-227 Added modules/secretsmanager directory that contains terraform code for deploying AWS secretsmanager.
  • ORCA-177 Added AWS API Gateway in modules/api_gateway/main.tf for the request_status_for_granule and request_status_for_job lambdas.
  • ORCA-257 orca_catalog_reporting lambda now returns data from actual catalog.
  • ORCA-151 copy_to_glacier and request_files now optionally accept "orcaDefaultBucketOverride" which can be used on a per-collection basis. If desired, add "orcaDefaultBucketOverride": "{$.meta.collection.meta.orcaDefaultBucketOverride}" to the workflow's task's task_config.
  • ORCA-335 request_files now recognizes when a file is already recovered, and posts an error message to status tables.
  • ORCA-230 copy_to_glacier now writes metadata to an ORCA catalog for comparisons to cumulus holdings.

Changed

  • ORCA-217 Lambda inputs now conform to the Cumulus camel case standard.
  • ORCA-297 Default database name is now PREFIX_orca
  • ORCA-287 Updated copy_to_glacier and extract_filepaths_for_granule to new Cumulus file format.
  • ORCA-245 Updated resource policies related to KMS keys to provide better security.
  • ORCA-318 Updated post_to_catalog lambda to match new Cumulus schema changes.
  • ORCA-317 Updated the db_deploy task, unit tests, manual tests, research pages and SQL to reflect new inventory layout to better align with Cumulus.
  • ORCA-249 Changed multipart_chunksize_mb in lambda configs to s3MultipartChunksizeMb. Standard workflows now pull from $.meta.collection.meta.s3MultipartChunksizeMb.
  • ORCA-230 Updated lambdas to use Cumulus Message Adapter Python v2.0.0.
  • ORCA-132 Updated workflows to use latest Cumulus v10.0.0 workflow code.

v4.0.0-Beta3

13 Jan 16:28
35b53aa

Release v4.0.0-Beta3

v4.0.0-Beta2

22 Dec 15:30
4935a47

Release v4.0.0-Beta2