Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog and includes an additional section for migration notes.

  • Migration Notes - Notes for end users to migrate to the version.
  • Added - New features.
  • Changed - Changes in existing functionality.
  • Deprecated - Soon-to-be removed features.
  • Removed - Now removed features.
  • Fixed - Any bug fixes.
  • Security - Vulnerabilities fixes and changes.

[Unreleased]

Added

Changed

Removed

Fixed

Security

[10.1.0] 2024-12-13

Added

  • ORCA-905 - Added integration test for large file recovery.
  • ORCA-849 - Added optional fileDestinationOverride property in copyToArchive workflow that can be used to override the file destination key if desired.
  • ORCA-567 - Specified build scripts to use specific version of pip to resolve any future errors/issues that could be caused by using the latest version of pip.
  • ORCA-933 - Added dead letter queue for the Metadata SQS queue in modules/sqs/main.tf

Changed

  • ORCA-900 - Updated aws_lambda_powertools to latest version to resolve errors users were experiencing in older version. Updated boto3 as it is a dependency of aws_lambda_powertools.
  • ORCA-927 - Updated archive architecture to include metadata deadletter queue in website/static/img/ORCA-Architecture-Archive-Container-Component-Updated.svg
  • ORCA-937 - Updated get_current_archive_list Lambda to use the gql_tasks_role to resolve database errors when trying to S3 import in modules/lambdas/main.tf. Updated gql_tasks_role with needed permissions in modules/graphql_0/main.tf, as well as updated Secrets Manager permissions to allow the role to get DB secret in modules/secretsmanager/main.tf.
  • ORCA-942 - Fixed npm tarball error found during ORCA website deployment.
  • ORCA-850 - Updated copy_to_archive documentation containing the additional s3 destination property functionality.
  • ORCA-774 - Updated Lambdas and GraphQL to Python 3.10.
  • ORCA-896 - Updated Bamboo files to use latest tag on cumulus_orca Docker image to resolve Bamboo jobs using old images.
  • 530 - Added explicit s3:GetObjectTagging and s3:PutObjectTagging actions to IAM restore_object_role_policy

Fixed

  • ORCA-822 - Fixed nodejs installation error in bamboo CI/CD ORCA distribution docker image.
  • ORCA-810 - Fixed db_deploy unit test error in bamboo due to wheel installation during python 3.10 upgrade.
  • ORCA-861 - Updated docusaurus to fix Snyk vulnerabilities.
  • ORCA-862 - Updated docusaurus to v3.4.0.
  • ORCA-890 - Fixed snyk vulnerabilities showing high issues and upgraded docusaurus to v3.5.2
  • ORCA-902 - Upgraded bandit to version 1.7.9 to fix snyk vulnerabilities.
  • ORCA-937 - Updated get_current_archive_list Lambda to use the gql_tasks_role to resolve database errors when trying to S3 import in modules/lambdas/main.tf. Updated gql_tasks_role with needed permissions in modules/graphql_0/main.tf, as well as updated Secrets Manager permissions to allow the role to get DB secret in modules/secretsmanager/main.tf.
  • ORCA-942 - Fixed npm tarball error found during ORCA website deployment.

Removed

  • ORCA-933 - Removed S3 credential references that were causing errors in tasks/get_current_archive_list/get_current_archive_list.py and tasks/get_current_archive_list/test/unit_tests/test_get_current_archive_list.py

[10.0.1] 2024-10-18

Added

  • ORCA-920 - Fixed ORCA deployment failure for Cumulus when sharing an RDS cluster due to multiple IAM role association attempts. Added a new boolean variable deploy_rds_cluster_role_association, which allows multiple ORCA/Cumulus stacks in the same account to share the same RDS cluster; set it to false for the second user.

[10.0.0] 2024-10-02

Migration Notes

Remove the s3_access_key and s3_secret_key variables from your orca.tf file.

Post V2 Upgrade Comparison

Once the Aurora V1 database has been migrated/upgraded to Aurora V2, you can verify data integrity of the ORCA database by deploying the EC2 comparison instance, which can be found at modules/db_compare_instance/main.tf.

  • Deployment Steps
    1. Fill in the variables in modules/db_compare_instance/scripts/db_config.sh
      • archive_bucket - ORCA Archive Bucket Name. IMPORTANT: use underscores in place of dashes, e.g. zrtest_orca_archive
      • v1_endpoint - Endpoint of the V1 cluster e.g. orcaV1.cluster-c1xufm1sp0ux.us-west-2.rds.amazonaws.com
      • v1_database - Database of the V1 cluster e.g. orca_db
      • v1_user - Username of the V1 cluster e.g. orcaV1_user
      • v1_password - Password for the V1 user e.g. OrcaDBPass_4
      • v2_endpoint - Endpoint of the V2 cluster e.g. orcaV2.cluster-c1xufm1sp0ux.us-west-2.rds.amazonaws.com
      • v2_database - Database of the V2 cluster e.g. orca_db2
      • v2_user - Username of the V2 cluster e.g. orcaV2_user
      • v2_password - Password for the V2 user e.g. OrcaDB2Pass_9
    2. cd to modules/db_compare_instance
    3. Run terraform init
    4. Run terraform apply
    5. Once the instance is deployed add an inbound rule to both the V1 and V2 database security groups with the private IP of the EC2 instance.
      • The private IP of the instance can be found via the console or AWS CLI by running the command: aws ec2 describe-instances --filters "Name=instance-state-name,Values=running" "Name=instance-id,Values=<INSTANCE_ID>" --query 'Reservations[*].Instances[*].[PrivateIpAddress]' --output text
      • This needs to be performed on BOTH the V1 and V2 security groups. The inbound rule can be added via the AWS console or AWS CLI by running the command: aws ec2 authorize-security-group-ingress --group-id <DB_SECURITY_GROUP_ID> --protocol tcp --port 5432 --cidr <INSTANCE_PRIVATE_IP>/32
    6. Now you can connect to the EC2 via the AWS console or AWS CLI with the command: aws ssm start-session --target <INSTANCE_ID>
    7. Once connected run the command cd /home
    8. Once at the /home directory run the command: sh db_compare.sh
    9. When the script completes it will output two tables:
      • v1_cluster - A count of the data in each table of the ORCA database in the V1 cluster.
      • v2_cluster - A count of the data in each table of the ORCA database in the V2 cluster.
    10. Verify that the output of the V2 database matches that of the V1 database to ensure no data was lost during the migration.
    11. Once verified, the EC2 instance can be destroyed by running terraform destroy. Verify you are in the modules/db_compare_instance directory before running it.
    12. Remove the inbound rules that were added in step 5, either in the AWS console or with the AWS CLI by running the command: aws ec2 revoke-security-group-ingress --group-id <DB_SECURITY_GROUP_ID> --protocol tcp --port 5432 --cidr <INSTANCE_PRIVATE_IP>/32. This needs to be performed on BOTH the V1 and V2 security groups.
    13. Delete the V1 database.
    14. Remove the snapshot identifier from the Terraform (If Applicable)
    15. In the AWS console navigate to RDS -> Snapshots and delete the snapshot the V2 database was restored from.
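The comparison in step 10 can also be done with a few lines of Python once the two output tables have been copied down. This is a minimal sketch, not part of ORCA: the function name compare_table_counts, the table names, and the counts are all hypothetical, and it assumes the per-table counts printed by db_compare.sh were transcribed by hand into two dictionaries.

```python
def compare_table_counts(v1_counts, v2_counts):
    """Return {table: (v1_count, v2_count)} for every table whose counts differ.

    A table that is missing on one side is reported with None for that side.
    """
    mismatches = {}
    for table in sorted(set(v1_counts) | set(v2_counts)):
        v1 = v1_counts.get(table)
        v2 = v2_counts.get(table)
        if v1 != v2:
            mismatches[table] = (v1, v2)
    return mismatches


# Hypothetical counts transcribed from the v1_cluster and v2_cluster tables.
v1_cluster = {"files": 1200, "granules": 300, "providers": 4}
v2_cluster = {"files": 1200, "granules": 299, "providers": 4}

print(compare_table_counts(v1_cluster, v2_cluster))  # {'granules': (300, 299)}
```

An empty result means every table reported the same count on both clusters.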

Added

  • ORCA-845 - Created IAM role for RDS S3 import needed for Aurora v2 upgrade.
  • ORCA-792 - Added DB comparison script at modules/db_compare_instance/scripts/db_compare.sh for the temporary EC2 to compare databases post migration.
  • ORCA-868 - Added EC2 instance for DB comparison after migration under modules/db_compare_instance/main.tf
  • ORCA-880 - Modified terraform to add an optional variable lambda_runtime for lambda runtime.
  • ORCA-128 - Added Scheduler module at cumulus-orca/modules/scheduler to shut down specifically tagged resources such as EC2, RDS, ECS, Autoscaling Groups, Redshift, and DocumentDB.
  • ORCA-752 - Created Security Group for ORCA Lambdas that only need outbound access at modules/security_groups/main.tf

Changed

  • ORCA-832 - Modified psycopg2 installation to allow for SSL connections to database.
  • ORCA-795 - Modified Graphql task policy to allow for S3 imports.
  • ORCA-797 - Removed s3 credential variables from deployment-with-cumulus.md and s3-credentials.md documentations since they are no longer used in Aurora v2 DB.
  • ORCA-873 - Modified build task script to copy schemas into a schema folder to resolve errors.
  • ORCA-872 - Updated GraphQL version, modified policy in modules/iam/main.tf to resolve errors, and added DB role attachment to modules/graphql_0/main.tf

Deprecated

Removed

  • ORCA-793 - Removed s3_access_key and s3_secret_key variables from terraform.
  • ORCA-795 - Removed s3_access_key and s3_secret_key variables from Graphql code and from get_current_archive_list task.
  • ORCA-798 - Removed s3_access_key and s3_secret_key variables from integration tests.
  • ORCA-783 - Removed tasks/copy_to_archive_adapter and tasks/orca_recovery_adapter as they are handled by Cumulus.

Fixed

  • ORCA-835 - Fixed ORCA documentation bamboo CI/CD pipeline showing node package import errors.
  • ORCA-864 - Updated ORCA archive bucket policy and IAM role to fix access denied error during backup/recovery process.

Security

  • ORCA-851 - Updated bandit libraries to fix Snyk vulnerabilities.

[9.0.5] 2024-02-29

Migration Notes

Remove the s3:x-amz-acl: bucket-owner-full-control property from your ORCA archive bucket policy if applicable.

If you are deploying ORCA for the first time or migrating from v6, the changes stated below regarding the load balancer are not required.

If you are currently on v8 or v9, you already have a load balancer deployed and need to delete the load balancer target group before deploying this version. Terraform cannot delete an existing load balancer target group that has a listener attached, and adding HTTPS to the target group requires replacing the target group. Once the target group is deleted, you should be able to deploy ORCA.

  1. From AWS EC2 console, go to your load balancer named <prefix-gql-a> and select the Listeners and rules tab. Delete the rule.
  2. Delete your target group <random_name>-gql-a. The target group name has been randomized to avoid terraform resource error.
  3. Deploy ORCA.

If deployed correctly, the target group health checks should show as healthy.

Added

  • ORCA-450 - Removed Access Control List (ACL) requirement and added BucketOwnerEnforced to ORCA bucket objects.
  • ORCA-452 - Added Deny non SSL policy to S3 buckets in modules/dr_buckets/dr_buckets.tf and modules/dr_buckets_cloudformation/dr-buckets.yaml

Changed

  • ORCA-441 - Updated policies for ORCA buckets and copy_to_archive to give them only the permissions needed to restrict unwanted/unintended actions.
  • ORCA-746 - Enabled HTTPS listener in application load balancer for GraphQL server using AWS Certificate Manager.
  • ORCA-828 - Added prefix to ORCA SNS topic names to avoid object already exists errors.

Deprecated

Removed

Fixed

Security

  • ORCA-821 - Fixed snyk vulnerabilities from snyk report showing high issues and upgraded docusaurus to v3.1.0.

[9.0.4] 2024-02-07

Migration Notes

  • For users upgrading from ORCA v8.x.x to v9.x.x, follow the below steps before deploying:
    1. Run the Lambda deletion script with python3 bin/delete_lambda.py, which will delete all of the ORCA lambdas with a provided prefix. You can also delete them manually in the AWS console.
    2. Navigate to the AWS console and search for the Cumulus RDS security group.
    3. Remove the inbound rule with the source of PREFIX-vpc-ingress-all-egress in Cumulus RDS security group.
    4. Search for PREFIX-vpc-ingress-all-egress and delete the security group. NOTE: Because the Lambdas use ENIs, deleting the security group may fail with a message that it is still associated with a Lambda that was already deleted by the script. AWS may need a few minutes to fully disassociate the ENIs; if this error appears, wait a few minutes and then try again.

Changed

  • ORCA-826 - Changed bin/delete_lambda.py to delete ORCA lambdas based on their tags.
  • ORCA-827 - Changed ORCA API gateway stage name from orca to orca_api to avoid confusion in the URL path. The new ORCA execute API URL will be https://<API_ID>.execute-api.<AWS_REGION>.amazonaws.com/orca_api.
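Client code that builds the execute API URL can take the stage name as a parameter so that the change from orca to orca_api is a one-line update. A minimal sketch; the helper name orca_api_url and the sample API ID are hypothetical, and only the URL pattern comes from the entry above.

```python
def orca_api_url(api_id, region, stage="orca_api"):
    """Build the base execute-api URL for the given API Gateway ID, region, and stage."""
    return f"https://{api_id}.execute-api.{region}.amazonaws.com/{stage}"


# "abc123" is a placeholder API ID used only for illustration.
print(orca_api_url("abc123", "us-west-2"))
# https://abc123.execute-api.us-west-2.amazonaws.com/orca_api
```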

Fixed

  • ORCA-827 Fixed API gateway URL not found issue seen in ORCA v9.0.3.

[9.0.3] 2024-01-30

Fixed

  • ORCA-823 Fixed security group errors of incorrect referenced resources in modules/security-groups/main.tf and modules/security-groups/outputs.tf. This fixes the deployment errors seen in ORCA v9.0.2.

[9.0.2] 2024-01-26

Migration Notes

Added

  • ORCA-366 Added unit test for shared libraries.
  • ORCA-769 Added API Gateway Stage resource to modules/api-gateway/main.tf
  • ORCA-369 Added DR S3 bucket template to modules/dr_buckets/dr_buckets.tf and updated S3 deployment documentation with steps.

Changed

  • ORCA-784 Changed documentation to replace restore with copy based on task's naming as well as changed file name from website/docs/operator/restore-to-orca.mdx to website/docs/operator/reingest-to-orca.mdx.
  • ORCA-724 Updated ORCA recovery documentation to include recovery workflow process and relevant inputs and outputs in website/docs/operator/data-recovery.md.
  • ORCA-789 Updated extract_filepaths_for_granule to more flexibly match file-regex values to keys.
  • ORCA-787 Modified modules/api-gateway/main.tf api gateway stage name to remove the extra orca from the data management URL path
  • ORCA-805 Changed modules/security_groups/main.tf security group resource name from vpc_postgres_ingress_all_egress to vpc-postgres-ingress-all-egress to resolve errors when upgrading from ORCA v8 to v9. Also removed graphql_1 dependency module.orca_lambdas since this module does not depend on the lambda module in modules/orca/main.tf

Deprecated

Removed

  • ORCA-361 Removed hardcoded test values from extract_filepaths_for_granule unit tests.
  • ORCA-710 Removed duplicate logging messages in integration_test/workflow_tests/custom_logger.py
  • ORCA-815 Removed steps for creating buckets using NGAP form in ORCA archive bucket documentation.

Fixed

  • ORCA-811 Fixed cumulus_orca docker image by updating nodejs installation process.
  • ORCA-802 Fixed extract_filepaths_for_granule documentation and schemas to include collectionId in input.
  • ORCA-785 Fixed checksum integrity issue in ORCA documentation bamboo pipeline.
  • ORCA-820 Updated bandit and moto libraries to fix some snyk vulnerabilities.

Security

[9.0.1] 2023-11-16

Added

  • ORCA-766 Created AWS cloudformation template that can be used to deploy ORCA DR buckets.
  • ORCA-765 Updated ORCA "Creating the Glacier Bucket" documentation with instructions to deploy ORCA DR buckets using cloudformation.

Changed

  • ORCA-780 Updated ORCA "Deployment with Cumulus" documentation with instructions and examples to run ORCA recovery and archive workflows.
  • ORCA-704 Updated dr-buckets.tf.template and buckets.tf.template with provider block to deploy in the us-west-2 region due to deployments failing in the other regions.
  • ORCA-708 Updated integration_test/shared/setup-orca.sh script to use the root folder instead of cloning in a duplicate repository.

Fixed

  • ORCA-731 Updated boto3 library used for unit tests to version 1.28.76 from version 1.18.40 to fix unit test warnings.
  • ORCA-722 Fixed multiple granules happy path integration tests by randomizing large file name to avoid duplicate data being ingested.

Security

  • ORCA-778 Upgraded Docusaurus to version 2.4.3 to fix snyk vulnerabilities and security issues.
  • ORCA-737 Updated moto library used for unit tests to version 4.2.2 from version 2.0.

[9.0.0] 2023-10-05

Migration Notes

  • Update terraform to the latest 1.5 version

Security

  • ORCA-729 Updated terraform provider to use the latest version 1.5
  • ORCA-713 Updated terraform, Dockerfile, and other IAC elements for best practices and security where able.

[8.1.0] 2023-08-02

Added

  • ORCA-679 Updated area in recovery where granule ID was treated as a globally unique key. Per Cumulus updates, uniqueness is now granule ID plus collection ID.
    • ORCA-678 collection_id column added to recovery status tables.
    • ORCA-683 collectionId added to Recovery Job status output.
    • ORCA-684 collectionId added to Recovery Granule status input and output.
    • ORCA-672, ORCA-671 collectionId added as input to extract_filepaths_for_granule, request_from_archive, and the recovery workflow.
  • ORCA-700 Added variable aws_region to Terraform variables.
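The uniqueness change in ORCA-679 matters for any client-side bookkeeping keyed on granule ID alone. The sketch below illustrates the idea with a composite key; it is not ORCA's database schema, and the GranuleKey type and the sample IDs are hypothetical.

```python
from typing import NamedTuple


class GranuleKey(NamedTuple):
    """Composite key: a granule is identified by collection ID plus granule ID."""
    collection_id: str
    granule_id: str


statuses = {}
# The same granule ID in two different collections yields two distinct entries.
statuses[GranuleKey("COLLECTION_A___001", "granule-1")] = "pending"
statuses[GranuleKey("COLLECTION_B___001", "granule-1")] = "success"

print(len(statuses))  # 2
```

Keyed on granule ID alone, the second assignment would have overwritten the first.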

Changed

  • ORCA-700 Removed Cumulus Workflow wrapper from step-functions. No anticipated customer impact.
  • ORCA-709 Updated terraform AWS provider to version 5. This is to support Cumulus and CIRRUS changes.
  • ORCA-714 Fixed new deployment errors with API Gateway by adding an IAM policy and tying it to the GW.
  • ORCA-716 Fixed Deployment issues with GraphQL tasks by adding permission and health check.
  • ORCA-726 Updated Docusaurus and Node version to latest LTS releases to fix security issues.

Migration Notes

  • Changes have been made to SQS message processing that are not backwards compatible. Halt ingest and wait for the PREFIX-orca-status-update-queue.fifo queue to empty before applying update.
    • If the queue is stuck or becomes stuck, it may be necessary to flush the queue and its associated Dead Letter Queue.
  • The input format of the ORCA Recovery Workflow step-function has been modified. If accessing these resources outside of a Cumulus perspective, go to orca_recover_workflow.asl.json and look at config elements to see the new paths. Additionally, add a collectionId property to each granule passed in.
  • collectionId properties have been added to Recovery Jobs and Recovery Granules API.
    • For Recovery Jobs, it is only added to output.
    • For Recovery Granules, it is now required on input and will be returned on output.
  • Update the orca.tf file to include aws_region. See example below.
    ## ORCA Module
    ## =============================================================================
    module "orca" {
      source = "https://github.com/nasa/cumulus-orca/releases/download/v6.0.0/cumulus-orca-terraform.zip//modules"
    ## --------------------------
    ## Cumulus Variables
    ## --------------------------
    ## REQUIRED
    aws_region               = var.region
    buckets                  = var.buckets
    lambda_subnet_ids        = var.lambda_subnet_ids
    permissions_boundary_arn = var.permissions_boundary_arn
    prefix                   = var.prefix
    system_bucket            = var.system_bucket
    vpc_id                   = var.vpc_id
    
    ## OPTIONAL
    tags        = local.tags
    
    ## --------------------------
    ## ORCA Variables
    ## --------------------------
    ## REQUIRED
    db_admin_password        = var.db_admin_password
    db_user_password         = var.db_user_password
    db_host_endpoint         = var.db_host_endpoint
    dlq_subscription_email   = var.dlq_subscription_email
    orca_default_bucket      = var.orca_default_bucket
    orca_reports_bucket_name = var.orca_reports_bucket_name
    rds_security_group_id    = var.rds_security_group_id
    s3_access_key            = var.s3_access_key
    s3_secret_key            = var.s3_secret_key
    
    ## OPTIONAL
    db_admin_username                                    = "postgres"
    default_multipart_chunksize_mb                       = 250
    internal_report_queue_message_retention_time_seconds = 432000
    orca_default_recovery_type                           = "Standard"
    orca_default_storage_class                           = "GLACIER"
    orca_delete_old_reconcile_jobs_frequency_cron        = "cron(0 0 ? * SUN *)"
    orca_ingest_lambda_memory_size                       = 2240
    orca_ingest_lambda_timeout                           = 720
    orca_internal_reconciliation_expiration_days         = 30
    orca_recovery_buckets                                = []
    orca_recovery_complete_filter_prefix                 = ""
    orca_recovery_expiration_days                        = 5
    orca_recovery_lambda_memory_size                     = 128
    orca_recovery_lambda_timeout                         = 720
    orca_recovery_retry_limit                            = 3
    orca_recovery_retry_interval                         = 1
    orca_recovery_retry_backoff                          = 2
    s3_inventory_queue_message_retention_time_seconds    = 432000
    s3_report_frequency                                  = "Daily"
    sqs_delay_time_seconds                               = 0
    sqs_maximum_message_size                             = 262144
    staged_recovery_queue_message_retention_time_seconds = 432000
    status_update_queue_message_retention_time_seconds   = 777600
    vpc_endpoint_id                                      = null
    }
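For the first migration note above (waiting for PREFIX-orca-status-update-queue.fifo to empty), the check can be scripted. This is a rough sketch under stated assumptions: queue_is_empty is a hypothetical helper, and it only inspects the attribute mapping that SQS GetQueueAttributes returns, so it runs without AWS access. The boto3 calls that would fetch that mapping are shown as comments only.

```python
# Standard SQS queue attributes that together describe how many messages remain.
QUEUE_DEPTH_ATTRIBUTES = [
    "ApproximateNumberOfMessages",
    "ApproximateNumberOfMessagesNotVisible",
    "ApproximateNumberOfMessagesDelayed",
]


def queue_is_empty(attributes):
    """Return True when every depth attribute is zero.

    `attributes` is the "Attributes" mapping from GetQueueAttributes,
    whose values are strings.
    """
    return all(int(attributes[name]) == 0 for name in QUEUE_DEPTH_ATTRIBUTES)


# With boto3, the mapping could be fetched roughly like this (not executed here):
#   import boto3
#   sqs = boto3.client("sqs")
#   url = sqs.get_queue_url(QueueName="PREFIX-orca-status-update-queue.fifo")["QueueUrl"]
#   attributes = sqs.get_queue_attributes(
#       QueueUrl=url, AttributeNames=QUEUE_DEPTH_ATTRIBUTES
#   )["Attributes"]

sample = {
    "ApproximateNumberOfMessages": "0",
    "ApproximateNumberOfMessagesNotVisible": "2",
    "ApproximateNumberOfMessagesDelayed": "0",
}
print(queue_is_empty(sample))  # False: two messages are still in flight
```

The SQS counts are approximate, so it is safer to see the queue report empty on several consecutive checks before applying the update.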

[8.0.1] 2023-06-07

Added

  • ORCA-693 Fixed sqlalchemy query issue in orca_catalog_reporting lambda.

[8.0.0] 2023-05-24

Added

  • ORCA-554, ORCA-561, ORCA-579, ORCA-581
    • GraphQL image, service, and Load Balancer will now be deployed by TF.
    • ORCA-557 Added orca_graphql_load_balancer_dns_name to output variables for GraphQL integration.
    • ORCA-420 Added Internal Reconcile Report Mismatch functionality to GraphQL.
    • ORCA-556 Added Internal Reconcile Report Phantom functionality to GraphQL.
    • ORCA-592 GraphQL logs are json structures, and can thus be queried in CloudWatch.
    • ORCA-622 Added support for integer sizes up to 8 bytes.
  • ORCA-597
    • Server access logging is now enabled for graphql application load balancer.
  • ORCA-614, ORCA-428 Moved some Internal Reconciliation functionality to GraphQL

Changed

  • ORCA-573 Updated ORCA DB user password to now have a stronger password requirement. See migration notes for details.
  • ORCA-520 Removed run_cumulus_task function from copy_to_archive to decouple ORCA from Cumulus.
  • ORCA-647 Upgraded sqlalchemy from v1.4.11 to v2.0.5.

Migration Notes

  • Remove the workflow_config variable from orca.tf otherwise terraform deployment will throw an error.

  • The output format of copy_to_archive lambda and step-function has been simplified. If accessing these resources outside of a Cumulus perspective, instead of accessing output["payload"]["granules"] you now use output["granules"].

  • Due to Cumulus-ORCA decoupling efforts, users will now need to update the existing CopyToArchive workflow configuration to point to Cumulus copy_to_archive_adapter lambda which then runs our copy_to_archive lambda. See deployment documentation for details.

  • Due to Cumulus-ORCA decoupling efforts, users will now need to deploy a recovery_workflow_adapter workflow that triggers the Cumulus recovery_adapter lambda which then runs our existing orca recovery workflow. See deployment documentation for details.

  • Update the bucket policy for your system-bucket to allow load balancer to post server access logs to the bucket. See the instructions here.

  • InternalReconcileReport Phantom and Mismatch reports are now available via GraphQL.

    • API Gateway access is now deprecated, and will be removed in a future update.
    • Use the orca_graphql_load_balancer_dns_name variable to send your queries to GraphQL as json strings in a POST request.
  • Users will need to update their orca-user password. The password must have the following requirements otherwise the db_deploy lambda will fail during deployment.

    • one upper case letter
    • one lower case letter
    • one digit
    • one special character
    • minimum length of 12

    Update db_user_password variable in your cumulus-tf/terraform.tfvars file to match the new password requirement and then run terraform. db_deploy lambda will automatically update your new password.
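The password rules above can be checked locally before running terraform. This is a minimal sketch, not the check that db_deploy performs: the function name is hypothetical, and it assumes "special character" means any ASCII punctuation character, which may be broader or narrower than what db_deploy accepts.

```python
import string

MIN_LENGTH = 12


def meets_password_requirements(password):
    """Check length plus one upper-case, lower-case, digit, and special character.

    Assumption: "special character" is any character in string.punctuation.
    """
    return (
        len(password) >= MIN_LENGTH
        and any(c.isupper() for c in password)
        and any(c.islower() for c in password)
        and any(c.isdigit() for c in password)
        and any(c in string.punctuation for c in password)
    )


print(meets_password_requirements("Str0ng-Passw0rd!"))  # True
print(meets_password_requirements("weakpassword"))      # False
```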

[7.0.1] 2023-02-13

Changed

  • ORCA-632 Fixed a bug where excludedFileExtensions was a required property in collection config. Restored default behavior of defaulting to an empty list.

[7.0.0] 2021-01-06

Changed

  • ORCA-336
    • request_from_archive lambda now posts to the new SQS for files that have already been recovered from glacier instead of throwing an error.
    • post_copy_request_to_queue lambda now receives event messages of files recovered from archive from the new archive recovery SQS instead of archive bucket.
  • ORCA-522
    • Removed run_cumulus_task function from extract_filepaths_for_granule lambda to decouple ORCA from Cumulus.
  • ORCA-575
    • Removed run_cumulus_task function from request_from_archive lambda to decouple ORCA from Cumulus.
  • ORCA-521
    • Replaced CumulusLogger with AWS powertools logger in all of the lambdas currently present in ORCA.
  • ORCA-537
    • Renamed step-function OrcaCopyToGlacierWorkflow to OrcaCopyToArchiveWorkflow.
    • Renamed lambda PREFIX_copy_to_glacier to PREFIX_copy_to_orca. Renamed ORCA repository internal task from copy_to_glacier to copy_to_archive. Output of lambda and Terraform updated to match. See Migration Notes below.
  • ORCA-540
    • Renamed lambda copy_files_to_archive to copy_from_archive.
    • Output of Terraform updated to match. Unlikely to affect any integrations.
  • ORCA-539
    • Renamed lambda request_files to request_from_archive.
    • Output of Terraform updated to match. Unlikely to affect any integrations.
  • ORCA-534
    • extract_filepaths_for_granule now raises a descriptive error when no destination bucket (destBucket) is found in fileBucketMaps for a given file. Previously this was a general JsonSchemaException; it is now an ExtractFilePathsError with a description of which file could not be placed.
    • extract_filepaths_for_granule now takes the first match in fileBucketMaps instead of the last.
  • ORCA-461
    • Invalid database connection parameters will now be detected earlier and more consistently.
    • Postgres table/user names can now begin with an '_' and contain '$' if your Postgres DB version supports this.
  • ORCA-533 RecoveryWorkflow no longer requires the bucket property on files. Was unused by ORCA.
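The ORCA-534 behavior (first match wins, descriptive error when nothing matches) can be pictured with the sketch below. It is an illustration, not the task's code: it assumes each fileBucketMaps entry is a dictionary with regex and bucket keys, uses re.search as the matching rule, and defines a local stand-in for ExtractFilePathsError. The helper name find_dest_bucket and the sample patterns are hypothetical.

```python
import re


class ExtractFilePathsError(Exception):
    """Local stand-in for the error type named in the changelog."""


def find_dest_bucket(file_name, file_bucket_maps):
    """Return the bucket of the FIRST mapping whose regex matches file_name."""
    for mapping in file_bucket_maps:
        if re.search(mapping["regex"], file_name):
            return mapping["bucket"]
    raise ExtractFilePathsError(
        f"No fileBucketMaps entry matches '{file_name}'; no destination bucket found."
    )


# Hypothetical mappings; both patterns match .h5 files, so order decides.
file_bucket_maps = [
    {"regex": r".*\.h5$", "bucket": "protected"},
    {"regex": r".*\.(h5|xml)$", "bucket": "public"},
]

print(find_dest_bucket("granule.h5", file_bucket_maps))   # protected
print(find_dest_bucket("granule.xml", file_bucket_maps))  # public
```

Under the previous last-match behavior, granule.h5 would have resolved to public, so overlapping patterns are worth reviewing after the upgrade.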

Added

  • ORCA-336
    • Added a new standard SQS between archive ORCA bucket and post_copy_request_to_queue lambda so that the bucket now triggers the SQS upon successful object retrieval from glacier.
  • ORCA-351
    • Added new optional recoveryBucketOverride property to extract_filepaths_for_granule input schema so that data managers can now specify their own buckets for recovery if desired.
  • ORCA-574/580 Added additional logging to the extract_filepaths_for_granule and request_from_archive steps of the recovery workflow to identify when an input granule is entirely excluded, or otherwise has no files to request. Status entries for these granules will display an ERROR status.

Migration Notes

  • If utilizing the copied_to_glacier output property of copy_to_glacier, rename to new key copied_to_orca.
  • If utilizing the orca_lambda_copy_to_glacier_arn output of Terraform, likely as a means of pulling the lambda into your workflows, rename to new key orca_lambda_copy_to_archive_arn
  • If utilizing the orca_lambda_request_files_arn output of Terraform, likely as a means of pulling the lambda into your workflows, rename to new key orca_lambda_request_from_archive_arn
  • If desired, use the optional recoveryBucketOverride property in extract_filepaths_for_granule input schema to override the default recovery bucket. See example below.
    {
      "input": {
        "granules": [
          {
            "granuleId": "MOD09GQ.A0219114.N5aUCG.006.0656338553321",
            "recoveryBucketOverride": "<YOUR_RECOVERY_BUCKET>",
            "files": [
              {
                "key": "MOD09GQ___006/2017/MOD/MOD09GQ.A0219114.N5aUCG.006.0656338553321.h5",
                "bucket": "cumulus-test-sandbox-protected",
                "fileName": "MOD09GQ.A0219114.N5aUCG.006.0656338553321.h5"
              }
            ]
          }
        ]
      }
    }
  • If utilizing the output of the OrcaRecoveryWorkflow, adjust to the simplified output schema. See example below:
    {
      "granules": [
        {
          "granuleId": "integrationGranuleId",
          "keys": [
            {
              "key": "PODAAC/SWOT/ancillary_data_input_forcing_ECCO_V4r4.tar.gz",
              "destBucket": "PREFIX-public"
            }
          ],
          "recoverFiles": [
            {
              "success": true,
              "filename": "ancillary_data_input_forcing_ECCO_V4r4.tar.gz",
              "keyPath": "PODAAC/SWOT/ancillary_data_input_forcing_ECCO_V4r4.tar.gz",
              "restoreDestination": "PREFIX-public",
              "s3MultipartChunksizeMb": null,
              "statusId": 1,
              "requestTime": "2023-02-10T21:06:13.071287+00:00",
              "lastUpdate": "2023-02-10T21:06:13.071287+00:00"
            }
          ]
        }
      ],
      "asyncOperationId": "770a85f2-f933-4440-90b5-1a8039557538"
    }
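Code that consumes the simplified OrcaRecoveryWorkflow output can read granules directly from the top level. A minimal sketch, assuming only the fields shown in the example above (granules, granuleId, recoverFiles, success, filename); the helper name failed_recover_files and the sample file names are hypothetical.

```python
def failed_recover_files(output):
    """Return (granuleId, filename) pairs for every file whose request did not succeed."""
    failures = []
    for granule in output.get("granules", []):
        for recover_file in granule.get("recoverFiles", []):
            if not recover_file.get("success", False):
                failures.append((granule["granuleId"], recover_file["filename"]))
    return failures


# Trimmed-down output in the simplified shape; file names are made up.
workflow_output = {
    "granules": [
        {
            "granuleId": "integrationGranuleId",
            "recoverFiles": [
                {"success": True, "filename": "a.tar.gz"},
                {"success": False, "filename": "b.tar.gz"},
            ],
        }
    ],
}

print(failed_recover_files(workflow_output))
# [('integrationGranuleId', 'b.tar.gz')]
```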

[6.0.4] 2023-09-26

Fixed

  • ORCA-738 Updated cumulus-process to v1.2.0 and cumulus-message-adapter-python to v2.1.0 to alleviate potential issues related to timeouts when using CMA calls.

[6.0.3] 2023-02-28

Changed

  • ORCA-643 Reverted ORCA-437, which introduced IAM authentication for API Gateway endpoints.

Migration Notes

  • If you installed 6.x without an ORCA base or updated from an ORCA version earlier than 5.1.0, you may be seeing Missing Authentication Token errors when contacting the ORCA API for recovery and reconciliation information. After deploying this version, open your API Gateway in AWS and click Actions -> Deploy API -> Deployment stage = orca -> Deploy.
    • If you do not see these errors when requesting recovery status, then no action is required.

[6.0.2] 2022-10-18

Changed

  • ORCA-570 Fixed an error that could prevent deployment of the database on fresh installations.

[6.0.1] 2022-10-12

Changed

  • ORCA-566 Shortened S3 inventory report name due to length limitation causing errors when a user's naming schema is long.

[6.0.0] 2022-09-15

Changed

  • ORCA-290 Renamed excludeFileTypes, orcaDefaultBucketOverride, orcaDefaultRecoveryTypeOverride, and orcaDefaultStorageClassOverride to excludedFileExtensions, defaultBucketOverride, defaultRecoveryTypeOverride, and defaultStorageClassOverride respectively. In addition, ORCA configuration variables excludedFileExtensions, defaultBucketOverride, defaultRecoveryTypeOverride, and defaultStorageClassOverride are now under collection.meta.orca.
  • ORCA-290 Adjusted workflows/step functions for OrcaRecoveryWorkflow.
    • excludeFileTypes, orcaDefaultBucketOverride and orcaDefaultStorageClassOverride arguments in task_config are now excludedFileExtensions, defaultBucketOverride and defaultStorageClassOverride respectively.
    • excludedFileExtensions, defaultBucketOverride and defaultStorageClassOverride keys are now under collection.meta.orca. See the example below under Migration Notes.
  • ORCA-519 Enforced schema checks in request_status_for_granule and request_status_for_job. Both lambdas will return proper HTTP error codes for bad inputs or internal server errors. Additionally, corrected error in API Reference where the error status for these lambdas was incorrectly listed as failed.
  • ORCA-437 Requests to API Gateway now use IAM permissions, restricting anonymous access.
  • ORCA-496 Mitigated SQS security issue. All SQS queues now use default encryption.

Migration Notes

  • Adjust usage of copy_to_glacier in your step functions for new keys.
    • excludeFileTypes, orcaDefaultBucketOverride, and orcaDefaultStorageClassOverride arguments are now excludedFileExtensions, defaultBucketOverride, and defaultStorageClassOverride and are under a new key orca. See example below:
      "task_config": {
        "excludedFileExtensions": "{$.meta.collection.meta.orca.excludedFileExtensions}",
        "defaultBucketOverride": "{$.meta.collection.meta.orca.defaultBucketOverride}",
        "defaultStorageClassOverride": "{$.meta.collection.meta.orca.defaultStorageClassOverride}"
      }
  • Adjust Cumulus collection configuration integration for new orca key paths.
    • excludeFileTypes, orcaDefaultBucketOverride and orcaDefaultStorageClassOverride arguments are now excludedFileExtensions, defaultBucketOverride and defaultStorageClassOverride respectively.
    • excludedFileExtensions, defaultBucketOverride and defaultStorageClassOverride keys are now under a new key orca. See example below:
        "collection": {
            "meta":{
                "orca": {
                  "defaultStorageClassOverride": "DEEP_ARCHIVE",
                  "excludedFileExtensions": [".xml"],
                  "defaultBucketOverride": "orca-bucket"
              }
          }
        }
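The key renames in the 6.0.0 migration notes above can also be applied to an existing collection definition programmatically. The following is a minimal sketch and not part of ORCA: `migrate_collection_meta` is a hypothetical helper, and it assumes the old keys sit directly under the collection's `meta` object, as in the pre-6.0.0 examples elsewhere in this changelog.

```python
import copy

# Old key -> new key, as listed under ORCA-290.
KEY_RENAMES = {
    "excludeFileTypes": "excludedFileExtensions",
    "orcaDefaultBucketOverride": "defaultBucketOverride",
    "orcaDefaultRecoveryTypeOverride": "defaultRecoveryTypeOverride",
    "orcaDefaultStorageClassOverride": "defaultStorageClassOverride",
}


def migrate_collection_meta(collection: dict) -> dict:
    """Return a copy of `collection` with the old ORCA keys moved under meta.orca."""
    migrated = copy.deepcopy(collection)
    meta = migrated.setdefault("meta", {})
    for old_key, new_key in KEY_RENAMES.items():
        if old_key in meta:
            # Create meta.orca only when there is something to move.
            meta.setdefault("orca", {})[new_key] = meta.pop(old_key)
    return migrated


old = {
    "meta": {
        "excludeFileTypes": [".xml"],
        "orcaDefaultStorageClassOverride": "DEEP_ARCHIVE",
    }
}
new = migrate_collection_meta(old)
print(new)
```

The helper works on a deep copy, so the original definition stays available for comparison before the migrated version is written back.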

[5.1.0] 2022-08-11

Changed

  • ORCA-359 Updated Python version from 3.7 to 3.9.
  • ORCA-478 Updated bucket policy documentation for the deep glacier bucket in the DR account so that users can now only upload objects with a storage class of either GLACIER or DEEP_ARCHIVE.
  • ORCA-457 RequestFiles will now raise a descriptive error when user attempts to recover DEEP_ARCHIVE files with the Expedited recovery method. For more details on storageClass see the Orca storageClass documentation.
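The check described in ORCA-457 follows from an S3 restriction: objects in the DEEP_ARCHIVE storage class cannot be restored with the Expedited retrieval tier. A minimal sketch of such a guard is shown here. The function name, exception type, and message are hypothetical and do not reflect the actual RequestFiles implementation.

```python
def validate_recovery_request(storage_class: str, recovery_type: str) -> None:
    """Raise ValueError for a retrieval tier the storage class cannot serve."""
    if storage_class == "DEEP_ARCHIVE" and recovery_type == "Expedited":
        raise ValueError(
            "Expedited recovery is not available for DEEP_ARCHIVE files; "
            "use Standard or Bulk instead."
        )


# GLACIER objects may use any tier, so this call passes silently.
validate_recovery_request("GLACIER", "Expedited")

try:
    validate_recovery_request("DEEP_ARCHIVE", "Expedited")
except ValueError as error:
    print(error)
```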

Added

  • ORCA-480 Added storageClass to Orca catalog and associated reporting API. Existing entries will be reported as being in the GLACIER storage class.
  • ORCA-479 Added variable orca_default_storage_class which denotes the default storage class to use when storing files in Orca. Currently allowed values are GLACIER and DEEP_ARCHIVE. copy_to_glacier accepts orcaDefaultStorageClassOverride, which can be used on a per-collection basis. If desired, add "orcaDefaultStorageClassOverride": "{$.meta.collection.meta.orcaDefaultStorageClassOverride}" to the workflow's task's task_config.
  • ORCA-458 Added storage_class to internal reconciliation. See reporting API for retrieval via reporting lambdas.

Migration Notes

  • The user should update their orca.tf, variables.tf and terraform.tfvars files with new variables. The following optional variables have been added:

    • orca_default_storage_class
  • If desired, update collection configurations with the new optional key orcaDefaultStorageClassOverride that can be added to override the default S3 glacier storage class as shown below.

      "meta": {
        "orcaDefaultStorageClassOverride": "DEEP_ARCHIVE"
      }

    For more information on storage classes and their impact on available recovery options, see the Orca storageClass documentation.

  • Add the following rule to the existing glacier archive bucket policy under Condition key:

    "s3:x-amz-storage-class": ["GLACIER", "DEEP_ARCHIVE"]

    See this policy example for details.

  • The property storageClass returned by the Orphan reporting lambda has been renamed to s3StorageClass.

  • Update the orca.tf file to include all of the updated and new variables as seen below. Note the change to source and the commented out optional variables.

    ## ORCA Module
    ## =============================================================================
    module "orca" {
      source = "https://github.com/nasa/cumulus-orca/releases/download/v6.0.0/cumulus-orca-terraform.zip//modules"
    ## --------------------------
    ## Cumulus Variables
    ## --------------------------
    ## REQUIRED
    buckets                  = var.buckets
    lambda_subnet_ids        = var.lambda_subnet_ids
    permissions_boundary_arn = var.permissions_boundary_arn
    prefix                   = var.prefix
    system_bucket            = var.system_bucket
    vpc_id                   = var.vpc_id
    workflow_config          = module.cumulus.workflow_config
    
    ## OPTIONAL
    tags        = local.tags
    
    ## --------------------------
    ## ORCA Variables
    ## --------------------------
    ## REQUIRED
    db_admin_password        = var.db_admin_password
    db_user_password         = var.db_user_password
    db_host_endpoint         = var.db_host_endpoint
    dlq_subscription_email   = var.dlq_subscription_email
    orca_default_bucket      = var.orca_default_bucket
    orca_reports_bucket_name = var.orca_reports_bucket_name
    rds_security_group_id    = var.rds_security_group_id
    s3_access_key            = var.s3_access_key
    s3_secret_key            = var.s3_secret_key
    
    ## OPTIONAL
    db_admin_username                                    = "postgres"
    default_multipart_chunksize_mb                       = 250
    internal_report_queue_message_retention_time_seconds = 432000
    orca_default_recovery_type                           = "Standard"
    orca_default_storage_class                           = "GLACIER"
    orca_delete_old_reconcile_jobs_frequency_cron        = "cron(0 0 ? * SUN *)"
    orca_ingest_lambda_memory_size                       = 2240
    orca_ingest_lambda_timeout                           = 720
    orca_internal_reconciliation_expiration_days         = 30
    orca_recovery_buckets                                = []
    orca_recovery_complete_filter_prefix                 = ""
    orca_recovery_expiration_days                        = 5
    orca_recovery_lambda_memory_size                     = 128
    orca_recovery_lambda_timeout                         = 720
    orca_recovery_retry_limit                            = 3
    orca_recovery_retry_interval                         = 1
    orca_recovery_retry_backoff                          = 2
    s3_inventory_queue_message_retention_time_seconds    = 432000
    s3_report_frequency                                  = "Daily"
    sqs_delay_time_seconds                               = 0
    sqs_maximum_message_size                             = 262144
    staged_recovery_queue_message_retention_time_seconds = 432000
    status_update_queue_message_retention_time_seconds   = 777600
    vpc_endpoint_id                                      = null
    }

[5.0.0] 2022-06-17

Added

  • ORCA-300 Added OrcaInternalReconciliation workflow along with an accompanying input queue and dead-letter queue. Retention time can be changed by setting internal_report_queue_message_retention_time_seconds in your variables.tf or orca_variables.tf file. Defaults to 432000.
  • ORCA-161 Added dead letter queue and cloudwatch alarm terraform code to recovery SQS queue.
  • ORCA-307 Added lambda get_current_archive_list to pull S3 Inventory reports into Postgres. Adds orca_reconciliation_lambda_memory_size and orca_reconciliation_lambda_timeout to Terraform variables.
  • ORCA-308 Added lambda perform_orca_reconcile to find differences between S3 Inventory reports and Orca catalog.
  • ORCA-403 Added lambda post_to_queue_and_trigger_step_function to trigger step function for internal reconciliation.
  • ORCA-373 Added input variable for orca_reports_bucket_name. Set in your variables.tf or orca_variables.tf file as shown below. Report frequency defaults to Daily, but can be set to Weekly through variable s3_report_frequency.
  • ORCA-309 Added lambda internal_reconcile_report_phantom to report entries present in the catalog, but not in S3.
  • ORCA-382 Added lambda internal_reconcile_report_orphan to report entries present in S3 bucket, but not in the ORCA catalog.
  • ORCA-291 request_files lambda now accepts orcaDefaultRecoveryTypeOverride to override the glacier restore type at the workflow level by adding it to task_config.
  • ORCA-381 Added lambda internal_reconcile_report_mismatch to report entries present in S3 bucket and catalog, but with conflicting data.
  • ORCA-310 Added lambda delete_old_reconcile_jobs for removing old reconciliation reports from the database. Use new optional variable orca_internal_reconciliation_expiration_days to set the retention period.
  • ORCA-372 Added automatic trigger for inventory events being read in by post_to_queue_and_trigger_step_function.
  • ORCA-306 Added API gateway resources for internal reconciliation reporting lambdas.
  • ORCA-424 Added automatic trigger for delete_old_reconcile_jobs. Will run every Sunday at midnight UTC. Adjust with the new optional variable orca_delete_old_reconcile_jobs_frequency_cron.
  • ORCA-468 Added status_update_dlq to prevent ingest lock-down when theoretical errors occur.
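The default value of orca_delete_old_reconcile_jobs_frequency_cron shown in the Terraform examples in this changelog is "cron(0 0 ? * SUN *)". AWS schedule expressions of this form have six fields: minutes, hours, day of month, month, day of week, and year. The sketch below only splits an expression into those fields so a changed value is easier to review. `describe_aws_cron` is a hypothetical helper and does not validate the individual field values.

```python
CRON_FIELDS = ("minutes", "hours", "day_of_month", "month", "day_of_week", "year")


def describe_aws_cron(expression: str) -> dict:
    """Split an AWS `cron(...)` schedule expression into its six named fields."""
    if not (expression.startswith("cron(") and expression.endswith(")")):
        raise ValueError(f"Not a cron() expression: {expression!r}")
    values = expression[len("cron("):-1].split()
    if len(values) != len(CRON_FIELDS):
        raise ValueError(f"Expected {len(CRON_FIELDS)} fields, got {len(values)}")
    return dict(zip(CRON_FIELDS, values))


fields = describe_aws_cron("cron(0 0 ? * SUN *)")
print(fields)
```

For the default expression, the minutes and hours fields are both 0 and the day-of-week field is SUN, which matches the "every Sunday at midnight UTC" schedule described in ORCA-424.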

Changed

  • ORCA-299 db_deploy task has been updated to deploy ORCA internal reconciliation tables and objects.
  • ORCA-161 Changed staged recovery SQS queue type from FIFO to standard queue.
  • SQS Queue names adjusted to include Orca. For example: "${var.prefix}-orca-status-update-queue.fifo". Queues will be automatically recreated by Terraform.
  • ORCA-334 Created IAM role for the extract_filepaths_for_granule lambda function and attached the role to the function.
  • ORCA-404 Updated shared_db and relevant lambdas to use secrets manager ARN instead of magic strings.
  • ORCA-291 Updated request_files lambda and terraform so that the glacier restore type can be set via terraform during deployment. In addition, the glacier retrieval type can now be overridden via a change in the collections configuration using orcaDefaultRecoveryTypeOverride key under meta tag as shown below.
    "meta": {
      "orcaDefaultRecoveryTypeOverride": "Standard"
    }
  • ORCA-426 Performance improvements around json schema validators.

Migration Notes

  • Create a new bucket PREFIX-orca-reports in the same account and region as your primary orca bucket.

  • The user should update their orca.tf, variables.tf and terraform.tfvars files with new variables. The following required variables have been added:

    • dlq_subscription_email
    • orca_reports_bucket_name
    • s3_access_key
    • s3_secret_key
  • Update the collection configuration with the new optional key orcaDefaultRecoveryTypeOverride that can be added to override the default S3 glacier recovery type as shown below.

      "meta": {
        "orcaDefaultRecoveryTypeOverride": "Standard"
      }
  • Add the following ORCA required variable definitions to your variables.tf or orca_variables.tf file.

variable "dlq_subscription_email" {
  type        = string
  description = "The email to notify users when messages are received in dead letter SQS queue due to restore failure. Sends one email until the dead letter queue is emptied."
}

variable "orca_reports_bucket_name" {
  type        = string
  description = "The name of the bucket to store s3 inventory reports."
}

variable "s3_access_key" {
  type        = string
  description = "Access key for communicating with Orca S3 buckets."
}

variable "s3_secret_key" {
  type        = string
  description = "Secret key for communicating with Orca S3 buckets."
}
  • Update the orca.tf file to include all of the updated and new variables as seen below. Note the change to source and the commented out optional variables.
    ## ORCA Module
    ## =============================================================================
    module "orca" {
      source = "https://github.com/nasa/cumulus-orca/releases/download/v5.0.0/cumulus-orca-terraform.zip//modules"
    ## --------------------------
    ## Cumulus Variables
    ## --------------------------
    ## REQUIRED
    buckets                  = var.buckets
    lambda_subnet_ids        = var.lambda_subnet_ids
    permissions_boundary_arn = var.permissions_boundary_arn
    prefix                   = var.prefix
    system_bucket            = var.system_bucket
    vpc_id                   = var.vpc_id
    workflow_config          = module.cumulus.workflow_config
    
    ## OPTIONAL
    tags        = local.tags
    
    ## --------------------------
    ## ORCA Variables
    ## --------------------------
    ## REQUIRED
    db_admin_password        = var.db_admin_password
    db_user_password         = var.db_user_password
    db_host_endpoint         = var.db_host_endpoint
    dlq_subscription_email   = var.dlq_subscription_email
    orca_default_bucket      = var.orca_default_bucket
    orca_reports_bucket_name = var.orca_reports_bucket_name
    rds_security_group_id    = var.rds_security_group_id
    s3_access_key            = var.s3_access_key
    s3_secret_key            = var.s3_secret_key
    
    ## OPTIONAL
    db_admin_username                                    = "postgres"
    default_multipart_chunksize_mb                       = 250
    internal_report_queue_message_retention_time_seconds = 432000
    orca_default_recovery_type                           = "Standard"
    orca_delete_old_reconcile_jobs_frequency_cron        = "cron(0 0 ? * SUN *)"
    orca_ingest_lambda_memory_size                       = 2240
    orca_ingest_lambda_timeout                           = 720
    orca_internal_reconciliation_expiration_days         = 30
    orca_recovery_buckets                                = []
    orca_recovery_complete_filter_prefix                 = ""
    orca_recovery_expiration_days                        = 5
    orca_recovery_lambda_memory_size                     = 128
    orca_recovery_lambda_timeout                         = 720
    orca_recovery_retry_limit                            = 3
    orca_recovery_retry_interval                         = 1
    orca_recovery_retry_backoff                          = 2
    s3_inventory_queue_message_retention_time_seconds    = 432000
    s3_report_frequency                                  = "Daily"
    sqs_delay_time_seconds                               = 0
    sqs_maximum_message_size                             = 262144
    staged_recovery_queue_message_retention_time_seconds = 432000
    status_update_queue_message_retention_time_seconds   = 777600
    vpc_endpoint_id                                      = null
    }

Security

  • Updated Docusaurus to version 2.0.0-beta.21 to resolve security issues.

[4.0.3] 2022-06-02

Fixed

  • Fixed bug where db_admin_username had to be lower-case.

[4.0.2] 2022-05-18

Fixed

  • Fixed bug where db_admin_username was not set as the owner of new databases.

[4.0.1] 2022-02-16

Fixed

  • Updated release build script to perform cleanup sooner.
  • Updated terraform deployment with additional depends_on parameters and fixes to prevent db_deploy lambda from firing prematurely.

[4.0.0] 2022-02-15

Removed

  • The modules/rds directory is removed since ORCA will utilize the Cumulus DB.
  • ORCA-233 The disaster_recovery database, now renamed PREFIX_orca, will now be created by db_deploy instead of Terraform.
  • ORCA-288 Removed copy_to_glacier_cumulus_translator due to better consistency in Cumulus's file dictionary.
  • ORCA-311 copy_to_glacier no longer accepts/returns file properties other than bucket and key. copied_to_glacier is similarly no longer passed through, but generated.

Added

  • ORCA-256 Added AWS API Gateway in modules/api_gateway/main.tf for the catalog reporting lambda.
  • ORCA-227 Added modules/secretsmanager directory that contains terraform code for deploying AWS secretsmanager.
  • ORCA-177 Added AWS API Gateway in modules/api_gateway/main.tf for the request_status_for_granule and request_status_for_job lambdas.
  • ORCA-257 orca_catalog_reporting lambda now returns data from actual catalog.
  • ORCA-151 copy_to_glacier and request_files now accept "orcaDefaultBucketOverride" which can be used on a per-collection basis. If desired, add "orcaDefaultBucketOverride": "{$.meta.collection.meta.orcaDefaultBucketOverride}" to the workflow's task's task_config.
  • ORCA-335 request_files now recognizes when a file is already recovered, and posts an error message to status tables.
  • ORCA-230 copy_to_glacier now writes metadata to an ORCA catalog for comparisons to cumulus holdings.

Changed

  • ORCA-217 Lambda inputs now conform to the Cumulus camel case standard.
  • ORCA-297 Default database name is now PREFIX_orca
  • ORCA-287 Updated copy_to_glacier and extract_filepaths_for_granule to new Cumulus file format.
  • ORCA-245 Updated resource policies related to KMS keys to provide better security.
  • ORCA-318 Updated post_to_catalog lambda to match new Cumulus schema changes.
  • ORCA-317 Updated the db_deploy task, unit tests, manual tests, research pages and SQL to reflect new inventory layout to better align with Cumulus.
  • ORCA-249 Changed multipart_chunksize_mb in lambda configs to s3MultipartChunksizeMb. Standard workflows now pull from $.meta.collection.meta.s3MultipartChunksizeMb.
  • ORCA-230 Updated lambdas to use Cumulus Message Adapter Python v2.0.0.
  • ORCA-132 Updated workflows to use latest Cumulus v10.0.0 workflow code.

Migration Notes

  • Orca is only compatible with versions of Cumulus that use the new Cumulus file format. Any calls to extract_filepaths_for_granule or copy_to_glacier should switch to the new format.
  • Ensure that anything calling copy_to_glacier only relies on properties currently present in copy_to_glacier/schemas/output.json
  • Remove any added references in your setup to copy_to_glacier_cumulus_translator. It is no longer necessary as a Cumulus intermediary.
  • The user should update their orca.tf, variables.tf and terraform.tfvars files with new variables. The following two variable names have changed:
    • postgres_user_pw -> db_admin_password (new)
    • database_app_user_pw -> db_user_password (new)
  • These are the new variables added:
    • db_admin_username (defaults to "postgres")
    • db_host_endpoint (Requires a value. Set in terraform.tfvars to your RDS Database's endpoint, similar to "PREFIX-cumulus-db.cluster-000000000000.us-west-2.rds.amazonaws.com")
    • db_name (Defaults to PREFIX_orca.)
      • Any - in prefix are replaced with _ to follow SQL Naming Conventions
      • If preserving a database from a previous version of Orca, set to disaster_recovery.
    • db_user_name (Defaults to PREFIX_orcauser.)
      • Any - in prefix are replaced with _ to follow SQL Naming Conventions
      • If preserving a database from a previous version of Orca, set to orcauser.
    • rds_security_group_id (Requires a value. Set in terraform.tfvars to the Security Group ID of your RDS Database's Security Group. Output from Cumulus' RDS module as security_group_id)
    • vpc_endpoint_id
  • Adjust workflows/step functions for extract_filepaths.
    • file-buckets argument in task_config is now fileBucketMaps.
  • Adjust workflows/step functions for copy_to_glacier.
    • multipart_chunksize_mb argument in task_config is now the Cumulus standard of s3MultipartChunksizeMb. See example below.
    • copy_to_glacier has new requirements for writing to the orca catalog. Required properties are providerId, executionId, collectionShortname, and collectionVersion. See example below.
"task_config": {
  "s3MultipartChunksizeMb": "{$.meta.collection.meta.s3MultipartChunksizeMb}",
  "excludeFileTypes": "{$.meta.collection.meta.excludeFileTypes}",
  "providerId": "{$.meta.provider.id}",
  "providerName": "{$.meta.provider.name}",
  "executionId": "{$.cumulus_meta.execution_name}",
  "collectionShortname": "{$.meta.collection.name}",
  "collectionVersion": "{$.meta.collection.version}",
  "orcaDefaultBucketOverride": "{$.meta.collection.meta.orcaDefaultBucketOverride}"
}
  • request_status_for_granule input/output and request_status_for_job input/output are now fully camel case.
  • Add the following ORCA required variable definitions to your variables.tf or orca_variables.tf file.
variable "db_admin_password" {
  description = "Password for RDS database administrator authentication"
  type        = string
}

variable "db_user_password" {
  description = "Password for RDS database user authentication"
  type        = string
}

variable "db_host_endpoint" {
  type        = string
  description = "Database host endpoint to connect to."
}

variable "rds_security_group_id" {
  type        = string
  description = "Cumulus' RDS Security Group's ID."
}
  • Update the orca.tf file to include all of the updated and new variables as seen below. Note the change to source and the commented out optional variables.
    ## ORCA Module
    ## =============================================================================
    module "orca" {
      source = "https://github.com/nasa/cumulus-orca/releases/download/v4.0.0/cumulus-orca-terraform.zip//modules"
    ## --------------------------
    ## Cumulus Variables
    ## --------------------------
    ## REQUIRED
    buckets                  = var.buckets
    lambda_subnet_ids        = var.lambda_subnet_ids
    permissions_boundary_arn = var.permissions_boundary_arn
    prefix                   = var.prefix
    system_bucket            = var.system_bucket
    vpc_id                   = var.vpc_id
    workflow_config          = module.cumulus.workflow_config
    
    ## OPTIONAL
    tags        = local.tags
    
    ## --------------------------
    ## ORCA Variables
    ## --------------------------
    ## REQUIRED
    orca_default_bucket = var.orca_default_bucket
    db_admin_password   = var.db_admin_password
    db_user_password    = var.db_user_password
    db_host_endpoint    = var.db_host_endpoint
    rds_security_group_id    = var.rds_security_group_id
    ## OPTIONAL
    db_admin_username                                    = "postgres"
    default_multipart_chunksize_mb                       = 250
    orca_ingest_lambda_memory_size                       = 2240
    orca_ingest_lambda_timeout                           = 720
    orca_recovery_buckets                                = []
    orca_recovery_complete_filter_prefix                 = ""
    orca_recovery_expiration_days                        = 5
    orca_recovery_lambda_memory_size                     = 128
    orca_recovery_lambda_timeout                         = 720
    orca_recovery_retry_limit                            = 3
    orca_recovery_retry_interval                         = 1
    orca_recovery_retry_backoff                          = 2
    sqs_delay_time_seconds                               = 0
    sqs_maximum_message_size                             = 262144
    staged_recovery_queue_message_retention_time_seconds = 432000
    status_update_queue_message_retention_time_seconds   = 777600
    vpc_endpoint_id                                      = null
    }
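The 4.0.0 migration notes above state that db_name defaults to PREFIX_orca and db_user_name defaults to PREFIX_orcauser, with any - in the prefix replaced by _. The sketch below reproduces that naming rule so the resulting names can be predicted before deploying. `default_db_names` is a hypothetical helper; the real defaults are computed by the ORCA Terraform and db_deploy code, not by this function.

```python
def default_db_names(prefix: str) -> tuple:
    """Return the (db_name, db_user_name) defaults implied by a deployment prefix."""
    # Hyphens are replaced with underscores to follow SQL naming conventions.
    sql_safe_prefix = prefix.replace("-", "_")
    return f"{sql_safe_prefix}_orca", f"{sql_safe_prefix}_orcauser"


# "my-daac" is an illustrative prefix, not a value from the ORCA documentation.
db_name, db_user_name = default_db_names("my-daac")
print(db_name, db_user_name)
```

When preserving a database from a previous version of Orca, these defaults do not apply; set db_name to disaster_recovery and db_user_name to orcauser as described in the notes.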

[3.0.2] 2021-10-14

Migration Notes

The configuration schema for copy_to_glacier has changed. See the updated schema definition here. Additional optional configuration settings like multipart_chunksize_mb can be found for copy_to_glacier and ORCA recovery in the ORCA documentation here.

Added

  • ORCA-244 Added schema files for copy_to_glacier. Errors for improperly formatted requests will look different.
  • ORCA-246 Added TF variable default_multipart_chunksize_mb which adjusts the maximum chunksize when copying files. Defaults to 250. Can be overridden by multipart_chunksize_mb within config['collection']. default_multipart_chunksize_mb can be overridden in your orca.tf with the line default_multipart_chunksize_mb = 500
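The precedence described in ORCA-246 (a per-collection multipart_chunksize_mb wins over default_multipart_chunksize_mb) can be sketched as follows. This is an illustration under assumptions, not the copy_to_glacier source: `resolve_chunksize_bytes` is a hypothetical name, the override is assumed to sit directly in config['collection'], and one megabyte is assumed to mean 1024 * 1024 bytes.

```python
DEFAULT_MULTIPART_CHUNKSIZE_MB = 250  # documented default of default_multipart_chunksize_mb


def resolve_chunksize_bytes(config: dict, default_mb: int = DEFAULT_MULTIPART_CHUNKSIZE_MB) -> int:
    """Pick the collection override when present, else the deployment default."""
    collection = config.get("collection") or {}
    chunksize_mb = collection.get("multipart_chunksize_mb")
    if chunksize_mb is None:
        chunksize_mb = default_mb
    # Assumption: 1 MB = 1024 * 1024 bytes.
    return int(chunksize_mb) * 1024 * 1024


print(resolve_chunksize_bytes({}))
print(resolve_chunksize_bytes({"collection": {"multipart_chunksize_mb": 500}}))
```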

Fixed

  • ORCA-248 excludeFileTypes is no longer required, as intended.
  • ORCA-205 Fixed installation and usage of orca_shared libraries.

[v3.0.1] 2021-08-31

Migration Notes

  • database_app_user, database_name, and orca_recovery_retrieval_type are no longer variables. If you have set these values, remove them.

Removed

  • ORCA-240 Removed development-only variables from variables.tf
  • ORCA-243 Removed aws_profile and region variables from variables.tf

Fixed

  • ORCA-199 Standardized build and test scripts for remaining ORCA lambdas
  • ORCA-236 Removed aws_profile and region variables as requirements for ORCA deployment.
  • ORCA-238 Moved all terraform requirements to a single versions.tf file as part of the deployments.
  • ORCA-239 Removed terraform provider block from all ORCA files and consolidated to main.tf file.
  • Removed technical debt and fixed recovery bug where bucket keys that were not the standard (internal, public, private, etc.) were being ignored.

Changed

  • ORCA-237 Updated node requirement versions to fix known security vulnerabilities.

[v3.0.0] 2021-07-12

Migration Notes

See the documentation for specifics on the various files and changes specified below.

  • Update the buckets variable in terraform.tfvars. The ORCA bucket previously defined should now have a type of orca.
    # OLD Setting
    buckets = {
      internal = {
        name = "my-internal-bucket",
        type = "internal"
      },
      ...
      glacier = {
        name = "my-orca-bucket",
        type = "glacier"
      }
    }
    
    # NEW Setting
    buckets = {
      internal = {
        name = "my-internal-bucket",
        type = "internal"
      },
      ...
      glacier = {
        name = "my-orca-bucket",
        type = "orca"
      }
    }
    
  • Add the following ORCA required variable definition to your variables.tf or orca_variables.tf file.
    variable "orca_default_bucket" {
      type        = string
      description = "Default ORCA S3 Glacier bucket to use."
    }
  • Update the terraform.tfvars file with the value for orca_default_bucket.
    orca_default_bucket = "my-orca-bucket"
  • Update the orca.tf file to include all of the updated and new variables as seen below. Note the change to source and the commented out optional variables.
    ## ORCA Module
    ## =============================================================================
    module "orca" {
      source = "https://github.com/nasa/cumulus-orca/releases/download/v3.0.0/cumulus-orca-terraform.zip//modules"
      ## --------------------------
      ## Cumulus Variables
      ## --------------------------
      ## REQUIRED
      aws_profile              = var.aws_profile
      buckets                  = var.buckets
      lambda_subnet_ids        = var.lambda_subnet_ids
      permissions_boundary_arn = var.permissions_boundary_arn
      prefix                   = var.prefix
      system_bucket            = var.system_bucket
      vpc_id                   = var.vpc_id
      workflow_config          = module.cumulus.workflow_config
    
      ## OPTIONAL
      region = var.region
      tags   = var.tags
    
      ## --------------------------
      ## ORCA Variables
      ## --------------------------
      ## REQUIRED
      database_app_user_pw = var.database_app_user_pw
      orca_default_bucket  = var.orca_default_bucket
      postgres_user_pw     = var.database_app_user_pw
    
      ## OPTIONAL
      # database_port                        = 5432
      # orca_ingest_lambda_memory_size       = 2240
      # orca_ingest_lambda_timeout           = 600
      # orca_recovery_buckets                = []
      # orca_recovery_complete_filter_prefix = ""
      # orca_recovery_expiration_days        = 5
      # orca_recovery_lambda_memory_size     = 128
      # orca_recovery_lambda_timeout         = 300
      # orca_recovery_retry_limit            = 3
      # orca_recovery_retry_interval         = 1
    }

Added

  • ORCA-149 Added a new workflow, OrcaCopyToGlacierWorkflow, for ingest on-demand.
  • ORCA-175 Added copy_to_glacier_cumulus_translator for transforming CumulusDashboard input to the proper format.
  • ORCA-181 Added orca_catalog_reporting_dummy lambda for integration testing.
  • ORCA-165 Added new lambda function post_copy_request_to_queue.py under tasks/post_copy_request_to_queue/ for querying the DB and posting to two queues. Added unit tests test_post_copy_request_to_queue.py under tasks/post_copy_request_to_queue/test/unit_tests/ to test the new lambda. Added new scripts run_tests.sh and build.sh under /tasks/post_copy_request_to_queue/bin to run the unit tests.
  • ORCA-163 Added shared library shared_recovery.py under tasks/shared_libraries/recovery/ for posting to status SQS queue. This includes the post_status_for_job_to_queue() function that posts status of jobs to SQS queue, the post_status_for_file_to_queue() function that posts status of files to SQS queue, and the post_entry_to_queue() function that is used by the above two functions for sending the message to the queue. Added unit tests test_shared_recovery.py under tasks/shared_libraries/recovery/test/unit_tests/ to test shared library. Added new script run_tests.sh under tasks/shared_libraries/recovery/bin to run the unit tests.
  • ORCA-92 Added two lambdas (request_status_for_file and request_status_for_job) for use with the Cumulus dashboard. request_status_for_file will retrieve status for an individual file, with the optional parameter of which job you want the file's recovery status for. request_status_for_job will retrieve a summary of the job along with status totals. See the task's 'schemas' folder and the README.md files for more information and examples.
  • ORCA-157 Modified terraform module to add two SQS queues required by copy_files_to_archive lambda function. The first queue will be used by copy_files_to_archive lambda to get necessary information needed for copying next files. The second queue will be used by copy_files_to_archive lambda to write database status updates.
  • Deployment and development documentation has been created for ORCA.
  • ORCA-119 Added new script bin/create_release_documentation.sh to deploy the documentation when the RELEASE_FLAG is set to true in Bamboo pipeline.

Changed

  • Glacier buckets meant for the ORCA archive now should be a type of orca instead of glacier. { my-orca-bucket = { name = "orca-primary-bucket", type = "orca" } }.
  • The copy_to_glacier lambda now requires an ORCA_DEFAULT_BUCKET variable to be set.
  • Terraform variables have been renamed and updated to better match Cumulus and identify optional and required ORCA variables. The table below shows the changes and mappings to the new names.
    | Old Variable Name | New Variable Name | Notes |
    | --- | --- | --- |
    | copy_retry_sleep_secs | orca_recovery_retry_interval | Updated to better reflect back off and retry logic |
    | database_app_user | REMOVED | This variable is actually a static value and has been removed. |
    | database_name | REMOVED | This variable is actually a static value and has been removed. |
    | ddl_dir | REMOVED | This variable is actually a static value and has been removed. |
    | default_tags | tags | Renamed to match the Cumulus tags variable. |
    | drop_database | REMOVED | This has been removed and should only be used for development work. |
    | lambda_timeout | orca_ingest_lambda_timeout, orca_recovery_lambda_timeout | Timeout variables have been broken out per usages. |
    | platform | REMOVED | This was used for development and debugging. The variable is no longer needed and has been removed. |
    | profile | aws_profile | Updated to better reflect the variable value comes from the Cumulus variable. |
    | restore_complete_filter_prefix | orca_recovery_complete_filter_prefix | Updated to show ORCA branding. |
    | subnet_ids | lambda_subnet_ids | Updated to better reflect the variable value comes from the Cumulus variable. |
  • The following new variables have been added to the terraform deploy. More information is available in the deployment documentation.
    | Variable Name | Notes |
    | --- | --- |
    | system_bucket | REQUIRED: This variable manages where configuration files are managed in AWS for the deployment. |
    | orca_default_bucket | REQUIRED: This variable has the user set the default ORCA glacier bucket backups should go to. |
    | orca_ingest_lambda_memory_size | OPTIONAL: Allows a user to change the max memory allocation for the copy_to_glacier lambda. |
    | orca_ingest_lambda_timeout | OPTIONAL: Allows a user to change the timeout for the copy_to_glacier lambda. |
    | orca_recovery_buckets | OPTIONAL: Allows users to narrowly define which buckets ORCA can restore back to. |
    | orca_recovery_expiration_days | OPTIONAL: Allows a user to change the number of days a recovered file remains in S3 before being put back in glacier. |
    | orca_recovery_lambda_memory_size | OPTIONAL: Allows a user to change the max memory allocation for the copy_to_archive lambda. |
    | orca_recovery_lambda_timeout | OPTIONAL: Allows a user to change the timeout for the copy_to_archive lambda. |
    | orca_recovery_retry_limit | OPTIONAL: Allows a user to change the recovery workflow and lambdas retry limit. |
    | orca_recovery_retry_interval | OPTIONAL: Allows a user to change the recovery workflow and lambdas interval to sleep between retries. |
  • Task and module build scripts have been updated to better display error information and to document the actual steps being performed.
  • Documentation has been updated to better provide end users with information on ORCA.
  • ORCA-109 request_files now uses an SQS queue for recovery status updates and receives input from a separate SQS queue.
  • ORCA-91 copy_files_to_archive now uses an SQS queue for recovery status updates. It generates a job_id if none is given and returns it in the output.
  • request_files now uses the same default glacier bucket as copy_to_glacier.
  • ORCA-172 db_deploy lambda will now either migrate the database or create a new orca database, depending on the presence of certain objects in the database. This led to the addition and removal of environment variables and to updates to the task documentation (README.md) and the ORCA website documentation for architecture and ORCA schema information. The lambda has also been modified so that future migrations can be added.
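The ORCA-172 behavior described above, choosing between a fresh install and a migration based on what already exists in the database, can be sketched roughly as follows. This is only an illustration: the changelog does not say which objects are checked, so the names `current_schema_marker` and `legacy_table`, and the `choose_action` helper itself, are hypothetical.

```python
from __future__ import annotations

from enum import Enum


class DeployAction(Enum):
    CREATE_FRESH = "create_fresh"  # no known objects found: build a new database
    MIGRATE = "migrate"            # only older objects found: migrate to the current schema
    NONE = "none"                  # current schema already present: nothing to do


def choose_action(existing_objects: set[str]) -> DeployAction:
    """Pick a deployment action from the names of objects found in the database.

    The object names used here are hypothetical placeholders, not the
    names the real db_deploy lambda inspects.
    """
    if "current_schema_marker" in existing_objects:
        return DeployAction.NONE
    if "legacy_table" in existing_objects:
        return DeployAction.MIGRATE
    return DeployAction.CREATE_FRESH
```

Keeping the decision in a pure function like this makes it straightforward to unit test without a live database, and a later migration would only need another branch.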

Deprecated

  • None

Removed

  • The request_status lambda under /tasks has been removed because it is replaced by the request_status_for_job and request_status_for_granule lambdas. The terraform modules, shell scripts, and variables related to the lambda have also been removed.

Fixed

  • Updated IAM policies to better include all buckets by type instead of looking at the bucket variable key name.

Security

  • None

[v2.0.1] 2021-02-05

Changed

  • ORCA-125 The BucketOwnerFullControl ACL is now set on storage PUT requests in the copy_to_glacier lambda. This prevents errors during cross-account (OU) copying of data.
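As a rough illustration of the ORCA-125 change, the sketch below passes the `bucket-owner-full-control` canned ACL on an S3 copy. The helper name `copy_with_owner_acl` and its parameters are assumptions made for this example and are not taken from the copy_to_glacier source; `s3_client` is expected to behave like a boto3 S3 client, whose `copy` method accepts an `ExtraArgs` dictionary.

```python
def copy_with_owner_acl(s3_client, source_bucket, source_key, dest_bucket, dest_key):
    """Copy one object and grant the destination bucket owner full control.

    Without the ACL, an object written from another account can end up
    unreadable by the account that owns the destination bucket.
    """
    s3_client.copy(
        {"Bucket": source_bucket, "Key": source_key},  # CopySource
        dest_bucket,
        dest_key,
        ExtraArgs={"ACL": "bucket-owner-full-control"},
    )
```

With boto3 installed, this could be called as `copy_with_owner_acl(boto3.client("s3"), ...)`. Passing the client in keeps the function testable with a stub.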

[v2.0.0] 2021-01-15

Migration Notes

  • ORCA-67 The expected input/output of the copy_to_glacier lambda has been changed. See how to adopt these changes in your Cumulus workflow here.
  • ORCA-61 We now support collection-level configuration to exclude specific file types from your glacier archive (when using the copy_to_glacier lambda). See how to configure this for your collections here.

Added

  • ORCA-58 ORCA user facing documentation
    • Docusaurus documentation website framework initialized and created
    • Initial content migrated off of wiki and into markdown pages for end users to view ORCA documentation with no wiki access.
    • Updated the README with instructions for starting the documentation server.
  • ORCA-61 Support dynamic configuration of files to exclude from glacier archive
    • Configured in a collection.meta configuration
  • ORCA-67 Generalize input/output scheme of copy_to_glacier lambda so it can be used more easily in a Cumulus workflow.
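The collection-level exclusion added in ORCA-61 can be pictured with the minimal sketch below. It assumes the excluded types are listed as file-name suffixes under an `excludeFileTypes` key in the collection's `meta` block; that key name, the suffix matching, and both helper functions are assumptions for illustration, so consult the task documentation for the actual configuration format.

```python
from __future__ import annotations


def should_exclude(file_key: str, excluded_types: list[str]) -> bool:
    """Return True when the file key ends with any excluded suffix."""
    return any(file_key.endswith(suffix) for suffix in excluded_types)


def files_to_archive(file_keys: list[str], collection_meta: dict) -> list[str]:
    """Keep only the files that the collection configuration does not exclude.

    "excludeFileTypes" is an assumed key name; a missing key excludes nothing.
    """
    excluded_types = collection_meta.get("excludeFileTypes", [])
    return [key for key in file_keys if not should_exclude(key, excluded_types)]
```

For example, a collection whose `meta` lists `.xml` would have its metadata sidecar files skipped while the data files are still archived.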

Changed

  • ORCA-68 Update DB tests to use mocking instead of real Postgres DB.
  • ORCA-70 As a DAAC we would like to be able to deploy multiple instances in our sandbox account.
    • Moves secret storage from SSM parameter store to secrets manager and adds a prefix to the keys.
  • ORCA-74 Move integration tests into their own files.

[v1.0.0] 2020-12-04

Migration Notes

None - this is the baseline release.

Added

  • Misc
    • Unit test upgrades - mocking unnecessary dependencies.
    • Code formatting and styling
  • ORCA-65 Copy Lambda
    • We're including a copy lambda in the v1.0.0 release. The use of this lambda function is optional and explained in the task readme/documentation.
  • ORCA-33 Automated Building/Testing Updates
    • Created some bash scripts for use in the Bamboo build.
    • Updated requirements-dev.txt files for each task and moved the testing framework from nosetest (no longer supported) to coverage and pytest.
    • Support in GitHub for automated build/test/release via Bamboo
    • Use coverage and pytest for coverage/testing