Commit
feat(ingestion): Implement DB migrations & Ingestion IAC (#4)
This PR introduces several key enhancements to our project:

1. Database Migration using Alembic: Implements a new database migration to modify the schema and ensure compatibility with the new data ingestion requirements.
   - Added new Alembic migration scripts in the `migrations/db/versions` directory for the findings, scans, and jobs tables.
   - The migration scripts alter the database schema to include the new tables and columns required by the ingestion process.
2. Infrastructure as Code (IaC) for Data Ingestion: Sets up the infrastructure to handle data ingestion using AWS services.
   - Added Terraform configuration files in the `infrastructure/ingestion/aws` directory.
   - The configuration provisions the necessary AWS resources: a Step Functions state machine, Lambda functions, and IAM roles.
3. Lambda Functions for Data Ingestion: Includes Lambda functions to ingest data from S3 into the database tables.
   - Implemented Lambda functions in the `infrastructure/ingestion/aws/lambda` directory.
   - They process incoming data files and insert the data into the appropriate database tables.
4. Ingestions: Implement findings data ingestion.
Showing 42 changed files with 1,751 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
services: | ||
init: | ||
image: python:3.9 | ||
container_name: init | ||
volumes: | ||
- ./migrations:/migrations:ro | ||
environment: | ||
- DB_URL=postgresql://myuser:mypassword@postgres/mydatabase | ||
command: | ||
- sh | ||
- -c | ||
- | | ||
cd /migrations | ||
pip install poetry | ||
poetry lock --no-update | ||
poetry install | ||
poetry run python migrate.py | ||
depends_on: | ||
postgres: | ||
condition: service_healthy | ||
|
||
postgres: | ||
image: postgres:latest | ||
container_name: postgres | ||
environment: | ||
POSTGRES_USER: myuser | ||
POSTGRES_PASSWORD: mypassword | ||
POSTGRES_DB: mydatabase | ||
ports: | ||
- "127.0.0.1:5432:5432" | ||
volumes: | ||
- ./volumes/postgres:/var/lib/postgresql/data | ||
healthcheck: | ||
test: ["CMD-SHELL", "pg_isready -U myuser -d mydatabase"] | ||
interval: 10s | ||
timeout: 5s | ||
retries: 3 | ||
|
||
adminer: | ||
image: adminer:latest | ||
container_name: adminer | ||
ports: | ||
- "127.0.0.1:8080:8080" | ||
depends_on: | ||
init: | ||
condition: service_completed_successfully | ||
postgres: | ||
condition: service_started |
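
The `init` service above hands the connection string to the migration script through the `DB_URL` environment variable. The contents of `migrate.py` are not shown in this diff, so the following is only a minimal sketch of how such a script could read and validate that variable before running migrations. The function names `parse_db_url` and `db_settings_from_env` are hypothetical, and the fallback to port 5432 is an assumption based on PostgreSQL's default.

```python
import os
from urllib.parse import urlsplit


def parse_db_url(url: str) -> dict:
    """Split a postgresql:// URL into the parts a client library needs."""
    parts = urlsplit(url)
    if parts.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port or 5432,  # assumption: fall back to PostgreSQL's default port
        "database": parts.path.lstrip("/"),
    }


def db_settings_from_env(env=os.environ) -> dict:
    """Read DB_URL as the init container exposes it; fail fast if it is missing."""
    url = env.get("DB_URL")
    if not url:
        raise RuntimeError("DB_URL is not set")
    return parse_db_url(url)
```

Failing fast on a missing or malformed `DB_URL` matters here because `adminer` waits on `service_completed_successfully`: a clear non-zero exit from `init` is what keeps dependent services from starting against an unmigrated database.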
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,86 @@ | ||
# infrastructure | ||
|
||
<!-- BEGINNING OF PRE-COMMIT-TERRAFORM DOCS HOOK --> | ||
## Requirements | ||
|
||
| Name | Version | | ||
|------|---------| | ||
| <a name="requirement_terraform"></a> [terraform](#requirement\_terraform) | >=1.3 | | ||
| <a name="requirement_aws"></a> [aws](#requirement\_aws) | ~> 5.0 | | ||
|
||
## Providers | ||
|
||
| Name | Version | | ||
|------|---------| | ||
| <a name="provider_aws"></a> [aws](#provider\_aws) | ~> 5.0 | | ||
| <a name="provider_local"></a> [local](#provider\_local) | n/a | | ||
| <a name="provider_null"></a> [null](#provider\_null) | n/a | | ||
| <a name="provider_random"></a> [random](#provider\_random) | n/a | | ||
|
||
## Modules | ||
|
||
No modules. | ||
|
||
## Resources | ||
|
||
| Name | Type | | ||
|------|------| | ||
| [aws_cloudwatch_event_rule.ingestion_sfn_trigger_rule](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_event_rule) | resource | | ||
| [aws_cloudwatch_event_target.ingestion_sfn_trigger](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_event_target) | resource | | ||
| [aws_db_instance.rds_postgres](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/db_instance) | resource | | ||
| [aws_iam_policy.policy_for_execution_role](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_policy) | resource | | ||
| [aws_iam_role.cloudwatch_role](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role) | resource | | ||
| [aws_iam_role.lambda_execution_role](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role) | resource | | ||
| [aws_iam_role.sfn_role](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role) | resource | | ||
| [aws_iam_role_policy.cloudwatch_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy) | resource | | ||
| [aws_iam_role_policy.sfn_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy) | resource | | ||
| [aws_iam_role_policy_attachment.LambdaExecutionRolePolicyAttachment](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy_attachment) | resource | | ||
| [aws_lambda_function.ingestion-lambda](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lambda_function) | resource | | ||
| [aws_lambda_function.migration-lambda](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lambda_function) | resource | | ||
| [aws_secretsmanager_secret.rds_master_password](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/secretsmanager_secret) | resource | | ||
| [aws_secretsmanager_secret_version.rds_master_password](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/secretsmanager_secret_version) | resource | | ||
| [aws_security_group.lambda_sg](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/security_group) | resource | | ||
| [aws_security_group.rds_sg](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/security_group) | resource | | ||
| [aws_sfn_state_machine.ingestion-step-function](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/sfn_state_machine) | resource | | ||
| [null_resource.ingestion_lambda_build](https://registry.terraform.io/providers/hashicorp/null/latest/docs/resources/resource) | resource | | ||
| [null_resource.migration_lambda_build](https://registry.terraform.io/providers/hashicorp/null/latest/docs/resources/resource) | resource | | ||
| [random_password.rds_master_password](https://registry.terraform.io/providers/hashicorp/random/latest/docs/resources/password) | resource | | ||
| [aws_caller_identity.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/caller_identity) | data source | | ||
| [aws_iam_policy_document.cloudwatch_assume_role](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source | | ||
| [aws_iam_policy_document.cloudwatch_policy_document](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source | | ||
| [aws_iam_policy_document.lambda_assume_role](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source | | ||
| [aws_iam_policy_document.permissions_for_execution_role](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source | | ||
| [aws_iam_policy_document.sf_assume_role](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source | | ||
| [aws_iam_policy_document.sfn_policy_document](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source | | ||
| [aws_security_group.default](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/security_group) | data source | | ||
| [aws_subnet.selected](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/subnet) | data source | | ||
| [aws_subnets.default](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/subnets) | data source | | ||
| [aws_vpc.selected](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/vpc) | data source | | ||
| [local_file.ingestion_lambda_build](https://registry.terraform.io/providers/hashicorp/local/latest/docs/data-sources/file) | data source | | ||
| [local_file.migration_lambda_build](https://registry.terraform.io/providers/hashicorp/local/latest/docs/data-sources/file) | data source | | ||
|
||
## Inputs | ||
|
||
| Name | Description | Type | Default | Required | | ||
|------|-------------|------|---------|:--------:| | ||
| <a name="input_aws_profile"></a> [aws\_profile](#input\_aws\_profile) | AWS profile to use for authentication | `string` | n/a | yes | | ||
| <a name="input_aws_region"></a> [aws\_region](#input\_aws\_region) | AWS region where to deploy resources | `string` | n/a | yes | | ||
| <a name="input_db_subnet_group_name"></a> [db\_subnet\_group\_name](#input\_db\_subnet\_group\_name) | Name of the RDS subnet group | `string` | n/a | yes | | ||
| <a name="input_disable_ingestion_schedule"></a> [disable\_ingestion\_schedule](#input\_disable\_ingestion\_schedule) | Disable the ingestion schedule | `bool` | `false` | no | | ||
| <a name="input_environment_type"></a> [environment\_type](#input\_environment\_type) | Environment type | `string` | n/a | yes | | ||
| <a name="input_ingestion_schedule"></a> [ingestion\_schedule](#input\_ingestion\_schedule) | Schedule expression (rate or cron) for the CloudWatch Event Rule | `string` | `"rate(24 hours)"` | no | | ||
| <a name="input_permissions_boundary_arn"></a> [permissions\_boundary\_arn](#input\_permissions\_boundary\_arn) | ARN of the permissions boundary to use for the IAM role | `string` | n/a | yes | | ||
| <a name="input_project_name"></a> [project\_name](#input\_project\_name) | Name of the project | `string` | `"secrets-finder"` | no | | ||
| <a name="input_rds_db_name"></a> [rds\_db\_name](#input\_rds\_db\_name) | Name of the database to create in the RDS instance | `string` | `"secrets_finder"` | no | | ||
| <a name="input_rds_username"></a> [rds\_username](#input\_rds\_username) | Username for the RDS instance | `string` | `"secrets_finder"` | no | | ||
| <a name="input_s3_bucket_name"></a> [s3\_bucket\_name](#input\_s3\_bucket\_name) | Name of the S3 bucket to create | `string` | n/a | yes | | ||
| <a name="input_subnet_name"></a> [subnet\_name](#input\_subnet\_name) | Name of the subnet where to deploy the resources (wildcards are allowed: first match is used) | `string` | n/a | yes | | ||
| <a name="input_tags"></a> [tags](#input\_tags) | A map of tags to add to the resources | `map(string)` | n/a | yes | | ||
| <a name="input_vpc_name"></a> [vpc\_name](#input\_vpc\_name) | Identifier of the VPC to use for secrets-finder | `string` | n/a | yes | | ||
|
||
## Outputs | ||
|
||
| Name | Description | | ||
|------|-------------| | ||
| <a name="output_rds_pg_endpoint"></a> [rds\_pg\_endpoint](#output\_rds\_pg\_endpoint) | n/a | | ||
<!-- END OF PRE-COMMIT-TERRAFORM DOCS HOOK --> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
resource "aws_cloudwatch_event_rule" "ingestion_sfn_trigger_rule" { | ||
name = "${var.project_name}-ingestion-sfn-trigger" | ||
description = "Triggers the Step function on schedule" | ||
schedule_expression = var.ingestion_schedule | ||
state = var.disable_ingestion_schedule ? "DISABLED" : "ENABLED" | ||
} | ||
|
||
resource "aws_cloudwatch_event_target" "ingestion_sfn_trigger" { | ||
rule = aws_cloudwatch_event_rule.ingestion_sfn_trigger_rule.name | ||
arn = aws_sfn_state_machine.ingestion-step-function.arn | ||
role_arn = aws_iam_role.cloudwatch_role.arn | ||
|
||
depends_on = [ | ||
aws_iam_role.cloudwatch_role, | ||
aws_iam_role_policy.cloudwatch_policy, | ||
] | ||
} |
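
The rule above takes its `schedule_expression` from `var.ingestion_schedule`, whose default in the inputs table is `rate(24 hours)`, and derives its `state` from `var.disable_ingestion_schedule`. As a rough illustration, the sketch below classifies a schedule expression as `rate` or `cron` and mirrors the state conditional. It is not part of the commit: the helper names `schedule_kind` and `rule_state` are hypothetical, and the cron check only verifies the six-field shape, not the field values.

```python
import re

# rate(<value> <unit>): the unit is singular for 1 and plural otherwise
_RATE = re.compile(r"^rate\((\d+) (minute|minutes|hour|hours|day|days)\)$")
# cron(<six space-separated fields>): shape check only
_CRON = re.compile(r"^cron\((\S+ ){5}\S+\)$")


def schedule_kind(expression: str) -> str:
    """Return "rate" or "cron"; raise ValueError for anything else."""
    match = _RATE.match(expression)
    if match:
        value, unit = int(match.group(1)), match.group(2)
        if value < 1:
            raise ValueError("rate value must be a positive integer")
        if (value == 1) == unit.endswith("s"):
            raise ValueError("use a singular unit for 1 and a plural unit otherwise")
        return "rate"
    if _CRON.match(expression):
        return "cron"
    raise ValueError(f"not a rate() or cron() expression: {expression!r}")


def rule_state(disable_ingestion_schedule: bool) -> str:
    """Mirror the Terraform conditional that sets the rule's state."""
    return "DISABLED" if disable_ingestion_schedule else "ENABLED"
```

A check like this could run in CI before `terraform apply`, so that a malformed schedule is caught before the provider rejects it.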
95 changes: 95 additions & 0 deletions
infrastructure/ingestion/aws/configuration/ingestion_sfn_definition.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,95 @@ | ||
{ | ||
"Comment": "Ingestion State Machine", | ||
"StartAt": "BootStrapState", | ||
"States": { | ||
"BootStrapState": { | ||
"Type": "Task", | ||
"Resource": "${migrate_lambda_arn}", | ||
"Next": "IngestionState" | ||
}, | ||
"IngestionState": { | ||
"Type": "Parallel", | ||
"Branches": [ | ||
{ | ||
"Comment": "Ingest Scheduled Scan Findings", | ||
"StartAt": "ListScheduledScanFindingsFiles", | ||
"States": { | ||
"ListScheduledScanFindingsFiles": { | ||
"Type": "Task", | ||
"Resource": "${ingestion_lambda_arn}", | ||
"ResultPath": "$.lambdaResult", | ||
"Parameters": { | ||
"action": "list_files", | ||
"prefix": "secrets-finder/scheduled-scans/results/" | ||
}, | ||
"Next": "IngestScheduledScanFindingsFiles" | ||
}, | ||
"IngestScheduledScanFindingsFiles": { | ||
"Type": "Map", | ||
"ItemsPath": "$.lambdaResult.body.files", | ||
"Parameters": { | ||
"index.$": "$$.Map.Item.Index", | ||
"key.$": "$$.Map.Item.Value" | ||
}, | ||
"Iterator": { | ||
"StartAt": "IngestScheduledScanFindings", | ||
"States": { | ||
"IngestScheduledScanFindings": { | ||
"Type": "Task", | ||
"Resource": "${ingestion_lambda_arn}", | ||
"Parameters": { | ||
"action": "ingest_findings", | ||
"file_key.$": "$.key" | ||
}, | ||
"End": true | ||
} | ||
} | ||
}, | ||
"End": true | ||
} | ||
} | ||
}, | ||
{ | ||
"Comment": "Ingest Ongoing Scan Findings", | ||
"StartAt": "ListOngoingScanFindingsFiles", | ||
"States": { | ||
"ListOngoingScanFindingsFiles": { | ||
"Type": "Task", | ||
"Resource": "${ingestion_lambda_arn}", | ||
"ResultPath": "$.lambdaResult", | ||
"Parameters": { | ||
"action": "list_files", | ||
"prefix": "secrets-finder/ongoing-scans/results/" | ||
}, | ||
"Next": "IngestOngoingScanFindingsFiles" | ||
}, | ||
"IngestOngoingScanFindingsFiles": { | ||
"Type": "Map", | ||
"ItemsPath": "$.lambdaResult.body.files", | ||
"Parameters": { | ||
"index.$": "$$.Map.Item.Index", | ||
"key.$": "$$.Map.Item.Value" | ||
}, | ||
"Iterator": { | ||
"StartAt": "IngestOngoingScanFindings", | ||
"States": { | ||
"IngestOngoingScanFindings": { | ||
"Type": "Task", | ||
"Resource": "${ingestion_lambda_arn}", | ||
"Parameters": { | ||
"action": "ingest_findings", | ||
"file_key.$": "$.key" | ||
}, | ||
"End": true | ||
} | ||
} | ||
}, | ||
"End": true | ||
} | ||
} | ||
} | ||
], | ||
"End": true | ||
} | ||
} | ||
} |
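
The state machine above implies a contract for the ingestion Lambda: it is invoked with an `action` of either `list_files` (plus a `prefix`) or `ingest_findings` (plus a `file_key`), and the `list_files` result must expose the keys at `body.files`, because both Map states read `$.lambdaResult.body.files`. The Lambda source is not part of this excerpt, so the following is only a sketch of a handler that would satisfy that contract, together with a small driver that walks one branch. The helpers `list_keys` and `ingest_file`, the `ingested` result field, and the names `make_handler` and `run_branch` are all hypothetical; the S3 and database access are injected so the example stays self-contained.

```python
from typing import Callable, Iterable


def make_handler(list_keys: Callable[[str], Iterable[str]],
                 ingest_file: Callable[[str], int]):
    """Build a Lambda-style handler around two injected (hypothetical) helpers."""

    def handler(event: dict, context=None) -> dict:
        action = event.get("action")
        if action == "list_files":
            # The Map states read $.lambdaResult.body.files, so "body" must be
            # a JSON object holding a "files" array, not a JSON-encoded string.
            return {"body": {"files": list(list_keys(event["prefix"]))}}
        if action == "ingest_findings":
            # "ingested" is an illustrative field; the real result shape is unknown.
            return {"body": {"ingested": ingest_file(event["file_key"])}}
        raise ValueError(f"unknown action: {action!r}")

    return handler


def run_branch(handler, prefix: str, state_input: dict) -> list:
    """Walk one Parallel branch the way the definition wires it together."""
    # Task state: Parameters replace the Lambda input, and ResultPath stores
    # the result under "lambdaResult" next to the original state input.
    listed = handler({"action": "list_files", "prefix": prefix})
    state = {**state_input, "lambdaResult": listed}
    results = []
    # Map state: one iteration per element of $.lambdaResult.body.files
    for index, value in enumerate(state["lambdaResult"]["body"]["files"]):
        item = {"index": index, "key": value}  # the Map state's Parameters
        results.append(handler({"action": "ingest_findings",
                                "file_key": item["key"]}))
    return results
```

The driver iterates sequentially for simplicity, whereas the definition sets no `MaxConcurrency` on either Map state, so Step Functions is free to run the iterations concurrently. An ingestion step written against this contract should therefore tolerate several files being processed at the same time.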