feat(ingestion): Implement DB migrations & Ingestion IAC (#4)
This PR introduces several key enhancements to the project:

1. Database Migration using Alembic: Implements a new database migration
to modify the schema and ensure compatibility with the new data ingestion
requirements.
- Added new Alembic migration scripts in the
`migrations/db/versions` directory for the findings, scans, and jobs tables.
- The migration scripts alter the database schema to include the new tables
and columns required by the ingestion process.
2. Infrastructure as Code (IAC) for Data Ingestion: Sets up the
infrastructure to handle data ingestion using AWS services.
- Added Terraform configuration files in the
`infrastructure/ingestion/aws` directory.
- The configuration provisions the necessary AWS resources, including a
Step Functions state machine, Lambda functions, and IAM roles.
3. Lambda Functions for Data Ingestion: Includes Lambda functions to
ingest data from S3 into the database tables.
- Implemented Lambda functions in the
`infrastructure/ingestion/aws/lambda` directory.
- They process incoming data files and insert the data into the
appropriate database tables.
4. Ingestion: Implements findings data ingestion.
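As a rough illustration of the "process incoming data files and insert the data" step, the sketch below loads a JSON findings file into a table. Everything specific here is an assumption made for the example: the `findings` columns, the JSON layout, and the use of an in-memory SQLite database as a stand-in for the PostgreSQL instance. The real schema is defined by the Alembic migrations, which are not shown on this page.

```python
import json
import sqlite3

# Hypothetical schema for illustration only; the real findings table is
# created by the Alembic migration scripts.
SCHEMA = """
CREATE TABLE IF NOT EXISTS findings (
    scan_id  TEXT NOT NULL,
    detector TEXT NOT NULL,
    location TEXT NOT NULL,
    UNIQUE (scan_id, detector, location)
)
"""


def ingest_findings(conn, raw_file):
    """Insert findings from a JSON document and return the number of new rows."""
    findings = json.loads(raw_file)
    before = conn.total_changes
    with conn:  # commits on success, rolls back if an insert raises
        conn.executemany(
            "INSERT OR IGNORE INTO findings (scan_id, detector, location) "
            "VALUES (:scan_id, :detector, :location)",
            findings,
        )
    return conn.total_changes - before


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Two identical findings: the unique constraint keeps only one row.
sample = json.dumps([
    {"scan_id": "s1", "detector": "aws", "location": "repo/a.py:3"},
    {"scan_id": "s1", "detector": "aws", "location": "repo/a.py:3"},
])
inserted = ingest_findings(conn, sample)
```

The unique constraint with `INSERT OR IGNORE` is a design choice of this sketch, not something the commit states: it makes a repeated ingestion of the same file a no-op, which may matter if the same S3 keys are listed again on a later scheduled run.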
hibare authored Jun 27, 2024
1 parent 63fe0b5 commit 5488ae2
Showing 42 changed files with 1,751 additions and 0 deletions.
6 changes: 6 additions & 0 deletions .gitignore
@@ -254,3 +254,9 @@ cython_debug/
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/

# Databases
*.db

# Docker
volumes/*
9 changes: 9 additions & 0 deletions .pre-commit-config.yaml
@@ -29,3 +29,12 @@ repos:
        entry: bash -c 'docker run --rm -v "$(pwd):/workdir" -i --rm trufflesecurity/trufflehog:latest git file:///workdir --since-commit HEAD --only-verified --fail'
        language: system
        stages: ["commit", "push"]
  - repo: https://github.com/hadolint/hadolint
    rev: v2.10.0
    hooks:
      - id: hadolint-docker
        name: Lint Dockerfiles
        description: Runs hadolint Docker image to lint Dockerfiles
        language: docker_image
        types: ["dockerfile"]
        entry: ghcr.io/hadolint/hadolint hadolint
48 changes: 48 additions & 0 deletions docker-compose.dev.yml
@@ -0,0 +1,48 @@
services:
  init:
    image: python:3.9
    container_name: init
    volumes:
      - ./migrations:/migrations:ro
    environment:
      - DB_URL=postgresql://myuser:mypassword@postgres/mydatabase
    command:
      - sh
      - -c
      - |
        cd /migrations
        pip install poetry
        poetry lock --no-update
        poetry install
        poetry run python migrate.py
    depends_on:
      postgres:
        condition: service_healthy

  postgres:
    image: postgres:latest
    container_name: postgres
    environment:
      POSTGRES_USER: myuser
      POSTGRES_PASSWORD: mypassword
      POSTGRES_DB: mydatabase
    ports:
      - "127.0.0.1:5432:5432"
    volumes:
      - ./volumes/postgres:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U myuser -d mydatabase"]
      interval: 10s
      timeout: 5s
      retries: 3

  adminer:
    image: adminer:latest
    container_name: adminer
    ports:
      - "127.0.0.1:8080:8080"
    depends_on:
      init:
        condition: service_completed_successfully
      postgres:
        condition: service_started
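The `init` service passes the connection string to the migration step through the `DB_URL` environment variable. The contents of `migrate.py` are not shown in this diff, so the following is only a sketch of how such a script might read and validate that variable before running migrations; `read_db_settings` is a hypothetical helper, not a function from the repository.

```python
import os
from urllib.parse import urlsplit


def read_db_settings(env=os.environ):
    """Parse DB_URL into its parts, failing fast when it is missing or malformed."""
    url = env.get("DB_URL")
    if not url:
        raise RuntimeError("DB_URL is not set")
    parts = urlsplit(url)
    if parts.scheme != "postgresql" or not parts.hostname:
        raise RuntimeError("DB_URL must look like postgresql://user:pass@host/dbname")
    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port or 5432,  # PostgreSQL default port when none is given
        "database": parts.path.lstrip("/"),
    }


# Same value the compose file sets for the init container.
settings = read_db_settings(
    {"DB_URL": "postgresql://myuser:mypassword@postgres/mydatabase"}
)
```

Here the host resolves to `postgres` because that is the service name on the compose network, and the port falls back to 5432 since the URL does not specify one.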
86 changes: 86 additions & 0 deletions infrastructure/ingestion/aws/README.md
@@ -0,0 +1,86 @@
# infrastructure

<!-- BEGINNING OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
## Requirements

| Name | Version |
|------|---------|
| <a name="requirement_terraform"></a> [terraform](#requirement\_terraform) | >=1.3 |
| <a name="requirement_aws"></a> [aws](#requirement\_aws) | ~> 5.0 |

## Providers

| Name | Version |
|------|---------|
| <a name="provider_aws"></a> [aws](#provider\_aws) | ~> 5.0 |
| <a name="provider_local"></a> [local](#provider\_local) | n/a |
| <a name="provider_null"></a> [null](#provider\_null) | n/a |
| <a name="provider_random"></a> [random](#provider\_random) | n/a |

## Modules

No modules.

## Resources

| Name | Type |
|------|------|
| [aws_cloudwatch_event_rule.ingestion_sfn_trigger_rule](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_event_rule) | resource |
| [aws_cloudwatch_event_target.ingestion_sfn_trigger](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_event_target) | resource |
| [aws_db_instance.rds_postgres](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/db_instance) | resource |
| [aws_iam_policy.policy_for_execution_role](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_policy) | resource |
| [aws_iam_role.cloudwatch_role](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role) | resource |
| [aws_iam_role.lambda_execution_role](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role) | resource |
| [aws_iam_role.sfn_role](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role) | resource |
| [aws_iam_role_policy.cloudwatch_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy) | resource |
| [aws_iam_role_policy.sfn_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy) | resource |
| [aws_iam_role_policy_attachment.LambdaExecutionRolePolicyAttachment](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy_attachment) | resource |
| [aws_lambda_function.ingestion-lambda](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lambda_function) | resource |
| [aws_lambda_function.migration-lambda](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lambda_function) | resource |
| [aws_secretsmanager_secret.rds_master_password](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/secretsmanager_secret) | resource |
| [aws_secretsmanager_secret_version.rds_master_password](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/secretsmanager_secret_version) | resource |
| [aws_security_group.lambda_sg](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/security_group) | resource |
| [aws_security_group.rds_sg](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/security_group) | resource |
| [aws_sfn_state_machine.ingestion-step-function](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/sfn_state_machine) | resource |
| [null_resource.ingestion_lambda_build](https://registry.terraform.io/providers/hashicorp/null/latest/docs/resources/resource) | resource |
| [null_resource.migration_lambda_build](https://registry.terraform.io/providers/hashicorp/null/latest/docs/resources/resource) | resource |
| [random_password.rds_master_password](https://registry.terraform.io/providers/hashicorp/random/latest/docs/resources/password) | resource |
| [aws_caller_identity.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/caller_identity) | data source |
| [aws_iam_policy_document.cloudwatch_assume_role](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source |
| [aws_iam_policy_document.cloudwatch_policy_document](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source |
| [aws_iam_policy_document.lambda_assume_role](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source |
| [aws_iam_policy_document.permissions_for_execution_role](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source |
| [aws_iam_policy_document.sf_assume_role](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source |
| [aws_iam_policy_document.sfn_policy_document](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source |
| [aws_security_group.default](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/security_group) | data source |
| [aws_subnet.selected](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/subnet) | data source |
| [aws_subnets.default](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/subnets) | data source |
| [aws_vpc.selected](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/vpc) | data source |
| [local_file.ingestion_lambda_build](https://registry.terraform.io/providers/hashicorp/local/latest/docs/data-sources/file) | data source |
| [local_file.migration_lambda_build](https://registry.terraform.io/providers/hashicorp/local/latest/docs/data-sources/file) | data source |

## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| <a name="input_aws_profile"></a> [aws\_profile](#input\_aws\_profile) | AWS profile to use for authentication | `string` | n/a | yes |
| <a name="input_aws_region"></a> [aws\_region](#input\_aws\_region) | AWS region where to deploy resources | `string` | n/a | yes |
| <a name="input_db_subnet_group_name"></a> [db\_subnet\_group\_name](#input\_db\_subnet\_group\_name) | Name of the RDS subnet group | `string` | n/a | yes |
| <a name="input_disable_ingestion_schedule"></a> [disable\_ingestion\_schedule](#input\_disable\_ingestion\_schedule) | Disable the ingestion schedule | `bool` | `false` | no |
| <a name="input_environment_type"></a> [environment\_type](#input\_environment\_type) | Environment type | `string` | n/a | yes |
| <a name="input_ingestion_schedule"></a> [ingestion\_schedule](#input\_ingestion\_schedule) | Schedule expression (`rate(...)` or `cron(...)`) for the CloudWatch Event Rule | `string` | `"rate(24 hours)"` | no |
| <a name="input_permissions_boundary_arn"></a> [permissions\_boundary\_arn](#input\_permissions\_boundary\_arn) | ARN of the permissions boundary to use for the IAM role | `string` | n/a | yes |
| <a name="input_project_name"></a> [project\_name](#input\_project\_name) | Name of the project | `string` | `"secrets-finder"` | no |
| <a name="input_rds_db_name"></a> [rds\_db\_name](#input\_rds\_db\_name) | Name of the database to create in the RDS instance | `string` | `"secrets_finder"` | no |
| <a name="input_rds_username"></a> [rds\_username](#input\_rds\_username) | Username for the RDS instance | `string` | `"secrets_finder"` | no |
| <a name="input_s3_bucket_name"></a> [s3\_bucket\_name](#input\_s3\_bucket\_name) | Name of the S3 bucket to create | `string` | n/a | yes |
| <a name="input_subnet_name"></a> [subnet\_name](#input\_subnet\_name) | Name of the subnet where to deploy the resources (wildcards are allowed: first match is used) | `string` | n/a | yes |
| <a name="input_tags"></a> [tags](#input\_tags) | A map of tags to add to the resources | `map(string)` | n/a | yes |
| <a name="input_vpc_name"></a> [vpc\_name](#input\_vpc\_name) | Identifier of the VPC to use for secrets-finder | `string` | n/a | yes |

## Outputs

| Name | Description |
|------|-------------|
| <a name="output_rds_pg_endpoint"></a> [rds\_pg\_endpoint](#output\_rds\_pg\_endpoint) | n/a |
<!-- END OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
17 changes: 17 additions & 0 deletions infrastructure/ingestion/aws/cloudwatch.tf
@@ -0,0 +1,17 @@
resource "aws_cloudwatch_event_rule" "ingestion_sfn_trigger_rule" {
  name                = "${var.project_name}-ingestion-sfn-trigger"
  description         = "Triggers the Step function on schedule"
  schedule_expression = var.ingestion_schedule
  state               = var.disable_ingestion_schedule ? "DISABLED" : "ENABLED"
}

resource "aws_cloudwatch_event_target" "ingestion_sfn_trigger" {
  rule     = aws_cloudwatch_event_rule.ingestion_sfn_trigger_rule.name
  arn      = aws_sfn_state_machine.ingestion-step-function.arn
  role_arn = aws_iam_role.cloudwatch_role.arn

  depends_on = [
    aws_iam_role.cloudwatch_role,
    aws_iam_role_policy.cloudwatch_policy,
  ]
}
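The rule takes its schedule from `var.ingestion_schedule` (default `rate(24 hours)`) and its state from `var.disable_ingestion_schedule`. The sketch below shows one way to sanity-check a rate expression before applying and mirrors the ternary used for `state`. It is illustrative only: `is_valid_rate` and `rule_state` are hypothetical helpers, and the check covers `rate(...)` expressions, not `cron(...)` ones.

```python
import re

# rate(<positive integer> <unit>): the unit is singular for 1, plural otherwise.
_RATE = re.compile(r"^rate\((\d+) (minute|minutes|hour|hours|day|days)\)$")


def is_valid_rate(expression):
    """Return True if the string is a well-formed rate() schedule expression."""
    match = _RATE.match(expression)
    if not match:
        return False
    value, unit = int(match.group(1)), match.group(2)
    if value < 1:
        return False
    # "rate(1 hour)" is valid, "rate(1 hours)" and "rate(2 hour)" are not.
    return unit.endswith("s") == (value != 1)


def rule_state(disable_ingestion_schedule):
    """Mirror of the ternary in aws_cloudwatch_event_rule.ingestion_sfn_trigger_rule."""
    return "DISABLED" if disable_ingestion_schedule else "ENABLED"


default_ok = is_valid_rate("rate(24 hours)")
```

A check like this could run in CI against the variable values, since a malformed expression is otherwise only rejected at apply time.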
@@ -0,0 +1,95 @@
{
  "Comment": "Ingestion State Machine",
  "StartAt": "BootStrapState",
  "States": {
    "BootStrapState": {
      "Type": "Task",
      "Resource": "${migrate_lambda_arn}",
      "Next": "IngestionState"
    },
    "IngestionState": {
      "Type": "Parallel",
      "Branches": [
        {
          "Comment": "Ingest Scheduled Scan Findings",
          "StartAt": "ListScheduledScanFindingsFiles",
          "States": {
            "ListScheduledScanFindingsFiles": {
              "Type": "Task",
              "Resource": "${ingestion_lambda_arn}",
              "ResultPath": "$.lambdaResult",
              "Parameters": {
                "action": "list_files",
                "prefix": "secrets-finder/scheduled-scans/results/"
              },
              "Next": "IngestScheduledScanFindingsFiles"
            },
            "IngestScheduledScanFindingsFiles": {
              "Type": "Map",
              "ItemsPath": "$.lambdaResult.body.files",
              "Parameters": {
                "index.$": "$$.Map.Item.Index",
                "key.$": "$$.Map.Item.Value"
              },
              "Iterator": {
                "StartAt": "IngestScheduledScanFindings",
                "States": {
                  "IngestScheduledScanFindings": {
                    "Type": "Task",
                    "Resource": "${ingestion_lambda_arn}",
                    "Parameters": {
                      "action": "ingest_findings",
                      "file_key.$": "$.key"
                    },
                    "End": true
                  }
                }
              },
              "End": true
            }
          }
        },
        {
          "Comment": "Ingest Ongoing Scan Findings",
          "StartAt": "ListOngoingScanFindingsFiles",
          "States": {
            "ListOngoingScanFindingsFiles": {
              "Type": "Task",
              "Resource": "${ingestion_lambda_arn}",
              "ResultPath": "$.lambdaResult",
              "Parameters": {
                "action": "list_files",
                "prefix": "secrets-finder/ongoing-scans/results/"
              },
              "Next": "IngestOngoingScanFindingsFiles"
            },
            "IngestOngoingScanFindingsFiles": {
              "Type": "Map",
              "ItemsPath": "$.lambdaResult.body.files",
              "Parameters": {
                "index.$": "$$.Map.Item.Index",
                "key.$": "$$.Map.Item.Value"
              },
              "Iterator": {
                "StartAt": "IngestOngoingScanFindings",
                "States": {
                  "IngestOngoingScanFindings": {
                    "Type": "Task",
                    "Resource": "${ingestion_lambda_arn}",
                    "Parameters": {
                      "action": "ingest_findings",
                      "file_key.$": "$.key"
                    },
                    "End": true
                  }
                }
              },
              "End": true
            }
          }
        }
      ],
      "End": true
    }
  }
}
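The state machine above calls the same ingestion Lambda with two payload shapes: `{"action": "list_files", "prefix": ...}` and `{"action": "ingest_findings", "file_key": ...}`. It then reads the file list from `$.lambdaResult.body.files`. The Lambda source is not shown in this part of the diff, so the following is a minimal sketch of a handler that would satisfy that contract. The `make_handler` factory, the `statusCode` field, and the in-memory stand-ins for S3 and the database are all assumptions made for illustration.

```python
def make_handler(list_files, ingest_findings):
    """Build a Lambda-style handler around two injected callables."""

    def handler(event, context=None):
        action = event.get("action")
        if action == "list_files":
            # The Map states read $.lambdaResult.body.files, so the list
            # must sit under body.files in the returned object.
            return {"statusCode": 200, "body": {"files": list_files(event["prefix"])}}
        if action == "ingest_findings":
            ingest_findings(event["file_key"])
            return {"statusCode": 200, "body": {"ingested": event["file_key"]}}
        raise ValueError(f"unsupported action: {action!r}")

    return handler


# In-memory stand-ins for the S3 bucket and the database (hypothetical keys).
bucket = [
    "secrets-finder/scheduled-scans/results/a.json",
    "secrets-finder/ongoing-scans/results/b.json",
]
ingested = []
handler = make_handler(
    list_files=lambda prefix: [key for key in bucket if key.startswith(prefix)],
    ingest_findings=ingested.append,
)

listing = handler(
    {"action": "list_files", "prefix": "secrets-finder/scheduled-scans/results/"}
)
# Emulate the Map state: one invocation per listed key.
for key in listing["body"]["files"]:
    handler({"action": "ingest_findings", "file_key": key})
```

Note that `body` is returned as an object rather than a JSON-encoded string. With a string body, the `ItemsPath` of `$.lambdaResult.body.files` would have nothing to traverse.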