
de-totes-project

Northcoders Data Engineer Nov 2023 cohort project

Team Name: TotesOps

Project specification can be found here: https://github.com/northcoders/de-project-specification

Set-up

Setting up the project environment

Before running the project, run the following command to set up your environment and install the required dependencies:

make requirements

Creating a bucket to store the Terraform state file

Run the following command to create an S3 bucket to store the Terraform state file (you will be prompted to name the bucket):

make run-make-bucket
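
For reference, a minimal boto3 sketch of what this target is assumed to do (the bucket name and region below are placeholders, not values from the repository):

import boto3

def create_state_bucket(bucket_name, region="eu-west-2"):
    # Sketch only -- the real logic lives in the make target.
    s3 = boto3.client("s3", region_name=region)
    # Outside us-east-1, S3 requires an explicit LocationConstraint
    s3.create_bucket(
        Bucket=bucket_name,
        CreateBucketConfiguration={"LocationConstraint": region},
    )

create_state_bucket("my-tf-state-bucket")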

Setting up AWS SNS topic

Next, run the following command from the root of the project to create an AWS SNS topic subscribed to your email address. This is where alerts and alarms will be sent:

./deployment/email_subscriber.sh myemail@email.com
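
The script's effect can be approximated with boto3 as follows (the topic name and region are illustrative assumptions, not taken from the script):

import boto3

def subscribe_email_to_alerts(email, topic_name="totesops-alerts"):
    # Sketch of what email_subscriber.sh is assumed to do.
    sns = boto3.client("sns", region_name="eu-west-2")
    topic_arn = sns.create_topic(Name=topic_name)["TopicArn"]
    # AWS emails a confirmation link; the subscription stays pending until confirmed
    sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=email)
    return topic_arn

subscribe_email_to_alerts("myemail@email.com")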

Storing database credentials in AWS Secrets Manager

To create a secret containing login credentials in AWS Secrets Manager, you will need a db_credentials.json file in the following format:

{
    "database": "databasename",
    "user": "username",
    "password": "password",
    "host": "awshostname",
    "port": "0000"
}

The contents of this JSON file are stored as a secret via the AWS Secrets Manager command line and retrieved in the Lambda handlers to connect to the totesys database.
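
As an illustration, creating and later reading the secret with boto3 could look like the following (the secret name is a placeholder; the project's actual helpers may differ):

import json
import boto3

secrets = boto3.client("secretsmanager", region_name="eu-west-2")

# One-off: store the credentials file as a secret
with open("db_credentials.json") as f:
    secrets.create_secret(Name="totesys-db-credentials", SecretString=f.read())

# In a Lambda handler: retrieve and parse the credentials
response = secrets.get_secret_value(SecretId="totesys-db-credentials")
credentials = json.loads(response["SecretString"])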

You will then need to repeat this process to store the credentials for the data warehouse in a separate secret.

Deployment

All required AWS infrastructure is deployed via Terraform (except for the aforementioned Terraform state bucket and SNS topic).

Deployment is automated via a CI/CD pipeline implemented with GitHub Actions.

Lambda 1 (extract_handler1)

Description

This lambda handler runs on a 2-minute schedule and, on each invocation, checks every table in the totesys database for new data. If new data is found, it writes the data to a CSV file and saves it in a designated S3 bucket (organised into sub-folders per table). A sketch of this flow follows the util list below.

Util functions

This lambda handler utilises the following util functions:

  • get_table_names
  • get_bucket_name
  • is_bucket_empty
  • L1_extract_data
  • get_most_recent_file
  • get_timestamp
  • format_data
  • write_csv
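
A minimal sketch of how these utils might fit together inside the handler (the signatures below are assumptions for illustration; the real implementations live in this repository):

def extract_handler1(event, context):
    # Illustrative outline only; the utils' real signatures may differ.
    bucket = get_bucket_name()
    for table in get_table_names():
        if is_bucket_empty(bucket):
            rows = L1_extract_data(table)  # first invocation: full extract
        else:
            last_seen = get_timestamp(get_most_recent_file(bucket, table))
            rows = L1_extract_data(table, since=last_seen)  # new rows only
        if rows:
            write_csv(bucket, table, format_data(rows))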

Lambda 2 (transform_handler2)

Description

This lambda handler is triggered by an update to the ingestion bucket. On each invocation it reads the most recent file in the ingestion bucket, converts the CSV into a dataframe, transforms the data into the desired format, writes the transformed dataframe to a parquet file, and saves it to an S3 bucket (organised into sub-folders per table). A sketch of this flow follows the util list below.

Util functions

This lambda handler utilises the following util functions:

  • get_file_and_ingestion_bucket_name
  • get_bucket_name_2
  • get_most_recent_file_2
  • make_dim_counterparty
  • make_dim_currency
  • make_dim_date
  • make_dim_design
  • make_dim_location
  • make_dim_staff
  • make_fact_sales_order
  • read_csv_to_df
  • write_to_parquet
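
A minimal sketch of the transform flow, assuming signatures for the listed utils (illustrative only):

def transform_handler2(event, context):
    # Illustrative outline only; the utils' real signatures may differ.
    key, ingestion_bucket = get_file_and_ingestion_bucket_name(event)
    processed_bucket = get_bucket_name_2()
    latest = get_most_recent_file_2(ingestion_bucket)
    df = read_csv_to_df(ingestion_bucket, latest)
    # One make_* transformation per target table, e.g. the staff dimension:
    dim_staff = make_dim_staff(df)
    write_to_parquet(processed_bucket, "dim_staff", dim_staff)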

Lambda 3 (load_handler3)

Description

This lambda handler is triggered by an update to the processed bucket. It reads the most recent parquet file in the processed bucket, converts the data into a dataframe, and inserts it into the correct table in the data warehouse. A sketch of this flow follows the util list below.

Util functions

This lambda handler utilises the following util functions:

  • get_file_and_bucket
  • get_table_name
  • read_parquet
  • upload_data
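
A minimal sketch of the load flow, again assuming signatures for the listed utils:

def load_handler3(event, context):
    # Illustrative outline only; the utils' real signatures may differ.
    key, bucket = get_file_and_bucket(event)
    table = get_table_name(key)  # e.g. derive "dim_staff" from the key prefix
    df = read_parquet(bucket, key)
    upload_data(df, table)  # insert the rows into the warehouse table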
