Implement batch processing on AWS:
- scatter: Split a single file of records to be processed; the file has been uploaded to S3 (a sketch of this step follows the list)
- process: Process the records with as much parallelism as possible
- gather: Detect completion of processing and aggregate a result summary report in S3
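A minimal sketch of the scatter step, assuming boto3 and SQS fan-out; the bucket, key and queue URL names are illustrative only and the actual variants below differ in how records are distributed.

```python
# Hypothetical scatter sketch: read the uploaded records file from S3 and
# fan the records out to SQS in batches of 10 (the SQS batch size limit).
# Bucket, key and queue URL are illustrative, not the project's real names.
import json
import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")

def scatter(bucket: str, key: str, queue_url: str) -> int:
    body = s3.get_object(Bucket=bucket, Key=key)["Body"]
    records = body.read().decode("utf-8").splitlines()
    for start in range(0, len(records), 10):
        batch = records[start:start + 10]
        sqs.send_message_batch(
            QueueUrl=queue_url,
            Entries=[
                {"Id": str(start + i), "MessageBody": json.dumps({"record": r})}
                for i, r in enumerate(batch)
            ],
        )
    return len(records)
```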
Required tools:
- Python 3.8
- GNU make
- Docker
- awscli
- tfvm or Terraform
- cw >= v3.3.0
Run the benchmark locally against localstack:
make clean start_localstack deploy benchmark report
Tear the local environment down again with:
make stop_localstack clean
All resources will be prefixed with your current ${USER}-. Pass SCOPE=mycustomprefix- to make to override this default.
Deploy and run the benchmark against a real AWS account:
make ENV=aws clean deploy_resources deploy_service benchmark report
Destroy the AWS resources again with:
make ENV=aws destroy
The task has been implemented in several variants:
- s3-sqs-lambda-sync (boto3, blocking I/O)
- s3-sqs-lambda-async (aioboto3, async I/O; a process sketch follows this list)
- s3-sqs-lambda-async-chunked (aioboto3, async I/O, records packed into chunks)
- s3-sqs-lambda-dynamodb (aioboto3, async I/O, records stored in DynamoDB)
- s3-notification-sqs-lambda (aioboto3, async I/O, records stored in S3 in chunks, functions invoked via S3 notifications through SQS queues)
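To illustrate the async variants, here is a minimal sketch of a process Lambda handler, assuming aioboto3 with its Session-based API (aioboto3 >= 8); the result bucket name and key scheme are hypothetical, not the project's real configuration.

```python
# Hypothetical process sketch for the async variants: handle an SQS batch
# inside a Lambda, process each record concurrently and write results to S3.
import asyncio
import json
import os

import aioboto3

# Illustrative result bucket, not the project's real resource name.
RESULT_BUCKET = os.environ.get("RESULT_BUCKET", "example-results")

async def _process_record(s3, record: dict) -> None:
    payload = json.loads(record["body"])
    result = {"record": payload, "status": "processed"}
    await s3.put_object(
        Bucket=RESULT_BUCKET,
        Key=f"results/{record['messageId']}.json",
        Body=json.dumps(result).encode("utf-8"),
    )

async def _handle(event: dict) -> None:
    session = aioboto3.Session()
    async with session.client("s3") as s3:
        # Process all records of the SQS batch concurrently.
        await asyncio.gather(*(_process_record(s3, r) for r in event["Records"]))

def handler(event, context):
    # Lambda entry point; bridges the synchronous handler contract to asyncio.
    asyncio.run(_handle(event))
```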
Further variants:
- sfn?
- glue?
- emr (spark)?
- s3 athena?
- s3 batch
- single fat vm
Repository layout:
- infra - Resources and service infrastructure
- src - Service sources
- tests - Service tests
- benchmark - Benchmark sources
Copyright 2020 by Cornelius Buschka. All rights reserved.