Use EMR Serverless with Boto3
generate-parquet.ipynb
- Download the source data and transform it into Parquet format.
credentials_example.cfg
- Credentials required for running EMR Serverless.
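The notebooks need the three values kept in credentials.cfg (access_key, secret_access_key and user_account_id). Below is a minimal sketch of loading them with Python's standard configparser. The [AWS] section header and the EmrCredentials / load_credentials names are assumptions for illustration; match the section name to whatever credentials_example.cfg actually uses.

```python
import configparser
from dataclasses import dataclass


@dataclass(frozen=True)
class EmrCredentials:
    access_key: str
    secret_access_key: str
    user_account_id: str


def load_credentials(path: str, section: str = "AWS") -> EmrCredentials:
    """Read the three keys from credentials.cfg.

    The default section name "AWS" is a hypothetical placeholder; change it
    to the header used in credentials_example.cfg.
    """
    parser = configparser.ConfigParser()
    # read() returns the list of files it parsed; an empty list means the file is missing.
    if not parser.read(path):
        raise FileNotFoundError(path)
    cfg = parser[section]  # raises KeyError if the section is absent
    return EmrCredentials(
        access_key=cfg["access_key"],
        secret_access_key=cfg["secret_access_key"],
        user_account_id=cfg["user_account_id"],
    )
```

Keeping credentials.cfg out of version control (and committing only credentials_example.cfg) is the reason for the rename step further down.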
emr-serverless-IaC-functional.ipynb
- Set up an Application in EMR Studio.
- Generate the required role and policy, and attach the policy to the role.
- Submit a job to the Application and track its status.
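The submit-and-track step can be sketched against the boto3 emr-serverless client's start_job_run and get_job_run calls. This is a minimal sketch, not the notebook's actual code: the helper names, the S3 entry point and log URI, and the 30-second poll interval are assumptions. The client is passed in as a parameter so the polling logic can be exercised without AWS access.

```python
import time
from typing import Any, Callable, Dict

# Job-run states after which EMR Serverless will not change the state again.
TERMINAL_STATES = {"SUCCESS", "FAILED", "CANCELLED"}


def build_job_run_request(application_id: str, execution_role_arn: str,
                          entry_point: str, log_uri: str) -> Dict[str, Any]:
    """Keyword arguments for client.start_job_run (hypothetical helper).

    entry_point is the S3 path of the Spark script; log_uri is an S3 prefix
    for driver/executor logs. Both are placeholders.
    """
    return {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,
        "jobDriver": {"sparkSubmit": {"entryPoint": entry_point}},
        "configurationOverrides": {
            "monitoringConfiguration": {
                "s3MonitoringConfiguration": {"logUri": log_uri}
            }
        },
    }


def wait_for_job_run(client: Any, application_id: str, job_run_id: str,
                     poll_seconds: float = 30.0,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_job_run until the run reaches a terminal state; return it."""
    while True:
        response = client.get_job_run(applicationId=application_id,
                                      jobRunId=job_run_id)
        state = response["jobRun"]["state"]
        if state in TERMINAL_STATES:
            return state
        sleep(poll_seconds)


def submit_and_wait(client: Any, request: Dict[str, Any],
                    poll_seconds: float = 30.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Start a job run from a prepared request and block until it finishes."""
    job_run_id = client.start_job_run(**request)["jobRunId"]
    return wait_for_job_run(client, request["applicationId"], job_run_id,
                            poll_seconds=poll_seconds, sleep=sleep)
```

With boto3 installed, the client would be boto3.client("emr-serverless"), and the application ID would come from the applicationId field returned by create_application. Injecting sleep keeps the loop testable without waiting in real time.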
read_outputs.ipynb
- Read the outputs in S3 with awswrangler.
- Visualize the data with matplotlib.
- Create an IAM user with programmatic access and attach the following policy:
  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Sid": "VisualEditor0",
        "Effect": "Allow",
        "Action": [
          "emr-serverless:*",
          "iam:GetAccountAuthorizationDetails"
        ],
        "Resource": "*"
      },
      {
        "Sid": "VisualEditor1",
        "Effect": "Allow",
        "Action": [
          "s3:PutObject",
          "s3:GetObject",
          "s3:ListBucket",
          "s3:DeleteObject"
        ],
        "Resource": [
          "arn:aws:s3:::tpe-mrt-data",
          "arn:aws:s3:::tpe-mrt-data/*"
        ]
      }
    ]
  }
- Update access_key, secret_access_key and user_account_id in credentials_example.cfg.
- Rename credentials_example.cfg to credentials.cfg.
- Run emr-serverless-IaC-functional.ipynb.
- Read the output with read_outputs.ipynb.
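The user policy in the first step is hard-wired to the tpe-mrt-data bucket. If you keep the data in a different bucket, the same policy can be regenerated with the name swapped in. This is a small sketch; build_user_policy is a hypothetical helper, and its JSON output could be pasted into the IAM console or passed as the PolicyDocument of IAM's create_policy call.

```python
import json
from typing import Any, Dict


def build_user_policy(bucket: str) -> Dict[str, Any]:
    """Return the policy from step 1 with the bucket name parameterised (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": ["emr-serverless:*", "iam:GetAccountAuthorizationDetails"],
                "Resource": "*",
            },
            {
                "Sid": "VisualEditor1",
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject",
                           "s3:ListBucket", "s3:DeleteObject"],
                # s3:ListBucket is evaluated against the bucket ARN; the object
                # actions are evaluated against the bucket/* ARN, so both are listed.
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_user_policy("tpe-mrt-data"), indent=2))
```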