make construct more flexible
scott.hsieh[謝書正] committed Jun 8, 2022
1 parent d432958 commit f83c4a6
Showing 6 changed files with 181 additions and 12 deletions.
92 changes: 92 additions & 0 deletions API.md
@@ -4,6 +4,29 @@

### EmrClusterTemplateStack <a name="EmrClusterTemplateStack" id="cdk-emrserverless-with-delta-lake.EmrClusterTemplateStack"></a>

Creates a CloudFormation template which will be a Product under a Portfolio of AWS Service Catalog.

The template lets you create an EMR cluster from a cluster template, on the AWS Console, in the EMR Studio that the `EmrServerless` construct creates.

For now, you cannot configure this stack through the `EmrServerless` construct. It is documented here to make the architecture of `EmrServerless` easier to grasp.

For details, please refer to [Create AWS CloudFormation templates for Amazon EMR Studio](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-studio-cluster-templates.html).

```ts
const product = new servicecatalog.CloudFormationProduct(this, 'MyFirstProduct', {
  productName: 'EMR_6.6.0',
  owner: 'scott.hsieh',
  description: 'EMR cluster with 6.6.0 version',
  productVersions: [
    {
      productVersionName: 'v1',
      validateTemplate: true,
      cloudFormationTemplate: servicecatalog.CloudFormationTemplate.fromProductStack(new EmrClusterTemplateStack(this, 'EmrStudio')),
    },
  ],
});
```

#### Initializers <a name="Initializers" id="cdk-emrserverless-with-delta-lake.EmrClusterTemplateStack.Initializer"></a>

```typescript
@@ -2198,6 +2221,7 @@ const emrServerlessProps: EmrServerlessProps = { ... }
| **Name** | **Type** | **Description** |
| --- | --- | --- |
| <code><a href="#cdk-emrserverless-with-delta-lake.EmrServerlessProps.property.subnetIds">subnetIds</a></code> | <code>string[]</code> | The subnet IDs for the EMR studio. |
| <code><a href="#cdk-emrserverless-with-delta-lake.EmrServerlessProps.property.serviceCatalogProps">serviceCatalogProps</a></code> | <code><a href="#cdk-emrserverless-with-delta-lake.EmrStudioDeveloperStackProps">EmrStudioDeveloperStackProps</a></code> | Options for which kind of identity will be associated with the Product of the Portfolio in AWS Service Catalog for EMR cluster templates. |

---

@@ -2215,6 +2239,20 @@ You can select the subnets from the default VPC in your AWS account.

---

##### `serviceCatalogProps`<sup>Optional</sup> <a name="serviceCatalogProps" id="cdk-emrserverless-with-delta-lake.EmrServerlessProps.property.serviceCatalogProps"></a>

```typescript
public readonly serviceCatalogProps: EmrStudioDeveloperStackProps;
```

- *Type:* <a href="#cdk-emrserverless-with-delta-lake.EmrStudioDeveloperStackProps">EmrStudioDeveloperStackProps</a>

Options for which kind of identity will be associated with the Product of the Portfolio in AWS Service Catalog for EMR cluster templates.

You can choose an IAM group, an IAM role, or an IAM user. If you leave it empty, an IAM user named `Administrator` with `AdministratorAccess` permissions must already exist.

---

### EmrStudioDeveloperStackProps <a name="EmrStudioDeveloperStackProps" id="cdk-emrserverless-with-delta-lake.EmrStudioDeveloperStackProps"></a>

Interface for Service Catalog of EMR cluster templates.
@@ -2231,7 +2269,22 @@ const emrStudioDeveloperStackProps: EmrStudioDeveloperStackProps = { ... }

| **Name** | **Type** | **Description** |
| --- | --- | --- |
| <code><a href="#cdk-emrserverless-with-delta-lake.EmrStudioDeveloperStackProps.property.group">group</a></code> | <code>aws-cdk-lib.aws_iam.IGroup</code> | An IAM group to associate with the Portfolio for EMR cluster templates. |
| <code><a href="#cdk-emrserverless-with-delta-lake.EmrStudioDeveloperStackProps.property.providerName">providerName</a></code> | <code>string</code> | The provider name in a Service Catalog for EMR cluster templates. |
| <code><a href="#cdk-emrserverless-with-delta-lake.EmrStudioDeveloperStackProps.property.role">role</a></code> | <code>aws-cdk-lib.aws_iam.IRole</code> | An IAM role to associate with the Portfolio for EMR cluster templates. |
| <code><a href="#cdk-emrserverless-with-delta-lake.EmrStudioDeveloperStackProps.property.user">user</a></code> | <code>aws-cdk-lib.aws_iam.IUser</code> | An IAM user to associate with the Portfolio for EMR cluster templates. |

---

##### `group`<sup>Optional</sup> <a name="group" id="cdk-emrserverless-with-delta-lake.EmrStudioDeveloperStackProps.property.group"></a>

```typescript
public readonly group: IGroup;
```

- *Type:* aws-cdk-lib.aws_iam.IGroup

An IAM group to associate with the Portfolio for EMR cluster templates.

---

@@ -2248,6 +2301,30 @@ The provider name in a Service Catalog for EMR cluster templates.

---

##### `role`<sup>Optional</sup> <a name="role" id="cdk-emrserverless-with-delta-lake.EmrStudioDeveloperStackProps.property.role"></a>

```typescript
public readonly role: IRole;
```

- *Type:* aws-cdk-lib.aws_iam.IRole

An IAM role to associate with the Portfolio for EMR cluster templates.

---

##### `user`<sup>Optional</sup> <a name="user" id="cdk-emrserverless-with-delta-lake.EmrStudioDeveloperStackProps.property.user"></a>

```typescript
public readonly user: IUser;
```

- *Type:* aws-cdk-lib.aws_iam.IUser

An IAM user to associate with the Portfolio for EMR cluster templates.

---

### EmrStudioEngineSecurityGroupProps <a name="EmrStudioEngineSecurityGroupProps" id="cdk-emrserverless-with-delta-lake.EmrStudioEngineSecurityGroupProps"></a>

Interface for engine security group of EMR Studio.
@@ -2302,6 +2379,7 @@ const emrStudioProps: EmrStudioProps = { ... }
| <code><a href="#cdk-emrserverless-with-delta-lake.EmrStudioProps.property.authMode">authMode</a></code> | <code><a href="#cdk-emrserverless-with-delta-lake.StudioAuthMode">StudioAuthMode</a></code> | Specifies whether the Studio authenticates users using AWS SSO or IAM. |
| <code><a href="#cdk-emrserverless-with-delta-lake.EmrStudioProps.property.description">description</a></code> | <code>string</code> | A detailed description of the Amazon EMR Studio. |
| <code><a href="#cdk-emrserverless-with-delta-lake.EmrStudioProps.property.engineSecurityGroupId">engineSecurityGroupId</a></code> | <code>string</code> | The ID of the Amazon EMR Studio Engine security group. |
| <code><a href="#cdk-emrserverless-with-delta-lake.EmrStudioProps.property.serviceCatalogProps">serviceCatalogProps</a></code> | <code><a href="#cdk-emrserverless-with-delta-lake.EmrStudioDeveloperStackProps">EmrStudioDeveloperStackProps</a></code> | Options for which kind of identity will be associated with the Product of the Portfolio in AWS Service Catalog for EMR cluster templates. |
| <code><a href="#cdk-emrserverless-with-delta-lake.EmrStudioProps.property.serviceRoleArn">serviceRoleArn</a></code> | <code>string</code> | *No description.* |
| <code><a href="#cdk-emrserverless-with-delta-lake.EmrStudioProps.property.serviceRoleName">serviceRoleName</a></code> | <code>string</code> | A name for the service role of an EMR Studio. |
| <code><a href="#cdk-emrserverless-with-delta-lake.EmrStudioProps.property.studioName">studioName</a></code> | <code>string</code> | A descriptive name for the Amazon EMR Studio. |
@@ -2378,6 +2456,20 @@ The Engine security group allows inbound network traffic from the Workspace secu

---

##### `serviceCatalogProps`<sup>Optional</sup> <a name="serviceCatalogProps" id="cdk-emrserverless-with-delta-lake.EmrStudioProps.property.serviceCatalogProps"></a>

```typescript
public readonly serviceCatalogProps: EmrStudioDeveloperStackProps;
```

- *Type:* <a href="#cdk-emrserverless-with-delta-lake.EmrStudioDeveloperStackProps">EmrStudioDeveloperStackProps</a>

Options for which kind of identity will be associated with the Product of the Portfolio in AWS Service Catalog for EMR cluster templates.

You can choose an IAM group, an IAM role, or an IAM user. If you leave it empty, an IAM user named `Administrator` with `AdministratorAccess` permissions must already exist.

---

##### `serviceRoleArn`<sup>Optional</sup> <a name="serviceRoleArn" id="cdk-emrserverless-with-delta-lake.EmrStudioProps.property.serviceRoleArn"></a>

```typescript
13 changes: 8 additions & 5 deletions README.md
@@ -1,5 +1,5 @@
# cdk-emrserverless-with-delta-lake

![high level architecture](./images/high%20level%20architecture.png)
This construct builds an EMR Studio, a cluster template for the EMR Studio, and an EMR Serverless application. It creates two S3 buckets: one for the EMR Studio workspace and one for EMR Serverless applications. In addition, a custom resource tags the VPC and the subnets for the EMR Studio with `{"Key": "for-use-with-amazon-emr-managed-policies", "Value": "true"}`, which the [service role](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-studio-service-role.html#emr-studio-service-role-instructions) of EMR Studio requires.
This construct is for analysts, data engineers, and anyone who wants to know how to process **Delta Lake data** with EMR Serverless.
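
The tagging requirement can be pictured as a single tagging request that covers the VPC and every subnet. The following is only a sketch: the custom resource itself is not part of this commit, so the helper name `buildEmrTagParams` and the resource IDs are hypothetical, and the parameter shape (`Resources`, `Tags`) is assumed to follow EC2's `CreateTags` action.

```typescript
// Tag key and value quoted in the README paragraph above.
const EMR_TAG_KEY = 'for-use-with-amazon-emr-managed-policies';
const EMR_TAG_VALUE = 'true';

// Parameter shape assumed to match EC2's CreateTags action.
interface CreateTagsParams {
  readonly Resources: string[];
  readonly Tags: { Key: string; Value: string }[];
}

// Hypothetical helper: collects the VPC and its subnets into one tagging request.
function buildEmrTagParams(vpcId: string, subnetIds: readonly string[]): CreateTagsParams {
  if (subnetIds.length === 0) {
    throw new Error('at least one subnet ID is required');
  }
  return {
    Resources: [vpcId, ...subnetIds],
    Tags: [{ Key: EMR_TAG_KEY, Value: EMR_TAG_VALUE }],
  };
}

// Placeholder IDs, not taken from any real account.
console.log(JSON.stringify(buildEmrTagParams('vpc-0123456789abcdef0', ['subnet-0123456789abcdef0'])));
```

Tagging the VPC and the subnets in one request keeps the custom resource to a single call, which is one plausible reason to structure it this way.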
![cfn designer](./images/cfn-designer.png)
@@ -22,8 +22,9 @@ They build the construct via [cdkv2](https://docs.aws.amazon.com/cdk/v2/guide/ho
1. Your current identity has `AdministratorAccess` permissions.
2. [An IAM user](https://docs.aws.amazon.com/IAM/latest/UserGuide/getting-started_create-admin-group.html) named `Administrator` with `AdministratorAccess` permissions.
* This is related to the Portfolio of AWS Service Catalog created by the construct, which is required for [EMR cluster templates](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-studio-cluster-templates.html).
* I'm still thinking whether I should leave it as a choice (in the construct) or create for you directly.
* You can choose whichever identity you wish to associate with the Product in the Portfolio for creating an EMR cluster via a cluster template. See `serviceCatalogProps` in the `EmrServerless` construct for details; otherwise, the IAM user mentioned above is associated with the Product.
3. Choose proper subnet IDs for the `EmrServerless` construct, either from the default VPC or from another VPC of your choice.
* If you choose an alternative VPC, you need to check the security implications yourself. The construct is set up for the default VPC, so for the quickest deployment, select proper subnet IDs from your default VPC and deploy.
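
Since the subnet IDs are typed in by hand, a quick format check before deployment may catch copy-paste mistakes. This is a sketch and not part of the construct: `findInvalidSubnetIds` is a hypothetical helper, and it assumes literal IDs of the form `subnet-` followed by 8 or 17 hexadecimal characters (unresolved CDK tokens would not match).

```typescript
// Assumed ID format: "subnet-" plus 8 (older) or 17 (newer) lowercase hex characters.
const SUBNET_ID_PATTERN = /^subnet-(?:[0-9a-f]{8}|[0-9a-f]{17})$/;

// Hypothetical helper: returns the entries that do not look like subnet IDs.
function findInvalidSubnetIds(subnetIds: readonly string[]): string[] {
  return subnetIds.filter((id) => !SUBNET_ID_PATTERN.test(id));
}

// Placeholder input; only the first two entries have the expected shape.
const suspicious = findInvalidSubnetIds(['subnet-3571a36c', 'subnet-0123456789abcdef0', 'vpc-3571a36c']);
if (suspicious.length > 0) {
  console.warn(`These values do not look like subnet IDs: ${suspicious.join(', ')}`);
}
```

The check only validates the shape of each ID. It does not confirm that the subnet exists or that it belongs to the VPC you intend to use.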
# Before deployment
You might want to execute the following command.
```sh
@@ -36,7 +37,7 @@ cdk bootstrap aws://${AWS_ACCOUNT_ID}/${AWS_REGION} --profile ${PROFILE_NAME}
#!/usr/bin/env node
import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import { EmrServerless } from '../../emrserverless';
import { EmrServerless } from 'cdk-emrserverless-with-delta-lake';

class TypescriptStack extends cdk.Stack {
constructor(scope: Construct, id: string, props?: cdk.StackProps) {
@@ -67,7 +68,7 @@ Promise me, darling, make advantage on the CloudFormation outputs. All you need
export SERVERLESS_BUCKET_NAME="${copy-paste-thank-you}"
export DELTA_LAKE_SCRIPT_NAME="delta-lake-demo"
```
2. Copy partial NYC-taxi data into the EMR Serverless bucket.
2. **Copy partial NYC-taxi data into the EMR Serverless bucket.**
```sh
aws s3 cp s3://nyc-tlc/trip\ data/ s3://${SERVERLESS_BUCKET_NAME}/nyc-taxi/ --exclude "*" --include "yellow_tripdata_2021-*.parquet" --recursive --profile ${PROFILE_NAME}
```
@@ -99,9 +100,11 @@ Promise me, darling, make advantage on the CloudFormation outputs. All you need
# reads a Delta table and outputs to target S3 bucket
spark.read.format("delta").load(url).show()
# The source for the second Delta table.
base = spark.read.parquet(
"s3://${SERVERLESS_BUCKET_NAME}/nyc-taxi/*.parquet")
# The second Delta table, oh ya.
base.write.format("delta") \\
.mode("overwrite") \\
.save("s3://${SERVERLESS_BUCKET_NAME}/emr-serverless-spark/delta-lake/nyx-tlc-2021")
@@ -115,7 +118,7 @@ Promise me, darling, make advantage on the CloudFormation outputs. All you need
# download jars and upload them
DELTA_VERSION="1.2.0"
DELTA_LAKE_CORE="delta-core_2.12-${DELTA_VERSION}.jar"
DELTA_LAKE_STORAGE="delta-storage-${${DELTA_VERSION}}.jar"
DELTA_LAKE_STORAGE="delta-storage-${DELTA_VERSION}.jar"
curl https://repo1.maven.org/maven2/io/delta/delta-core_2.12/${DELTA_VERSION}/${DELTA_LAKE_CORE} --output ${DELTA_LAKE_CORE}
curl https://repo1.maven.org/maven2/io/delta/delta-storage/${DELTA_VERSION}/${DELTA_LAKE_STORAGE} --output ${DELTA_LAKE_STORAGE}
aws s3 mv ${DELTA_LAKE_CORE} s3://${SERVERLESS_BUCKET_NAME}/jars/${DELTA_LAKE_CORE} --profile ${PROFILE_NAME}
Binary file added images/high level architecture.png
65 changes: 60 additions & 5 deletions src/emr-studio-cluster-templates.ts
@@ -13,6 +13,18 @@ export interface EmrStudioDeveloperStackProps {
* @default - 'scott.hsieh'
*/
readonly providerName?: string;
/**
* An IAM group to associate with the Portfolio for EMR cluster templates.
*/
readonly group?: iam.IGroup;
/**
* An IAM role to associate with the Portfolio for EMR cluster templates.
*/
readonly role?: iam.IRole;
/**
* An IAM user to associate with the Portfolio for EMR cluster templates.
*/
readonly user?: iam.IUser;
}

/**
@@ -35,10 +47,19 @@ export class EmrStudioDeveloperStack extends Construct {
public readonly product: servicecatalog.Product;
constructor(scope: Construct, name: string, props?: EmrStudioDeveloperStackProps) {
super(scope, name);
if (props === undefined) {
console.log('`providerName` is not defined, therefore, the default value \'scott.hsieh\' will be set.');
if (props?.providerName === undefined) {
console.log('`providerName` for the `EmrStudioDeveloperStack` construct is not defined, therefore, the default value \'scott.hsieh\' will be set.');
}
if (props?.user === undefined) {
console.log('`user` for the `EmrStudioDeveloperStack` construct is not defined, therefore, the default value, an IAM user named \'Administrator\' with the `AdministratorAccess` power, will be set.');
}
const providerName: string = (props !== undefined) ? props.providerName! : 'scott.hsieh';
if (props?.role === undefined) {
console.log('`role` for the `EmrStudioDeveloperStack` construct is not defined, therefore, the default value, an IAM user named \'Administrator\' with the `AdministratorAccess` power, will be set.');
}
if (props?.group === undefined) {
console.log('`group` for the `EmrStudioDeveloperStack` construct is not defined, therefore, the default value, an IAM user named \'Administrator\' with the `AdministratorAccess` power, will be set.');
}
const providerName: string = (props?.providerName !== undefined) ? props.providerName : 'scott.hsieh';

this.portfolio = new servicecatalog.Portfolio(this, 'Portfolio', {
displayName: 'EMR Studio Developers Stack',
@@ -59,13 +80,46 @@ export class EmrStudioDeveloperStack extends Construct {
],
});
this.portfolio.addProduct(this.product);
this.portfolio.giveAccessToUser(iam.User.fromUserName(this, 'AdminUser', 'Administrator'));
if (props !== undefined) {
if (props!.group !== undefined) {
this.portfolio.giveAccessToGroup(props!.group);
}
if (props!.role !== undefined) {
this.portfolio.giveAccessToRole(props!.role);
}
if (props!.user !== undefined) {
this.portfolio.giveAccessToUser(props!.user);
}
} else {
this.portfolio.giveAccessToUser(iam.User.fromUserName(this, 'AdminUser', 'Administrator'));
}
new cdk.CfnOutput(this, 'OEmrStudioPortfolioArn', { value: this.portfolio.portfolioArn, description: 'The ARN of the portfolio.' });
new cdk.CfnOutput(this, 'OEmrStudioPortfolioProductArn', { value: this.product.productArn, description: 'The ARN of the product.' });
}
}
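
A note on the identity selection in this hunk: the `Administrator` fallback sits in the `else` branch of `if (props !== undefined)`, while `EmrStudio` in this commit always passes an object literal to `EmrStudioDeveloperStack`. It may therefore be worth checking whether the fallback still applies when `serviceCatalogProps` is left empty. The sketch below shows one way to express the documented behaviour (fall back whenever no group, role, or user is given). It is an illustration, not the construct's code: plain strings stand in for `iam.IGroup`, `iam.IRole`, and `iam.IUser`, and `IdentityOptions`, `resolvePrincipals`, and `resolveProviderName` are hypothetical names.

```typescript
// Hypothetical stand-in for the IAM constructs: only a kind and a name.
interface Principal {
  readonly kind: 'group' | 'role' | 'user';
  readonly name: string;
}

// Mirrors the optional fields of EmrStudioDeveloperStackProps, with strings instead of IAM constructs.
interface IdentityOptions {
  readonly providerName?: string;
  readonly group?: string;
  readonly role?: string;
  readonly user?: string;
}

// Defaults named in the source: the provider name and the fallback IAM user.
const DEFAULT_PROVIDER_NAME = 'scott.hsieh';
const DEFAULT_USER_NAME = 'Administrator';

function resolveProviderName(options?: IdentityOptions): string {
  return options?.providerName ?? DEFAULT_PROVIDER_NAME;
}

function resolvePrincipals(options?: IdentityOptions): Principal[] {
  const principals: Principal[] = [];
  if (options?.group !== undefined) principals.push({ kind: 'group', name: options.group });
  if (options?.role !== undefined) principals.push({ kind: 'role', name: options.role });
  if (options?.user !== undefined) principals.push({ kind: 'user', name: options.user });
  // Fall back whenever no identity was supplied, even if `options` itself is an object.
  if (principals.length === 0) principals.push({ kind: 'user', name: DEFAULT_USER_NAME });
  return principals;
}

// An options object that sets only the provider name still gets the fallback user.
console.log(resolvePrincipals({ providerName: 'my-team' }));
```

Deciding on the fallback after collecting the identities, rather than on whether `props` is defined, keeps the behaviour the same no matter how the caller builds the props object.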


/**
* Creates a CloudFormation template which will be a Product under a Portfolio of AWS Service Catalog. The template lets you create an EMR cluster from a cluster template, on the AWS Console, in the EMR Studio that the `EmrServerless` construct creates.
*
* For now, you cannot configure this stack through the `EmrServerless` construct. It is documented here to make the architecture of `EmrServerless` easier to grasp.
*
* For details, please refer to [Create AWS CloudFormation templates for Amazon EMR Studio](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-studio-cluster-templates.html).
*
* ```ts
* const product = new servicecatalog.CloudFormationProduct(this, 'MyFirstProduct', {
*   productName: 'EMR_6.6.0',
*   owner: 'scott.hsieh',
*   description: 'EMR cluster with 6.6.0 version',
*   productVersions: [
*     {
*       productVersionName: 'v1',
*       validateTemplate: true,
*       cloudFormationTemplate: servicecatalog.CloudFormationTemplate.fromProductStack(new EmrClusterTemplateStack(this, 'EmrStudio')),
*     },
*   ],
* });
* ```
*/
export class EmrClusterTemplateStack extends servicecatalog.ProductStack {
constructor(scope: Construct, id: string) {
super(scope, id);
@@ -80,6 +134,7 @@ export class EmrClusterTemplateStack extends servicecatalog.ProductStack {
const ec2SubnetId = new cdk.CfnParameter(this, 'Ec2SubnetId', {
type: 'String',
default: 'subnet-3571a36c',
description: 'Buddy, the default value is one of the subnets in scott.hsieh\'s account, you gotta type your own.',
});
const emrRelease = new cdk.CfnParameter(this, 'EmrRelease', {
type: 'String',
15 changes: 13 additions & 2 deletions src/emr-studio.ts
@@ -7,7 +7,7 @@ import * as logs from 'aws-cdk-lib/aws-logs';
import * as cr from 'aws-cdk-lib/custom-resources';
import { Construct } from 'constructs';
import { WorkSpaceBucket } from './buckets';
import { EmrStudioDeveloperStack } from './emr-studio-cluster-templates';
import { EmrStudioDeveloperStack, EmrStudioDeveloperStackProps } from './emr-studio-cluster-templates';
import { EmrStudioEngineSecurityGroup, EmrStudioWorkspaceSecurityGroup } from './emr-studio-sgs';

/**
@@ -98,6 +98,12 @@ export interface EmrStudioProps {
* @link https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-studio-user-permissions.html
*/
readonly userRoleArn?: string;
/**
* Options for which kind of identity will be associated with the Product of the Portfolio in AWS Service Catalog for EMR cluster templates.
*
* You can choose an IAM group, an IAM role, or an IAM user. If you leave it empty, an IAM user named `Administrator` with `AdministratorAccess` permissions must already exist.
*/
readonly serviceCatalogProps?: EmrStudioDeveloperStackProps;
}

/**
@@ -179,7 +185,12 @@ export class EmrStudio extends Construct {
}],
userRole: (props.authMode == StudioAuthMode.AWS_SSO) ? props.userRoleArn : undefined,
});
new EmrStudioDeveloperStack(this, 'ClusterTempalte');
new EmrStudioDeveloperStack(this, 'ClusterTempalte', {
providerName: props.serviceCatalogProps?.providerName,
group: props.serviceCatalogProps?.group,
user: props.serviceCatalogProps?.user,
role: props.serviceCatalogProps?.role,
});

new cdk.CfnOutput(this, 'EmrStudioArn', { value: cdk.stringToCloudFormation(this.entity.getAtt('Arn')), description: 'The ARN of the EMR Studio' });
new cdk.CfnOutput(this, 'EmrStudioId', { value: cdk.stringToCloudFormation(this.entity.getAtt('StudioId')), description: 'The ID of the Amazon EMR Studio.' });