terraform-aws-collector-kinesis-ec2

A Terraform module that deploys the Snowplow Stream Collector on EC2. If you want to use a custom AMI for this deployment, ensure it is based on Amazon Linux 2.

Telemetry

By default, this module collects and forwards telemetry information to Snowplow so that we can understand how our applications are being used. No identifying information about your sub-account or account fingerprints is ever forwarded to us; the telemetry is limited to simple information about which modules and applications are deployed and active.

If you wish to subscribe to our mailing list for updates to these modules or security advisories, set the user_provided_id variable to include a valid email address at which we can reach you.

How do I disable it?

To disable telemetry, set the variable telemetry_enabled = false.
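
As a minimal sketch, the setting can be placed alongside the other module arguments. The block below assumes the load balancer module, stream modules, and variables from the Usage example in this README; only the last argument is specific to telemetry.

```hcl
module "collector_kinesis" {
  source = "snowplow-devops/collector-kinesis-ec2/aws"

  # Required inputs, wired up as in the Usage example
  name               = "collector-server"
  vpc_id             = var.vpc_id
  subnet_ids         = var.subnet_ids
  collector_lb_sg_id = module.collector_lb.sg_id
  collector_lb_tg_id = module.collector_lb.tg_id
  ingress_port       = module.collector_lb.tg_egress_port
  good_stream_name   = module.raw_stream.name
  bad_stream_name    = module.bad_1_stream.name
  ssh_key_name       = "your-key-name"

  # Opt out of telemetry (the default is true)
  telemetry_enabled = false
}
```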

What are you collecting?

For details on what information is collected please see this module: https://github.com/snowplow-devops/terraform-snowplow-telemetry

Usage

A Collector requires two output Kinesis Streams and an upstream Load Balancer. The Load Balancer makes it easy to configure TLS termination later in the setup and provides a simpler mechanism for setting up DNS than single EC2 instances with EIPs.

# Kinesis stream passed to the collector as good_stream_name
module "raw_stream" {
  source  = "snowplow-devops/kinesis-stream/aws"
  version = "0.2.0"

  name = "raw-stream"
}

# Kinesis stream passed to the collector as bad_stream_name
module "bad_1_stream" {
  source  = "snowplow-devops/kinesis-stream/aws"
  version = "0.2.0"

  name = "bad-1-stream"
}

# Load balancer deployed upstream of the collector
module "collector_lb" {
  source  = "snowplow-devops/alb/aws"
  version = "0.2.0"

  name              = "collector-lb"
  vpc_id            = var.vpc_id
  subnet_ids        = var.subnet_ids
  health_check_path = "/health"
}

module "collector_kinesis" {
  source = "snowplow-devops/collector-kinesis-ec2/aws"

  name               = "collector-server"
  vpc_id             = var.vpc_id
  subnet_ids         = var.subnet_ids
  collector_lb_sg_id = module.collector_lb.sg_id
  collector_lb_tg_id = module.collector_lb.tg_id
  ingress_port       = module.collector_lb.tg_egress_port
  good_stream_name   = module.raw_stream.name
  bad_stream_name    = module.bad_1_stream.name

  ssh_key_name = "your-key-name"
  # 0.0.0.0/0 allows SSH from any IPv4 address; narrow this to your own CIDR ranges
  ssh_ip_allowlist = ["0.0.0.0/0"]
}
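
The example above references var.vpc_id and var.subnet_ids without declaring them. A minimal sketch of the matching declarations follows; the descriptions are assumptions paraphrased from the Inputs table, not text taken from the module itself.

```hcl
# Input variables assumed by the usage example

variable "vpc_id" {
  description = "The VPC to deploy the load balancer and collector within"
  type        = string
}

variable "subnet_ids" {
  description = "At least two subnets in different availability zones"
  type        = list(string)
}
```

Values for both can then be supplied through a .tfvars file or -var flags when running terraform plan and terraform apply.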

Requirements

| Name | Version |
|------|---------|
| terraform | >= 1.0.0 |
| aws | >= 3.72.0 |
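
One way to enforce these constraints in the calling configuration is a terraform settings block. This is a sketch rather than part of the module; the hashicorp/aws provider source address is the standard registry address and is assumed here.

```hcl
terraform {
  # Matches the minimum Terraform version listed above
  required_version = ">= 1.0.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 3.72.0"
    }
  }
}
```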

Providers

| Name | Version |
|------|---------|
| aws | >= 3.72.0 |

Modules

| Name | Source | Version |
|------|--------|---------|
| instance_type_metrics | snowplow-devops/ec2-instance-type-metrics/aws | 0.1.2 |
| service | snowplow-devops/service-ec2/aws | 0.2.0 |
| telemetry | snowplow-devops/telemetry/snowplow | 0.4.0 |

Resources

| Name | Type |
|------|------|
| aws_cloudwatch_log_group.log_group | resource |
| aws_iam_instance_profile.instance_profile | resource |
| aws_iam_policy.iam_policy | resource |
| aws_iam_role.iam_role | resource |
| aws_iam_role_policy_attachment.policy_attachment | resource |
| aws_security_group.sg | resource |
| aws_security_group_rule.egress_tcp_443 | resource |
| aws_security_group_rule.egress_tcp_80 | resource |
| aws_security_group_rule.egress_udp_123 | resource |
| aws_security_group_rule.ingress_tcp_22 | resource |
| aws_security_group_rule.ingress_tcp_webserver | resource |
| aws_security_group_rule.lb_egress_tcp_webserver | resource |
| aws_caller_identity.current | data source |
| aws_region.current | data source |

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| bad_stream_name | The name of the bad Kinesis stream that the collector will insert data into | string | n/a | yes |
| collector_lb_sg_id | The ID of the load-balancer security group that sits upstream of the webserver | string | n/a | yes |
| collector_lb_tg_id | The ID of the load-balancer target group to direct traffic from the load-balancer to the webserver | string | n/a | yes |
| good_stream_name | The name of the good Kinesis stream that the collector will insert data into | string | n/a | yes |
| ingress_port | The port that the collector will be bound to and expose over HTTP | number | n/a | yes |
| name | A name which will be prepended to the resources created | string | n/a | yes |
| ssh_key_name | The name of the preexisting SSH key-pair to attach to all EC2 nodes deployed | string | n/a | yes |
| subnet_ids | The list of at least two subnets in different availability zones to deploy the collector across | list(string) | n/a | yes |
| vpc_id | The VPC to deploy the collector within | string | n/a | yes |
| amazon_linux_2_ami_id | The AMI ID to use, which must be based on Amazon Linux 2; by default the latest community version is used | string | "" | no |
| associate_public_ip_address | Whether to assign a public IP address to this instance | bool | true | no |
| byte_limit | The number of bytes to buffer before pushing events to Kinesis | number | 1000000 | no |
| cloudwatch_logs_enabled | Whether application logs should be reported to CloudWatch | bool | true | no |
| cloudwatch_logs_retention_days | The length of time in days to retain logs for | number | 7 | no |
| cookie_enabled | Whether server side cookies are enabled or not | bool | true | no |
| cookie_domain | Optional first party cookie domain for the collector to set cookies on (e.g. acme.com) | string | "" | no |
| custom_paths | Optional custom paths that the collector will respond to. Typical paths to override are '/com.snowplowanalytics.snowplow/tp2', '/com.snowplowanalytics.iglu/v1' and '/r/tp2', e.g. { "/custom/path/" : "/com.snowplowanalytics.snowplow/tp2" } | map(string) | {} | no |
| enable_auto_scaling | Whether to enable auto-scaling policies for the service | bool | true | no |
| iam_permissions_boundary | The permissions boundary ARN to set on IAM roles created | string | "" | no |
| instance_type | The instance type to use | string | "t3a.micro" | no |
| java_opts | Custom JAVA Options | string | "-Dorg.slf4j.simpleLogger.defaultLogLevel=info -Dcom.amazonaws.sdk.disableCbor -XX:MinRAMPercentage=50 -XX:MaxRAMPercentage=75" | no |
| max_size | The maximum number of servers in this server-group | number | 2 | no |
| min_size | The minimum number of servers in this server-group | number | 1 | no |
| record_limit | The number of events to buffer before pushing them to Kinesis | number | 500 | no |
| scale_down_cooldown_sec | Time (in seconds) until another scale-down action can occur | number | 600 | no |
| scale_down_cpu_threshold_percentage | The average CPU percentage that we must be below to scale-down | number | 20 | no |
| scale_down_eval_minutes | The number of consecutive minutes that we must be below the threshold to scale-down | number | 60 | no |
| scale_up_cooldown_sec | Time (in seconds) until another scale-up action can occur | number | 180 | no |
| scale_up_cpu_threshold_percentage | The average CPU percentage that must be exceeded to scale-up | number | 60 | no |
| scale_up_eval_minutes | The number of consecutive minutes that the threshold must be breached to scale-up | number | 5 | no |
| ssh_ip_allowlist | The list of CIDR ranges to allow SSH traffic from | list(any) | ["0.0.0.0/0"] | no |
| tags | The tags to append to this resource | map(string) | {} | no |
| telemetry_enabled | Whether or not to send telemetry information back to Snowplow Analytics Ltd | bool | true | no |
| time_limit_ms | The amount of time in milliseconds to buffer events before pushing them to Kinesis | number | 500 | no |
| user_provided_id | An optional unique identifier to identify the telemetry events emitted by this stack | string | "" | no |
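
To illustrate how the optional inputs combine with the required ones, here is a hedged sketch that overrides a handful of defaults. It assumes the surrounding modules and variables from the Usage example; every override value (the CIDR range, buffer limits, group sizes, threshold, and tag) is an illustrative placeholder, not a recommendation.

```hcl
module "collector_kinesis" {
  source = "snowplow-devops/collector-kinesis-ec2/aws"

  # Required inputs, wired up as in the Usage example
  name               = "collector-server"
  vpc_id             = var.vpc_id
  subnet_ids         = var.subnet_ids
  collector_lb_sg_id = module.collector_lb.sg_id
  collector_lb_tg_id = module.collector_lb.tg_id
  ingress_port       = module.collector_lb.tg_egress_port
  good_stream_name   = module.raw_stream.name
  bad_stream_name    = module.bad_1_stream.name
  ssh_key_name       = "your-key-name"

  # Restrict SSH to one range (203.0.113.0/24 is a documentation-only placeholder)
  ssh_ip_allowlist = ["203.0.113.0/24"]

  # Serve the tp2 path under a custom path, following the custom_paths description
  custom_paths = {
    "/custom/path/" = "/com.snowplowanalytics.snowplow/tp2"
  }

  # Push buffered events to Kinesis after fewer events and less time than the defaults
  record_limit  = 250
  time_limit_ms = 250

  # Run more servers and scale up at a lower average CPU percentage than the defaults
  min_size                          = 2
  max_size                          = 4
  scale_up_cpu_threshold_percentage = 50

  tags = {
    environment = "dev"
  }
}
```

Any input left out keeps the default shown in the table above.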

Outputs

| Name | Description |
|------|-------------|
| asg_id | ID of the ASG |
| asg_name | Name of the ASG |
| sg_id | ID of the security group attached to the Collector Server node |
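
These outputs can be re-exported from the calling configuration or passed to other resources. A brief sketch, assuming the module is instantiated as collector_kinesis as in the Usage example; the output names collector_asg_name and collector_sg_id are hypothetical.

```hcl
# Re-export selected module outputs from the root configuration

output "collector_asg_name" {
  description = "Name of the ASG running the collector"
  value       = module.collector_kinesis.asg_name
}

output "collector_sg_id" {
  description = "ID of the security group attached to the collector nodes"
  value       = module.collector_kinesis.sg_id
}
```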

Copyright and license

The Terraform AWS Collector Kinesis on EC2 project is Copyright 2021-2022 Snowplow Analytics Ltd.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

About

Fork of snowplow-devops/terraform-aws-collector-kinesis-ec2
