For more information please refer to the main Apiary project page.
Name | Description | Type | Default | Required |
---|---|---|---|---|
alluxio_endpoints | List of Alluxio endpoints (map of root URL and S3 buckets) used to replace S3 paths with Alluxio paths. See section Usage. | list | <list> | no |
aws_region | AWS region to use for resources. | string | - | yes |
bastion_ssh_key_secret_name | Secret name in AWS Secrets Manager which stores the private key used to log in to bastions. The secret's key should be private_key and the value should be stored as a base64 encoded string. Max character limit for a secret's value is 4096. | string | `` | no |
cpu | The number of CPU units to reserve for the Waggle Dance container. Valid values can be 256, 512, 1024, 2048 and 4096. Reference: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-cpu-memory-error.html | string | 1024 | no |
cpu_limit | The CPU limit (in CPU units) for the Waggle Dance container. Valid values can be 256, 512, 1024, 2048 and 4096. It will use cpu * 1.25 if not specified. Reference: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-cpu-memory-error.html | string | 1024 | no |
cpu_scale_in_cooldown | Cooldown time (in seconds) for scaling in tasks based on CPU usage. | number | 300 | no |
cpu_scale_out_cooldown | Cooldown time (in seconds) for scaling out tasks based on CPU usage. | number | 120 | no |
default_latency | Latency used for other (not primary) metastores that don't override it in their own configurations. See latency parameter in https://github.com/ExpediaGroup/waggle-dance/blob/main/README.md. | number | 0 | no |
primary_metastore_latency | Latency used for the primary metastore. See latency parameter in https://github.com/ExpediaGroup/waggle-dance/blob/main/README.md. | number | 0 | no |
docker_image | Full path Waggle Dance Docker image. | string | - | yes |
docker_registry_auth_secret_name | Docker Registry authentication SecretManager secret name. | string | `` | no |
docker_version | Waggle Dance Docker image version. | string | - | yes |
domain_extension | Domain name to use for Route 53 entry and service discovery. | string | lcl | no |
enable_invocation_logs | Option to enable invocation logging in log4j. By default only slow (1 minute+) invocations are logged. | bool | false | no |
enable_remote_metastore_dns | Option to enable creating DNS records for remote metastores. | string | `` | no |
enable_autoscaling | Enable k8s horizontal pod autoscaling. | bool | false | no |
graphite_host | Graphite server configured in Waggle Dance to send metrics to. | string | localhost | no |
graphite_port | Graphite server port. | string | 2003 | no |
graphite_prefix | Prefix added to all metrics sent to Graphite from this Waggle Dance instance. | string | waggle-dance | no |
ingress_cidr | Generally allowed ingress CIDR list. | list | - | yes |
instance_name | Waggle Dance instance name to identify resources in multi-instance deployments. | string | `` | no |
k8s_namespace | K8s namespace to create waggle-dance deployment. | string | `` | no |
k8s_docker_registry_secret | Docker Registry authentication K8s secret name. | string | `` | no |
k8s_replica_count | Initial number of k8s pod replicas to create. | number | 3 | no |
k8s_max_replica_count | Max number of k8s pod replicas to create. | number | 10 | no |
k8s_dns_policy | DNS policy for the Waggledance Kubernetes deployment. Valid values are 'ClusterFirstWithHostNet', 'ClusterFirst', 'Default', or 'None'. | string | ClusterFirst | no |
k8s_dns_config | DNS configuration for the Waggledance Kubernetes deployment. | object | - | no |
k8s_svc_spec | Waggledance Kubernetes service settings. All inner fields are optional and if unset the Kubernetes default values are applied. | object | - | no |
k8s_svc_annotations | Custom annotations for the Waggledance Kubernetes service. | map(string) | "service.beta.kubernetes.io/aws-load-balancer-internal" = "true" "service.beta.kubernetes.io/aws-load-balancer-type" = "nlb" | no |
local_metastores | List of federated Metastore endpoints directly accessible on the local network. See section local_metastores for more info. | list | <list> | no |
memory | The amount of memory (in MiB) to allocate for the Waggle Dance container. Valid values: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-cpu-memory-error.html | string | 4096 | no |
memory_limit | The memory limit (in MiB) for the Waggle Dance container. It will use memory * 1.25 if the limit is not specified. Valid values: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-cpu-memory-error.html | string | null | no |
primary_metastore_access_type | Primary Hive Metastore access control type. | string | READ_AND_WRITE_ON_DATABASE_WHITELIST | no |
primary_metastore_host | Primary Hive Metastore hostname configured in Waggle Dance. | string | localhost | no |
primary_metastore_port | Primary Hive Metastore port. | string | 9083 | no |
primary_metastore_glue_account_id | Primary metastore Glue AWS account id, optional. Use with primary_metastore_glue_endpoint and instead of primary_metastore_host and primary_metastore_port. | string | `` | no |
primary_metastore_glue_endpoint | Primary metastore Glue endpoint, for example glue.us-east-1.amazonaws.com, optional. Use with primary_metastore_glue_account_id and instead of primary_metastore_host and primary_metastore_port. | string | `` | no |
primary_metastore_whitelist | List of Hive databases to whitelist on primary Metastore. | list | <list> | no |
primary_metastore_mapped_databases | List of Hive databases mapped from primary Metastore. | list | <list> | no |
primary_metastore_read_only_host | Primary Hive Metastore READ ONLY hostname configured in Waggle Dance. Optional. | string | `` | no |
primary_metastore_read_only_port | Primary Hive Metastore READ ONLY port configured in Waggle Dance. Optional. | string | 9083 | no |
remote_metastores | List of VPC endpoint services to federate Metastores in other accounts. See section remote_metastores for more info. | list | <list> | no |
remote_region_metastores | List of VPC endpoint services to federate Metastores in other regions and other accounts. The actual data from tables in these metastores can be accessed using Alluxio caching instead of reading the data from S3 directly. See section remote_region_metastores for more info. | list | <list> | no |
secondary_vpcs | List of VPCs to associate with Service Discovery namespace. | list | <list> | no |
ssh_metastores | List of federated Metastores to connect to over SSH via bastion. See section ssh_metastores for more info. | list | <list> | no |
subnets | ECS container subnets. | list | - | yes |
tags | A map of tags to apply to resources. | map | <map> | no |
vpc_id | VPC ID. | string | - | yes |
wd_ecs_task_count | Number of ECS tasks to create. | string | 1 | no |
wd_ecs_max_task_count | Max number of ECS tasks to create. | string | 10 | no |
wd_instance_type | Waggle Dance instance type, possible values: ecs, k8s. | string | ecs | no |
wd_target_cpu_percentage | Waggle Dance autoscaling threshold for CPU target usage. | number | 60 | no |
waggledance_version | Waggle Dance version to install on EC2 nodes. | string | 3.3.2 | no |
root_vol_type | Waggle Dance EC2 root volume type. | string | gp2 | no |
root_vol_size | Waggle Dance EC2 root volume size. | string | 10 | no |
enable_query_functions_across_all_metastores | Controls the Thrift call for get_all_functions. It is generally used to initialize a client and get built-in functions and registered UDFs from a metastore. Setting this to false is more performant, as WD then only gets the functions from the primary metastore. Setting this to true collates results by calling get_all_functions on all configured metastores, which can be slow if some of the metastores are slow to respond. If all configured metastores are of the same version and no additional UDFs are installed, WD gets the same functions back from each, so calling this across metastores is not very useful. For backwards compatibility, this property can be set to true. Further reading: https://github.com/ExpediaGroup/waggle-dance#server | bool | false | no |
enable_tcp_keepalive | Enables tcp_keepalive settings on the Waggledance pods. To use this you need to enable the ability to change sysctl settings on your Kubernetes cluster. For EKS you need to allow this on your cluster (see https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/ and check your EKS version for details). If your EKS version is below 1.24 you need to create a PodSecurityPolicy allowing the following sysctls: "net.ipv4.tcp_keepalive_time", "net.ipv4.tcp_keepalive_intvl", "net.ipv4.tcp_keepalive_probes", and a ClusterRole + RoleBinding for the service account running the HMS pods (or all service accounts in the namespace where Apiary is running) so that Kubernetes can apply the tcp_keepalive configuration. For EKS 1.25 and above see https://kubernetes.io/blog/2022/08/23/kubernetes-v1-25-release/#pod-security-changes. Also see tcp_keepalive_* variables. | bool | false | no |
tcp_keepalive_time | Sets net.ipv4.tcp_keepalive_time (seconds), currently only supported in ECS. | number | 200 | no |
tcp_keepalive_intvl | Sets net.ipv4.tcp_keepalive_intvl (seconds), currently only supported in ECS. | number | 30 | no |
tcp_keepalive_probes | Sets net.ipv4.tcp_keepalive_probes (number of probes), currently only supported in ECS. | number | 2 | no |
datadog_key_secret_name | Name of the secret containing the DataDog API key. This needs to be created manually in AWS Secrets Manager. | string | | no |
datadog_agent_version | Version of the Datadog Agent running in the ECS cluster. | string | 7.46.0-jmx | no |
include_datadog_agent | Whether to include the datadog-agent container alongside Waggledance. | string | bool | no |
metrics_port | The port on which the WaggleDance application initiates. Additionally, it serves as the port from which we parse metrics. | string | 18000 | yes |
extended_server_config | Extended waggle-dance-server.yml configuration for Waggle Dance (see Waggle Dance README for all options). String will be yamlencoded. | string | | no |
Example module invocation:
module "apiary-waggledance" {
source = "git::https://github.com/ExpediaGroup/apiary-federation.git?ref=master"
#required for creating VPC endpoints in remote region
providers = {
aws.remote = aws.remote
}
instance_name = "waggledance-test"
wd_ecs_task_count = "1"
aws_region = "us-west-2"
vpc_id = "vpc-1"
subnets = ["subnet-1", "subnet-2"]
tags = {
Name = "Apiary-WaggleDance"
Team = "Operations"
}
ingress_cidr = ["10.0.0.0/8", "172.16.0.0/12"]
docker_image = "your.docker.repo/apiary-waggledance"
docker_version = "latest"
primary_metastore_host = "primary-metastore.yourdomain.com"
primary_metastore_whitelist = ["test_.*", "team_.*"]
primary_metastore_latency = 1000
default_latency = 100
remote_metastores = [
{
endpoint = "com.amazonaws.vpce.us-west-2.vpce-svc-1"
port = "9083"
prefix = "metastore1"
mapped-databases = "default,test"
database-name-mapping = "test:test_alias,default:default_alias"
writable-whitelist = "test"
latency = 5000
},
{
endpoint = "com.amazonaws.vpce.us-east-1.vpce-svc-2"
port = "9083"
prefix = "metastore2"
subnets = "subnet-3"
mapped-databases = "test"
enabled = false //option to enable/disable metastore in waggle-dance without removing vpc endpoint.
},
]
remote_region_metastores = [
{
endpoint = "com.amazonaws.vpce.us-west-2.vpce-svc-1"
port = "9083"
prefix = "metastore1"
mapped-databases = "default,test"
database-name-mapping = "test:test_alias,default:default_alias"
writable-whitelist = "test"
vpc_id = "vpc-123456"
subnets = "subnet-1,subnet-2"
security_group_id = "sg1"
},
]
alluxio_endpoints = [
{
root_url = "alluxio://alluxio1:19998/"
s3_buckets = "bucket1,bucket2"
},
{
root_url = "alluxio://alluxio2:19998/"
s3_buckets = "bucket3,bucket4"
}
]
}
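
For a Kubernetes deployment, the invocation might instead look like the following minimal sketch. It uses only variables listed in the table above. All values (instance name, namespace, VPC and subnet IDs, image path, replica counts) are placeholders, and which of the required variables are actually consumed when wd_instance_type is k8s is an assumption that should be checked against the module source.

```hcl
module "apiary-waggledance-k8s" {
  source = "git::https://github.com/ExpediaGroup/apiary-federation.git?ref=master"

  # Mirrors the ECS example above; assumed to be needed in k8s mode as well.
  providers = {
    aws.remote = aws.remote
  }

  # Placeholder values, replace with your own.
  instance_name    = "waggledance-k8s-test"
  wd_instance_type = "k8s"
  k8s_namespace    = "metastore"

  # Variables marked as required in the table above.
  aws_region     = "us-west-2"
  vpc_id         = "vpc-1"
  subnets        = ["subnet-1", "subnet-2"]
  ingress_cidr   = ["10.0.0.0/8"]
  docker_image   = "your.docker.repo/apiary-waggledance"
  docker_version = "latest"

  # Horizontal pod autoscaling between the initial and max replica counts.
  enable_autoscaling       = true
  k8s_replica_count        = 3
  k8s_max_replica_count    = 10
  wd_target_cpu_percentage = 60

  primary_metastore_host      = "primary-metastore.yourdomain.com"
  primary_metastore_whitelist = ["test_.*"]
}
```

The replica counts and CPU target shown here are simply the documented defaults written out explicitly, so they can be dropped if the defaults are acceptable.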
local_metastores is a list of maps. Each map entry describes a federated metastore server directly accessible on the local network.
An example entry looks like:
local_metastores = [
{
host = "hms-readonly.metastore.svc.cluster.local"
latency = 2000
port = "9083"
prefix = "local1"
mapped-databases = "default,test"
database-name-mapping = "test:test_alias,default:default_alias"
mapped-tables = "test:test_table1,test_table1;default:default_table1.*,default_table2"
writable-whitelist = "test"
}
]
local_metastores map entry fields:
Name | Description | Type | Default | Required |
---|---|---|---|---|
host | Host name of the Hive metastore server on the local network. | string | - | yes |
latency | Latency used for this metastore. See latency parameter in https://github.com/ExpediaGroup/waggle-dance/blob/main/README.md. | number | var.default_latency | no |
port | IP port that the Thrift server of the Hive metastore listens on. | string | "9083" | no |
prefix | Prefix added to the database names from this metastore. Must be unique among all local, remote, and SSH federated metastores in this Waggle Dance instance. | string | - | yes |
mapped-databases | Comma-separated list of databases from this metastore to expose to federation. If not specified, all databases are exposed. | string | "" | no |
mapped-tables | Semicolon-separated/comma-separated list of databases and DB tables from this metastore to expose to federation. If not specified, all tables for each database are exposed. See Waggle Dance Mapped Tables for more information. | string | "" | no |
database-name-mapping | Comma-separated list of <database>:<alias> key/value pairs to add aliases for the given databases. Default is no aliases. This is used primarily in migration scenarios where a database has been renamed/relocated. See Waggle Dance Database Name Mapping for more information. | string | "" | no |
writable-whitelist | Comma-separated list of databases from this metastore that can be in read-write mode. If not specified, all databases are read-only. Use .* to allow all databases to be written to. | string | "" | no |
See Waggle Dance README for more information on all these parameters.
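
Because mapped-tables and database-name-mapping are passed as delimited strings, it can be easier to keep them as structured values and build the strings with Terraform expressions. The following is a sketch only: the local names (mapped_tables, name_mapping, and so on) are hypothetical and not part of this module, and the host and prefix are placeholders.

```hcl
locals {
  # Hypothetical structured form: database name => list of table patterns.
  mapped_tables = {
    test    = ["test_table1", "test_table2"]
    default = ["default_table1.*", "default_table2"]
  }

  # Hypothetical structured form: database name => alias.
  name_mapping = {
    test    = "test_alias"
    default = "default_alias"
  }

  # Databases are joined with ";", tables within a database with ",".
  # Keys are iterated in lexical order, so this evaluates to
  # "default:default_table1.*,default_table2;test:test_table1,test_table2".
  mapped_tables_string = join(";", [
    for db, tables in local.mapped_tables : "${db}:${join(",", tables)}"
  ])

  # Evaluates to "default:default_alias,test:test_alias".
  name_mapping_string = join(",", [
    for db, alias in local.name_mapping : "${db}:${alias}"
  ])

  # Pass local.local_metastores as the module's local_metastores variable.
  local_metastores = [
    {
      host                  = "hms-readonly.metastore.svc.cluster.local"
      prefix                = "local1"
      mapped-databases      = join(",", keys(local.mapped_tables))
      mapped-tables         = local.mapped_tables_string
      database-name-mapping = local.name_mapping_string
    }
  ]
}
```

Keeping the structured maps as the single source means mapped-databases and mapped-tables cannot drift apart, at the cost of one extra layer of indirection.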
remote_metastores is a list of maps. Each map entry describes a federated metastore endpoint accessible via an AWS VPC endpoint.
An example entry looks like:
remote_metastores = [
{
endpoint = "com.amazonaws.vpce.us-west-2.vpce-svc-1"
port = "9083"
prefix = "remote1"
mapped-databases = "default,test"
database-name-mapping = "test:test_alias,default:default_alias"
mapped-tables = "test:test_table1,test_table1;default:default_table1.*,default_table2"
writable-whitelist = ".*"
}
]
remote_metastores map entry fields:
Name | Description | Type | Default | Required |
---|---|---|---|---|
endpoint | AWS VPC endpoint name that is connected to the remote Hive metastore. | string | - | yes |
latency | Latency used for this metastore. See latency parameter in https://github.com/ExpediaGroup/waggle-dance/blob/main/README.md. | number | var.default_latency | no |
port | IP port that the Thrift server of the remote Hive metastore listens on. | string | "9083" | no |
prefix | Prefix added to the database names from this metastore. Must be unique among all local, remote, and SSH federated metastores in this Waggle Dance instance. | string | - | yes |
mapped-databases | Comma-separated list of databases from this metastore to expose to federation. If not specified, all databases are exposed. | string | "" | no |
mapped-tables | Semicolon-separated/comma-separated list of databases and DB tables from this metastore to expose to federation. If not specified, all tables for each database are exposed. See Waggle Dance Mapped Tables for more information. | string | "" | no |
database-name-mapping | Comma-separated list of <database>:<alias> key/value pairs to add aliases for the given databases. Default is no aliases. This is used primarily in migration scenarios where a database has been renamed/relocated. See Waggle Dance Database Name Mapping for more information. | string | "" | no |
writable-whitelist | Comma-separated list of databases from this metastore that can be in read-write mode. If not specified, all databases are read-only. Use .* to allow all databases to be written to. | string | "" | no |
See Waggle Dance README for more information on all these parameters.
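
As a sketch of how the latency default documented above plays out, the fragment below sets default_latency once and overrides it for a single remote metastore. Endpoint names and prefixes are placeholders, and the remaining module variables are omitted for brevity, so this is a fragment rather than a complete invocation.

```hcl
module "apiary-waggledance" {
  source = "git::https://github.com/ExpediaGroup/apiary-federation.git?ref=master"

  # ... other variables as in the example module invocation ...

  # Applied to every non-primary metastore that does not set its own latency.
  default_latency = 100

  remote_metastores = [
    {
      # No latency key: var.default_latency (100 here) is used.
      endpoint = "com.amazonaws.vpce.us-west-2.vpce-svc-1"
      prefix   = "remote1"
    },
    {
      # Per-entry override for a metastore that is slower to respond.
      endpoint = "com.amazonaws.vpce.us-east-1.vpce-svc-2"
      prefix   = "remote2"
      latency  = 5000
    },
  ]
}
```

See the latency parameter in the Waggle Dance README for the exact semantics and units of these values.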
remote_region_metastores is a list of maps. Each map entry describes a federated metastore endpoint accessible via an AWS VPC endpoint. The actual data for these metastores will be accessed using Alluxio caching instead of reading the data from S3 directly.
An example entry looks like:
remote_region_metastores = [
{
endpoint = "com.amazonaws.vpce.us-west-2.vpce-svc-1"
port = "9083"
prefix = "remote1"
mapped-databases = "default,test"
mapped-tables = "test:test_table1,test_table1;default:default_table1.*,default_table2"
database-name-mapping = "test:test_alias,default:default_alias"
writable-whitelist = ".*"
vpc_id = "vpc-123456"
subnets = "subnet-1,subnet-2"
security_group_id = "sg1"
}
]
remote_region_metastores map entry fields:
Name | Description | Type | Default | Required |
---|---|---|---|---|
endpoint | AWS VPC endpoint service name that is connected to the remote Hive metastore. | string | - | yes |
latency | Latency used for this metastore. See latency parameter in https://github.com/ExpediaGroup/waggle-dance/blob/main/README.md. | number | var.default_latency | no |
port | IP port that the Thrift server of the remote Hive metastore listens on. | string | "9083" | no |
prefix | Prefix added to the database names from this metastore. Must be unique among all local, remote, and SSH federated metastores in this Waggle Dance instance. | string | - | yes |
mapped-databases | Comma-separated list of databases from this metastore to expose to federation. If not specified, all databases are exposed. | string | "" | no |
mapped-tables | Semicolon-separated/comma-separated list of databases and DB tables from this metastore to expose to federation. If not specified, all tables for each database are exposed. See Waggle Dance Mapped Tables for more information. | string | "" | no |
database-name-mapping | Comma-separated list of <database>:<alias> key/value pairs to add aliases for the given databases. Default is no aliases. This is used primarily in migration scenarios where a database has been renamed/relocated. See Waggle Dance Database Name Mapping for more information. | string | "" | no |
writable-whitelist | Comma-separated list of databases from this metastore that can be in read-write mode. If not specified, all databases are read-only. Use .* to allow all databases to be written to. | string | "" | no |
vpc_id | Remote region AWS VPC id. | string | - | yes |
subnets | AWS VPC subnets in remote region. | string | - | yes |
security_group_id | AWS EC2 security group in remote region. | string | - | yes |
See Waggle Dance README for more information on all these parameters.
ssh_metastores is a list of maps. Each map entry describes a federated metastore endpoint connected via an SSH bastion host.
An example entry looks like:
ssh_metastores = [
{
metastore-host = "com.amazonaws.vpce.us-west-2.vpce-svc-1"
port = "9083"
bastion-host = "bastion.remote-account.com"
user = "bastion-user"
timeout = "30000"
prefix = "ssh_metastore1"
mapped-databases = "default,test"
database-name-mapping = "test:test_alias,default:default_alias"
mapped-tables = "test:test_table1,test_table1;default:default_table1.*,default_table2"
}
]
ssh_metastores map entry fields:
Name | Description | Type | Default | Required |
---|---|---|---|---|
latency | Latency used for this metastore. See latency parameter in https://github.com/ExpediaGroup/waggle-dance/blob/main/README.md. | number | var.default_latency | no |
metastore-host | Host name of the Hive metastore that can be resolved/reached from the bastion host. | string | - | yes |
port | IP port that the Thrift server of the remote Hive metastore listens on. | string | "9083" | no |
bastion-host | Host name of the bastion host. | string | - | yes |
user | User name that will log in to the bastion host. | string | - | yes |
timeout | The SSH session timeout in milliseconds, 0 means no timeout. Default is 60000 milliseconds, i.e. 1 minute. | string | "60000" | no |
prefix | Prefix added to the database names from this metastore. Must be unique among all local, remote, and SSH federated metastores in this Waggle Dance instance. | string | - | yes |
mapped-databases | Comma-separated list of databases from this metastore to expose to federation. If not specified, all databases are exposed. | string | "" | no |
mapped-tables | Semicolon-separated/comma-separated list of databases and DB tables from this metastore to expose to federation. If not specified, all tables for each database are exposed. See Waggle Dance Mapped Tables for more information. | string | "" | no |
database-name-mapping | Comma-separated list of <database>:<alias> key/value pairs to add aliases for the given databases. Default is no aliases. This is used primarily in migration scenarios where a database has been renamed/relocated. See Waggle Dance Database Name Mapping for more information. | string | "" | no |
writable-whitelist | Comma-separated list of databases from this metastore that can be in read-write mode. If not specified, all databases are read-only. Use .* to allow all databases to be written to. | string | "" | no |
See Waggle Dance README for more information on all these parameters.
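
The bastion private key referenced by bastion_ssh_key_secret_name has to exist in AWS Secrets Manager with the layout described in the variables table (key private_key, base64 encoded value). One possible way to create it with Terraform is sketched below. The resource names, secret name, and key file path are hypothetical, and the sketch assumes that "key" refers to a key in a JSON key/value secret. Note that managing the secret this way stores the key material in the Terraform state.

```hcl
# Hypothetical secret that will hold the bastion private key.
resource "aws_secretsmanager_secret" "bastion_key" {
  name = "waggledance-bastion-ssh-key"
}

resource "aws_secretsmanager_secret_version" "bastion_key" {
  secret_id = aws_secretsmanager_secret.bastion_key.id

  # JSON document with a single key, private_key, whose value is the
  # base64 encoded content of the key file (the path is a placeholder).
  secret_string = jsonencode({
    private_key = filebase64("${path.module}/keys/bastion_id_rsa")
  })
}

module "apiary-waggledance" {
  source = "git::https://github.com/ExpediaGroup/apiary-federation.git?ref=master"

  # ... other variables as in the example module invocation ...

  bastion_ssh_key_secret_name = aws_secretsmanager_secret.bastion_key.name

  ssh_metastores = [
    {
      metastore-host = "metastore.remote-account.internal" # placeholder
      bastion-host   = "bastion.remote-account.com"
      user           = "bastion-user"
      prefix         = "ssh_metastore1"
    }
  ]
}
```

Keep the 4096 character limit on the secret value in mind: a base64 encoded key is roughly a third larger than the original file.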
glue_metastores is a list of maps. Each map entry describes a federated metastore endpoint connected to the AWS Glue Data Catalog.
An example entry looks like:
glue_metastores = [
{
glue-account-id = "123456789012"
glue-endpoint = "glue.us-east-1.amazonaws.com"
prefix = "glue_metastore1"
mapped-databases = "default,test"
}
]
glue_metastores map entry fields:
Name | Description | Type | Default | Required |
---|---|---|---|---|
latency | Latency used for this metastore. See latency parameter in https://github.com/ExpediaGroup/waggle-dance/blob/main/README.md. | number | var.default_latency | no |
glue-account-id | Glue AWS account id. | string | - | yes |
glue-endpoint | Glue endpoint 'glue.us-east-1.amazonaws.com'. | string | - | yes |
prefix | Prefix added to the database names from this metastore. Must be unique among all local, remote, and SSH federated metastores in this Waggle Dance instance. | string | - | yes |
mapped-databases | Comma-separated list of databases from this metastore to expose to federation. If not specified, all databases are exposed. | string | "" | no |
mapped-tables | Semicolon-separated/comma-separated list of databases and DB tables from this metastore to expose to federation. If not specified, all tables for each database are exposed. See Waggle Dance Mapped Tables for more information. | string | "" | no |
database-name-mapping | Comma-separated list of <database>:<alias> key/value pairs to add aliases for the given databases. Default is no aliases. This is used primarily in migration scenarios where a database has been renamed/relocated. See Waggle Dance Database Name Mapping for more information. | string | "" | no |
writable-whitelist | Comma-separated list of databases from this metastore that can be in read-write mode. If not specified, all databases are read-only. Use .* to allow all databases to be written to. | string | "" | no |
See Waggle Dance README for more information on all these parameters.
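
The variables table also lists primary_metastore_glue_account_id and primary_metastore_glue_endpoint for using a Glue catalog as the primary metastore. A sketch of that combination, alongside one federated Glue metastore, could look like the fragment below. The account ids and prefix are placeholders and the remaining module variables are omitted, so treat it as an illustration rather than a complete invocation.

```hcl
module "apiary-waggledance" {
  source = "git::https://github.com/ExpediaGroup/apiary-federation.git?ref=master"

  # ... other variables as in the example module invocation ...

  # Primary metastore backed by Glue; per the variables table these are used
  # instead of primary_metastore_host and primary_metastore_port.
  primary_metastore_glue_account_id = "123456789012"
  primary_metastore_glue_endpoint   = "glue.us-east-1.amazonaws.com"

  # Additional Glue catalog from a second (placeholder) account, federated
  # under its own prefix.
  glue_metastores = [
    {
      glue-account-id  = "210987654321"
      glue-endpoint    = "glue.us-east-1.amazonaws.com"
      prefix           = "glue_metastore1"
      mapped-databases = "default,test"
    }
  ]
}
```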
If you would like to ask questions about Apiary or discuss it, please join our mailing list at https://groups.google.com/forum/#!forum/apiary-user
This project is available under the Apache 2.0 License.
Copyright 2018-2019 Expedia, Inc.