Non-filestore GCP terraform option (#119)
Secretions authored Jul 11, 2024
1 parent 673fcca commit 84c3e6f
Showing 11 changed files with 125 additions and 3 deletions.
1 change: 0 additions & 1 deletion .circleci/config.yml
@@ -22,7 +22,6 @@ commands:
command: |
echo export TF_VAR_deploy_id=\"${WORKSPACE}\" >> $BASH_ENV
echo 'export TF_VAR_description="CircleCI Build for ${CIRCLE_PR_REPONAME}: ${CIRCLE_BUILD_URL}"' >> $BASH_ENV
echo 'export TF_VAR_filestore_disabled="true"' >> $BASH_ENV
echo 'export GOOGLE_CREDENTIALS="$CLOUDSDK_SERVICE_KEY"' >> $BASH_ENV
echo 'export WORKSPACE=gcp-gke-circleci-${CIRCLE_BUILD_NUM}' >> $BASH_ENV
install_prereqs:
7 changes: 6 additions & 1 deletion README.md
@@ -119,9 +119,12 @@ No modules.
| [google_artifact_registry_repository.domino](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/artifact_registry_repository) | resource |
| [google_artifact_registry_repository_iam_member.gcr](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/artifact_registry_repository_iam_member) | resource |
| [google_artifact_registry_repository_iam_member.platform](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/artifact_registry_repository_iam_member) | resource |
| [google_compute_disk.nfs](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_disk) | resource |
| [google_compute_firewall.iap_tcp_forwarding](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_firewall) | resource |
| [google_compute_firewall.master_webhooks](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_firewall) | resource |
| [google_compute_firewall.nfs](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_firewall) | resource |
| [google_compute_global_address.static_ip](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_global_address) | resource |
| [google_compute_instance.nfs](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_instance) | resource |
| [google_compute_network.vpc_network](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_network) | resource |
| [google_compute_router.router](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_router) | resource |
| [google_compute_router_nat.nat](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_router_nat) | resource |
@@ -161,7 +164,7 @@ No modules.
| <a name="input_namespaces"></a> [namespaces](#input\_namespaces) | Namespace that are used for generating the service account bindings | `object({ platform = string, compute = string })` | n/a | yes |
| <a name="input_node_pools"></a> [node\_pools](#input\_node\_pools) | GKE node pool params | <pre>object(<br> {<br> compute = object({<br> min_count = optional(number, 0)<br> max_count = optional(number, 10)<br> initial_count = optional(number, 1)<br> max_pods = optional(number, 30)<br> preemptible = optional(bool, false)<br> disk_size_gb = optional(number, 400)<br> image_type = optional(string, "COS_CONTAINERD")<br> instance_type = optional(string, "n2-highmem-8")<br> gpu_accelerator = optional(string, "")<br> labels = optional(map(string), {<br> "dominodatalab.com/node-pool" = "default"<br> })<br> taints = optional(list(string), [])<br> node_locations = optional(list(string), [])<br> }),<br> platform = object({<br> min_count = optional(number, 1)<br> max_count = optional(number, 5)<br> initial_count = optional(number, 1)<br> max_pods = optional(number, 60)<br> preemptible = optional(bool, false)<br> disk_size_gb = optional(number, 100)<br> image_type = optional(string, "COS_CONTAINERD")<br> instance_type = optional(string, "n2-standard-8")<br> gpu_accelerator = optional(string, "")<br> labels = optional(map(string), {<br> "dominodatalab.com/node-pool" = "platform"<br> })<br> taints = optional(list(string), [])<br> node_locations = optional(list(string), [])<br> }),<br> gpu = object({<br> min_count = optional(number, 0)<br> max_count = optional(number, 2)<br> initial_count = optional(number, 0)<br> max_pods = optional(number, 30)<br> preemptible = optional(bool, false)<br> disk_size_gb = optional(number, 400)<br> image_type = optional(string, "COS_CONTAINERD")<br> instance_type = optional(string, "n1-highmem-8")<br> gpu_accelerator = optional(string, "nvidia-tesla-p100")<br> labels = optional(map(string), {<br> "dominodatalab.com/node-pool" = "default-gpu"<br> "nvidia.com/gpu" = "true"<br> })<br> taints = optional(list(string), [<br> "nvidia.com/gpu=true:NoExecute"<br> ])<br> node_locations = optional(list(string), [])<br> })<br> })</pre> | <pre>{<br> "compute": {},<br> "gpu": {},<br> "platform": {}<br>}</pre> | no |
| <a name="input_project"></a> [project](#input\_project) | GCP Project ID | `string` | `"domino-eng-platform-dev"` | no |
| <a name="input_storage"></a> [storage](#input\_storage) | storage = {<br> filestore = {<br> enabled = Provision a Filestore instance (mostly to avoid GCP Filestore API issues)<br> capacity\_gb = Filestore Instance size (GB) for the cluster NFS shared storage<br> }<br> gcs = {<br> force\_destroy\_on\_deletion = Toogle to allow recursive deletion of all objects in the bucket. if 'false' terraform will NOT be able to delete non-empty buckets.<br> } | <pre>object({<br> filestore = optional(object({<br> enabled = optional(bool, true)<br> capacity_gb = optional(number, 1024)<br> }), {}),<br> gcs = optional(object({<br> force_destroy_on_deletion = optional(bool, false)<br> }), {})<br> })</pre> | `{}` | no |
| <a name="input_storage"></a> [storage](#input\_storage) | storage = {<br> filestore = {<br> enabled = Provision a Filestore instance (for production installs)<br> capacity\_gb = Filestore Instance size (GB) for the cluster NFS shared storage<br> }<br> nfs\_instance = {<br> enabled = Provision an instance as an NFS server (to avoid filestore churn during testing)<br> capacity\_gb = NFS instance disk size<br> }<br> gcs = {<br> force\_destroy\_on\_deletion = Toggle to allow recursive deletion of all objects in the bucket. If 'false', terraform will NOT be able to delete non-empty buckets.<br> } | <pre>object({<br> filestore = optional(object({<br> enabled = optional(bool, true)<br> capacity_gb = optional(number, 1024)<br> }), {}),<br> nfs_instance = optional(object({<br> enabled = optional(bool, false)<br> capacity_gb = optional(number, 100)<br> }), {}),<br> gcs = optional(object({<br> force_destroy_on_deletion = optional(bool, false)<br> }), {})<br> })</pre> | `{}` | no |
| <a name="input_tags"></a> [tags](#input\_tags) | Deployment tags. | `map(string)` | `{}` | no |
## Outputs
@@ -173,6 +176,8 @@ No modules.
| <a name="output_dns"></a> [dns](#output\_dns) | The external (public) DNS name for the Domino UI |
| <a name="output_domino_artifact_repository"></a> [domino\_artifact\_repository](#output\_domino\_artifact\_repository) | Domino Google artifact repository |
| <a name="output_google_filestore_instance"></a> [google\_filestore\_instance](#output\_google\_filestore\_instance) | Domino Google Cloud Filestore instance, name and ip\_address |
| <a name="output_nfs_instance"></a> [nfs\_instance](#output\_nfs\_instance) | NFS instance nfs\_path and ip\_address |
| <a name="output_nfs_instance_ip"></a> [nfs\_instance\_ip](#output\_nfs\_instance\_ip) | NFS instance IP |
| <a name="output_project"></a> [project](#output\_project) | GCP project ID |
| <a name="output_region"></a> [region](#output\_region) | Region where the cluster is deployed derived from 'location' input variable |
| <a name="output_service_accounts"></a> [service\_accounts](#output\_service\_accounts) | GKE cluster Workload Identity namespace IAM service accounts |
1 change: 1 addition & 0 deletions main.tf
@@ -3,6 +3,7 @@ locals {
is_regional = length(split("-", var.location)) == 2
region = local.is_regional ? var.location : substr(var.location, 0, length(var.location) - 2)
zone = local.is_regional ? format("%s-a", var.location) : var.location
nfs_path = "/srv/domino"
}
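The locals above decide whether `var.location` is a region ("us-west1", two dash-separated parts) or a zone ("us-west1-b", three parts), then derive the other. A minimal shell sketch of the same logic, with an assumed example location, for clarity:

```shell
# Mirror of the locals logic in main.tf; the location value is an assumption.
location="us-west1-b"
parts=$(echo "$location" | awk -F'-' '{print NF}')
if [ "$parts" -eq 2 ]; then
  # Regional location: use it as-is, pick zone "-a" (format("%s-a", var.location)).
  region="$location"
  zone="${location}-a"
else
  # Zonal location: strip the trailing "-x" (substr(var.location, 0, length - 2)).
  region="${location%??}"
  zone="$location"
fi
echo "$region $zone"   # → us-west1 us-west1-b
```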

provider "google" {
73 changes: 73 additions & 0 deletions nfs-instance.tf
@@ -0,0 +1,73 @@
# tfsec:ignore:google-compute-disk-encryption-customer-key
resource "google_compute_disk" "nfs" {
#checkov:skip=CKV_GCP_37:Avoid extra churn for testing-only instance
count = var.storage.nfs_instance.enabled ? 1 : 0

name = "${var.deploy_id}-nfs-data"
type = "pd-standard"
zone = local.zone
size = var.storage.nfs_instance.capacity_gb
}

# tfsec:ignore:google-compute-no-project-wide-ssh-keys
resource "google_compute_instance" "nfs" {
#checkov:skip=CKV_GCP_37:Avoid extra churn for testing-only instance
#checkov:skip=CKV_GCP_38:Avoid extra churn for testing-only instance
#checkov:skip=CKV_GCP_32:SSH is useful for troubleshooting, and this is for testing only
#checkov:skip=CKV_GCP_40:This is needed for ssh
count = var.storage.nfs_instance.enabled ? 1 : 0

name = "${var.deploy_id}-nfs"
machine_type = "n2-standard-2"
zone = local.zone
allow_stopping_for_update = true

tags = ["iap-tcp-forwarding-allowed", "nfs-allowed"]

# tfsec:ignore:google-compute-vm-disk-encryption-customer-key
boot_disk {
initialize_params {
image = "debian-cloud/debian-12"
}
}

# tfsec:ignore:google-compute-vm-disk-encryption-customer-key
attached_disk {
source = google_compute_disk.nfs[0].self_link
device_name = "nfs"
}

network_interface {
network = google_compute_network.vpc_network.self_link
subnetwork = google_compute_subnetwork.default.self_link

# tfsec:ignore:google-compute-no-public-ip
access_config {
# Ephemeral public IP
}
}

shielded_instance_config {
enable_vtpm = true
}

metadata_startup_script = templatefile("${path.module}/templates/nfs-install.sh", { nfs_path = local.nfs_path })

lifecycle {
ignore_changes = [attached_disk]
}
}

resource "google_compute_firewall" "nfs" {
count = var.storage.nfs_instance.enabled ? 1 : 0
name = "${var.deploy_id}-nfs"
network = google_compute_network.vpc_network.name

allow {
protocol = "tcp"
ports = ["111", "2049", "20048"]
}

source_ranges = ["10.0.0.0/8"]
target_tags = ["nfs-allowed"]
}
1 change: 1 addition & 0 deletions node_pools.tf
@@ -43,6 +43,7 @@ resource "google_container_node_pool" "node_pools" {

tags = [
"iap-tcp-forwarding-allowed",
"nfs-allowed",
"domino-${each.key}-node"
]

12 changes: 12 additions & 0 deletions outputs.tf
@@ -58,3 +58,15 @@ output "domino_artifact_repository" {
value = google_artifact_registry_repository.domino
description = "Domino Google artifact repository"
}

output "nfs_instance_ip" {
value = var.storage.nfs_instance.enabled ? google_compute_instance.nfs[0].network_interface[0].network_ip : ""
description = "NFS instance IP"
}
output "nfs_instance" {
value = {
nfs_path = var.storage.nfs_instance.enabled ? local.nfs_path : "",
ip_address = var.storage.nfs_instance.enabled ? google_compute_instance.nfs[0].network_interface[0].network_ip : "",
}
description = "NFS instance nfs_path and ip_address"
}
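Downstream configuration can consume these outputs. A hypothetical sketch — the module label `gke` and the wrapping output are assumptions; only the `nfs_instance` attribute names come from the diff above:

```hcl
# Hypothetical consumer of the module's nfs_instance output (module label assumed).
output "nfs_mount_source" {
  value       = "${module.gke.nfs_instance.ip_address}:${module.gke.nfs_instance.nfs_path}"
  description = "host:path string suitable for an NFS mount"
}
```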
13 changes: 13 additions & 0 deletions templates/nfs-install.sh
@@ -0,0 +1,13 @@
apt install -y nfs-kernel-server
mkdir -p ${nfs_path}
mkfs.ext4 /dev/disk/by-id/google-nfs
UUID=$(blkid /dev/disk/by-id/google-nfs -s UUID -o value)
echo UUID=$UUID ${nfs_path} ext4 defaults 0 2 >> /etc/fstab
systemctl daemon-reload
mount ${nfs_path}
chmod 777 ${nfs_path}
echo '${nfs_path} 10.0.0.0/255.0.0.0(rw,async,no_root_squash)' >> /etc/exports
systemctl enable nfs-kernel-server --now
# without this restart, exports isn't respected for some reason
sleep 5
/etc/init.d/nfs-kernel-server restart
1 change: 1 addition & 0 deletions tests/README.md
@@ -27,6 +27,7 @@ No resources.
|------|-------------|------|---------|:--------:|
| <a name="input_deploy_id"></a> [deploy\_id](#input\_deploy\_id) | deploy id | `string` | n/a | yes |
| <a name="input_filestore_enabled"></a> [filestore\_enabled](#input\_filestore\_enabled) | Provision a Filestore instance (disabled by default, mostly to avoid GCP Filestore API issues) | `bool` | `false` | no |
| <a name="input_nfs_instance_enabled"></a> [nfs\_instance\_enabled](#input\_nfs\_instance\_enabled) | Provision an NFS instance (for testing use only) | `bool` | `false` | no |

## Outputs

3 changes: 3 additions & 0 deletions tests/modules.tf.json
@@ -7,6 +7,9 @@
"storage": {
"filestore": {
"enabled": "${var.filestore_enabled}"
},
"nfs_instance": {
"enabled": "${var.nfs_instance_enabled}"
}
},
"namespaces": {
6 changes: 6 additions & 0 deletions tests/variables.tf
@@ -8,3 +8,9 @@ variable "filestore_enabled" {
default = false
description = "Provision a Filestore instance (disabled by default, mostly to avoid GCP Filestore API issues)"
}

variable "nfs_instance_enabled" {
type = bool
default = false
description = "Provision an NFS instance (for testing use only)"
}
10 changes: 9 additions & 1 deletion variables.tf
@@ -54,9 +54,13 @@ variable "storage" {
description = <<EOF
storage = {
filestore = {
enabled = Provision a Filestore instance (mostly to avoid GCP Filestore API issues)
enabled = Provision a Filestore instance (for production installs)
capacity_gb = Filestore Instance size (GB) for the cluster NFS shared storage
}
nfs_instance = {
enabled = Provision an instance as an NFS server (to avoid filestore churn during testing)
capacity_gb = NFS instance disk size
}
gcs = {
force_destroy_on_deletion = Toggle to allow recursive deletion of all objects in the bucket. If 'false', terraform will NOT be able to delete non-empty buckets.
}
@@ -67,6 +71,10 @@
enabled = optional(bool, true)
capacity_gb = optional(number, 1024)
}), {}),
nfs_instance = optional(object({
enabled = optional(bool, false)
capacity_gb = optional(number, 100)
}), {}),
gcs = optional(object({
force_destroy_on_deletion = optional(bool, false)
}), {})
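Putting the new variable to use, a `terraform.tfvars` sketch that swaps Filestore for the NFS instance might look like this (the values are illustrative assumptions, not part of this commit):

```hcl
# Illustrative terraform.tfvars: disable Filestore and provision the
# testing-only NFS instance instead. Sizes are assumptions.
storage = {
  filestore = {
    enabled = false
  }
  nfs_instance = {
    enabled     = true
    capacity_gb = 100
  }
}
```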
