- Access to the following GitHub repos:
- Read-only access to aitutor-dev buckets (for ML model export and Firestore import)
- A new project for installation with Project Owner (roles/owner)
- A domain or subdomain that you control and can create DNS records for (not needed for CEs)
You'll need the following quotas in your preferred zone:
- 48 vCPU
- 4 x T4 GPUs
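Before starting, you can confirm your quota from the command line. This is a minimal sketch; it assumes PROJECT_ID and REGION are already exported and that the regional T4 quota metric is named NVIDIA_T4_GPUS.
# Sketch: list CPU and T4 GPU quota for the chosen region
gcloud compute regions describe ${REGION} --project=${PROJECT_ID} --format="flattened(quotas[])" \
  | grep -B1 -A1 -E "metric: +(CPUS|NVIDIA_T4_GPUS)$"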
Confirm with the Learning Platform team which release version of each repo to use.
Install the following tools:
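The canonical tool list isn't reproduced here, but a quick check that the CLIs used later in this guide (gcloud, terraform, kubectl, skaffold, helm) are installed might look like this sketch:
# Sketch: confirm the CLI tools referenced later in this guide are on your PATH
for tool in gcloud terraform kubectl skaffold helm; do
  command -v "$tool" >/dev/null && echo "$tool: $(command -v "$tool")" || echo "$tool: MISSING"
done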
You will use a provided Terraform module to perform the following:
- Bootstrap a project in your organization
- Create a Terraform service account and Terraform state bucket in the project for further Terraform scripts
From your workstation:
export PROJECT_ID=<your-project-id>
export DOMAIN_NAME=<your-org-domain>
export REGION=<your-region>
export ZONE=<your-zone>
export PROJECT_NUMBER=$(gcloud projects describe ${PROJECT_ID} --format="value(projectNumber)")
export BILLING_ACCOUNT="$(gcloud beta billing projects describe ${PROJECT_ID} | grep billingAccountName \
| tr / ' ' | cut -f3 -d' ')"
export CLP_VERSION=<clp_tag>
export DEMO_VERSION=<v2.0.0-beta12.7-demo>
git clone https://github.com/GPS-Solutions/cloud-learning-platform.git
cd cloud-learning-platform
git checkout $CLP_VERSION
cd terraform/stages/project_bootstrap/
Log in to your project:
gcloud auth login
You will also need to create Application Default Credentials (ADC) pointing at an org admin account, for use by the Terraform client.
gcloud auth application-default login
Verify that the credentials were written:
cat ~/.config/gcloud/application_default_credentials.json
Set gcloud to point to your project (if it doesn't already):
gcloud config list
gcloud config set project ${PROJECT_ID}
Run the following to set Terraform variables:
# Pass variables to terraform using environment prefix TF_VAR_
export TF_VAR_project_id=${PROJECT_ID}
export TF_VAR_billing_account=${BILLING_ACCOUNT}
export TF_VAR_region=${REGION}
export TF_VAR_zone=${ZONE}
export TF_VAR_bucket_region_or_multiregion="US"
export TF_VAR_org_domain_name=${DOMAIN_NAME}
export TF_VAR_add_project_owner=true
Now that you're logged in, initialize Terraform and run terraform apply to see the expected changes. Inspect the changes before typing yes when prompted.
terraform init
terraform apply
Ensure that a bucket with the same name as the project has been created:
gsutil ls -p $PROJECT_ID
There should also be a jump host VM in the project:
gcloud compute instances list
In this section you successfully created the following:
- A bucket to capture future terraform state and prepare for CI/CD
- A Terraform service account with the required permissions
- A jump host to perform the rest of the installation
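As an optional sanity check, you can confirm the Terraform service account exists; terraform-cicd is the account name used later in this guide, so adjust the filter if your bootstrap uses a different name.
gcloud iam service-accounts list --project="${PROJECT_ID}" --filter="email:terraform-cicd@"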
# Copy the bootstrap Terraform state into the new state bucket
gsutil cp ../project_bootstrap/terraform.tfstate gs://"${PROJECT_ID}"-tfstate/env/bootstrap/terraform.tfstate
# Protect the jump host from accidental deletion
gcloud compute instances update jump-host --deletion-protection --project="${PROJECT_ID}"
# Copy the bastion startup script to the jump host and SSH in through IAP
gcloud compute scp ../scripts/bastion_startup.sh jump-host:~ --zone=${ZONE} --tunnel-through-iap --project="${PROJECT_ID}"
gcloud compute ssh jump-host --zone=${ZONE} --tunnel-through-iap --project=${PROJECT_ID}
source ~/bastion_startup.sh
tmux is preferred so that disconnected sessions are not lost (https://tmuxcheatsheet.com/). To reconnect to a session, run tmux attach.
tmux
git config --global user.email "you@example.com"
git config --global user.name "Your Name"
git config --global credential.https://github.com.username "username"
git config --global credential.helper store
git clone https://github.com/GPS-Solutions/cloud-learning-platform.git
export PROJECT_ID=<your-project-id>
export LDAP=<your-ldap>
export GITHUB_ID=<your-github-id>
export REGION=<your-region>
export ZONE=<your-zone>
# Re-export the release versions on the jump host; they are referenced by later steps
export CLP_VERSION=<clp_tag>
export DEMO_VERSION=<demo_tag>   # e.g. v2.0.0-beta12.7-demo, as above
gcloud auth login
gcloud auth application-default login
The terraform-cicd service account is used for several reasons:
- Ensuring a consistent experience, as users coming to this process may have varying permissions
- Creating Firebase resources in Terraform requires the use of a Service Account because of API limitations
- Partially setting up CI/CD for this project to consume upstream changes
export SA_KEY_FILE=~/clp-terraform-cicd-key.json
gcloud iam service-accounts keys create ${SA_KEY_FILE} \
--iam-account=terraform-cicd@${PROJECT_ID}.iam.gserviceaccount.com
export GOOGLE_APPLICATION_CREDENTIALS=${SA_KEY_FILE}
export TF_VAR_project_id=${PROJECT_ID}
export TF_VAR_region=${REGION}
export TF_VAR_firestore_region="us-central"
export TF_VAR_gke_cluster_zones=${ZONE}
export TF_VAR_github_owner=${GITHUB_ID}
export TF_VAR_api_domain="${PROJECT_ID}-api"
export TF_VAR_web_app_domain="${PROJECT_ID}"
export TF_VAR_ckt_app_domain="${PROJECT_ID}-ckt"
export TF_VAR_github_ref="refs/tags/${DEMO_VERSION}"
These variables have been defaulted for Argolis projects:
export TF_VAR_cert_issuer_email="${LDAP}@google.com"
export TF_VAR_org_domain_name="${LDAP}.altostrat.com"
export TF_VAR_base_domain="cloudpssolutions.com"
export TF_VAR_ai_tutor_whitelist_domains="google.com"
export TF_VAR_ai_tutor_whitelist_emails="${LDAP}@google.com,admin@${LDAP}.altostrat.com"
export TF_VAR_ckt_whitelist_domains="google.com"
export TF_VAR_ckt_whitelist_emails="${LDAP}@google.com,admin@${LDAP}.altostrat.com"
Now change directories to demo_environment and initialize the Terraform module:
pushd cloud-learning-platform
git checkout "${CLP_VERSION}"
cd terraform/stages/demo_environment
terraform init -backend-config="bucket=${PROJECT_ID}-tfstate"
terraform plan | grep -e "#"
# Firestore may only be initialized once
FIRESTORE_INIT="-var=firebase_init=false"
if [[ $(gcloud alpha firestore databases list --project="${PROJECT_ID}" --quiet | grep -c uid) == 0 ]]; then
FIRESTORE_INIT="-var=firebase_init=true"
fi
terraform apply ${FIRESTORE_INIT} --auto-approve
popd
In this section you successfully created the following via Terraform:
- Firebase Base Apps
- GKE Cluster for backends and GCS buckets
- Ingress and other Service Accounts and Secrets on the Cluster
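A quick sanity check of the new resources might look like this (a sketch; it assumes the cluster and buckets live in the same project):
# Sketch: confirm the GKE cluster and GCS buckets created by the demo_environment stage
gcloud container clusters list --project="${PROJECT_ID}"
gsutil ls -p "${PROJECT_ID}"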
Follow the Firebase setup instructions.
Check out the backend repo and select the release version you'd like to deploy, matching the frontends you just deployed. Deploy the needed indexes to Firestore; make sure the database import has completed first.
cd cloud-learning-platform
# Save the repo root for later steps (avoid overwriting the shell builtin PWD, which changes on every cd)
export CLP_BASE_DIR=$(pwd)
export GCP_PROJECT=$PROJECT_ID
echo "Your current GCP Project ID is: "$GCP_PROJECT
cd utils
PYTHONPATH=../common/src python firestore_indexing.py
cd ..
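To confirm the indexes were submitted, a hedged check like the following can help; index builds may take a few minutes before they show as READY.
gcloud firestore indexes composite list --project="${PROJECT_ID}"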
We will now run a series of skaffold commands to build the necessary containers in Cloud Build and deploy them to the GKE cluster to power the backend services.
First, connect to the GKE cluster you've already provisioned:
gcloud container clusters get-credentials $GCP_PROJECT-$REGION --region $REGION --project $GCP_PROJECT
kubectx and kubens are handy tools to easily switch between Kubernetes clusters and namespaces.
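If you choose to use them, typical usage looks roughly like this sketch (the context and namespace names are placeholders, not values from this guide):
kubectx                      # list available cluster contexts
kubectx <cluster-context>    # switch to the CLP cluster's context
kubens default               # switch to the default namespace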
Return to the repo root. Make sure you have the version you want to deploy checked out.
cd $CLP_BASE_DIR
echo "Your current branch is: "$(git branch --show-current)
export GCP_PROJECT=$PROJECT_ID
export PROJECT_NUMBER=$(gcloud projects describe $PROJECT_ID --format="value(projectNumber)")
echo "Your current GCP Project ID is: "$PROJECT_ID
export BACKEND_API=https://$PROJECT_ID-api.cloudpssolutions.com
# GIT_RELEASE=$(git describe --tags --abbrev=0 --exact-match)
GIT_SHA=$(git rev-parse HEAD)
Run the following to get Firebase API key (Web API key):
KEY_NAME=$(gcloud alpha services api-keys list --filter="displayName='Browser key (auto created by Firebase)'" --format="value(name)")
export FIREBASE_API_KEY=$(gcloud alpha services api-keys get-key-string $KEY_NAME --format="value(keyString)")
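As a quick check, the key string should be non-empty (Firebase web API keys typically start with AIza):
echo "${FIREBASE_API_KEY}"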
Set environment variables:
export IS_DEVELOPMENT=false
export IS_CLOUD_LOGGING_ENABLED=true
export RELEASE_VERSION=$CLP_VERSION
export SKAFFOLD_BUILD_CONCURRENCY=0
Deploy each set of services, one set at a time. Builds can take over 10 minutes to complete and may require several attempts due to transient build failures.
NOTE: Make sure gcloud is set to the proper project and your kubeconfig points to the appropriate cluster. Make sure your user account is also set as Application Default Credentials so skaffold and helm have the appropriate access.
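A hedged pre-flight check might look like this sketch:
# Sketch: confirm the active project, cluster context, and ADC before deploying
gcloud config get-value project
kubectl config current-context
gcloud auth application-default print-access-token >/dev/null && echo "ADC OK"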
You can watch the logs of your builds in Cloud Build as well as streaming to your command line.
echo $GCP_PROJECT $PROJECT_ID $GIT_SHA $CLP_VERSION
# Deploy backend microservices
skaffold run -p custom --default-repo=gcr.io/$PROJECT_ID -l commit=$GIT_SHA -m v3_backends --tag $CLP_VERSION
Eventually you should see that all the containers are built and skaffold starts deploying resources.
You can also watch the pods deploy by running this in another terminal session:
kubectl get po
# or if you have `watch`
watch kubectl get po
Eventually you will see the deployments stabilize:
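One way to check, as a sketch, is to wait on all deployments in the current namespace:
kubectl get deployments
kubectl wait --for=condition=Available deployment --all --timeout=600s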
To save on cost, it may be desirable to reduce GCP spend when the application is not being used or evaluated. Primarily this is achieved by scaling down the GKE cluster and turning off the backend. Please note this pathway is only somewhat tested; re-test your user journeys each time you turn the cluster back up.
To turn the cluster down, for each node pool in the console:
- Disable autoscaling, then click Save
- Set the number of nodes to 0, then click Save
To turn it back up (a hedged gcloud sketch for both directions follows this list):
- Turn autoscaling back on for both pools (min 1, max 8)
- Change the number of nodes for both pools to 1-4 (the autoscaler will even it out)
- Let all services come back up
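The console steps above can also be approximated from the command line. This is a sketch only; the node-pool names are placeholders, and the cluster name assumes the ${PROJECT_ID}-${REGION} pattern used earlier when fetching credentials.
# Sketch: turn a node pool down (list pools first to get their names)
gcloud container node-pools list --cluster="${PROJECT_ID}-${REGION}" --region=${REGION}
gcloud container clusters update "${PROJECT_ID}-${REGION}" --region=${REGION} \
  --node-pool=<pool-name> --no-enable-autoscaling
gcloud container clusters resize "${PROJECT_ID}-${REGION}" --region=${REGION} \
  --node-pool=<pool-name> --num-nodes=0 --quiet
# Sketch: turn it back up
gcloud container clusters update "${PROJECT_ID}-${REGION}" --region=${REGION} \
  --node-pool=<pool-name> --enable-autoscaling --min-nodes=1 --max-nodes=8
gcloud container clusters resize "${PROJECT_ID}-${REGION}" --region=${REGION} \
  --node-pool=<pool-name> --num-nodes=2 --quiet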
Use kubectl get pods to monitor the status of pods. ContainerCreating means a pod is starting; Pending means it is waiting for resources, e.g. a GPU node.
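To dig into a pod stuck in Pending, a hedged way to see the scheduler's reason (the pod name is a placeholder):
kubectl describe pod <pod-name> | grep -A10 Events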