Cloud Pak for Data is an end-to-end platform that helps organizations in their journey to AI. It enables data engineers, data stewards, data scientists, and business analysts to collaborate using an integrated multicloud platform. Cloud Pak for Data uses IBM’s deep analytics portfolio to help organizations meet data and analytics challenges. The building blocks of an information architecture (collect, organize, analyze, infuse) are all available with Cloud Pak for Data on AWS and Azure.
Cloud Pak for Data uses cloud-native services and features, including VNets (Azure), VPCs (AWS), Availability Zones, security groups, Managed Disks, and load balancers, to build a highly available, reliable, and scalable cloud platform.
This deployment guide provides step-by-step instructions for deploying IBM Cloud Pak for Data on a Red Hat OpenShift Container Platform 4.10 cluster on Amazon Web Services (AWS) and Microsoft Azure.
This reference deployment provides Terraform scripts to deploy Cloud Pak for Data on a new Red Hat OpenShift Container Platform 4.10 cluster on AWS and Azure. This cluster includes:
- A Red Hat OpenShift Container Platform cluster on Red Hat CoreOS (RHCOS) instances, created in a new or existing VPC by using the Red Hat OpenShift installer-provisioned infrastructure method.
- A highly available storage infrastructure with Portworx or OpenShift Data Foundation (ODF). Alternatively, you can select Network File System (NFS) on Azure, or Elastic File System (EFS) and Elastic Block Store (EBS) on AWS.
- Scalable OpenShift compute nodes running Cloud Pak for Data services. See Services for the services that are enabled in this deployment.
The deployment module includes configuration parameters that you can customize. See AWS and Azure deployment topology for more details. Some of these parameters, such as instance type and count, affect the cost of the deployment. For cost estimates, see the pricing page for each AWS or Azure service you use; prices are subject to change. This deployment requires a Red Hat OpenShift subscription and a Cloud Pak for Data subscription. You can obtain 60-day trial licenses for both; see the prerequisites section.
This deployment requires a Red Hat subscription. You need to provide your OpenShift pull secret for the installer-provisioned infrastructure.
If you don’t have a Red Hat account, you can register on the Red Hat website. (Registration may require a non-personal email address.) To procure a 60-day evaluation license for OpenShift, follow the instructions at Evaluate Red Hat OpenShift Container Platform. Download the OpenShift pull secret and pass its file location to the Terraform script parameters.
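Before running Terraform, you may want to confirm that the downloaded file really is the pull secret. The following is a minimal sketch, not part of the deployment scripts: check_pull_secret is a hypothetical helper, and it only does a coarse text check for the top-level "auths" key that a Red Hat pull secret contains, rather than full JSON validation.

```shell
# Hypothetical pre-flight helper; not part of the Terraform scripts.
# A Red Hat pull secret is a JSON document with a top-level "auths" object.
# This is a coarse text check for that key, not full JSON validation.
check_pull_secret() {
  secret_file="$1"
  if [ ! -s "$secret_file" ]; then
    echo "pull secret missing or empty: $secret_file" >&2
    return 1
  fi
  if ! grep -q '"auths"' "$secret_file"; then
    echo "no \"auths\" key in $secret_file; is this the OpenShift pull secret?" >&2
    return 1
  fi
  echo "pull secret looks usable: $secret_file"
}
```

For example, with an illustrative download path: check_pull_secret ~/Downloads/pull-secret.txt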
You need a Cloud Pak for Data entitlement API key to download images from the IBM entitled Cloud Pak registry. If you don't have a paid entitlement, you can create a 60-day trial subscription key. Note: When the 60-day trial ends, contact IBM Cloud Pak for Data sales.
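One way to confirm that the entitlement key works before starting a long deployment is to log in to the entitled registry with it. This sketch assumes Podman is installed and that the registry is cp.icr.io with the username cp; the IBM_ENTITLEMENT_KEY variable name and the check_entitlement_key helper are hypothetical, and docker login accepts the same flags if you use Docker instead.

```shell
# Hypothetical check; not part of the Terraform scripts.
# Logs in to the IBM entitled registry with the key held in IBM_ENTITLEMENT_KEY
# (an illustrative variable name).
check_entitlement_key() {
  if [ -z "${IBM_ENTITLEMENT_KEY:-}" ]; then
    echo "set IBM_ENTITLEMENT_KEY to your entitlement key first" >&2
    return 1
  fi
  # --password-stdin keeps the key out of the process list and shell history
  printf '%s' "$IBM_ENTITLEMENT_KEY" |
    podman login cp.icr.io --username cp --password-stdin
}
```

For example: IBM_ENTITLEMENT_KEY=<your key> check_entitlement_key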
You can select one of two container storage options, Portworx or OpenShift Data Foundation (ODF), when installing this Quickstart.
Note: You also have the option to select NFS on Azure, or EFS and EBS on AWS, in which case no additional storage subscription is required.
When you select Portworx as the persistent storage layer, you need to specify the installation spec from your Portworx account. You can generate a new spec by using the Spec Generator. Note that the Portworx trial edition expires after 30 days, after which you need to upgrade to Enterprise Edition.
Cloud Pak for Data supports an entitled Portworx instance which you can install manually once your cluster is provisioned.
The Red Hat OpenShift Data Foundation license is a separate entitlement linked to your Red Hat subscription. If you do not have a separate subscription for ODF, a 60-day trial version is installed. Note: Starting with version 4.9, OpenShift Container Storage (OCS) is named OpenShift Data Foundation.
See AWS topology for more details about the AWS deployment.
See Azure topology for more details about the Azure deployment.
See Cloud Pak for Data System Requirements for detailed information about the system requirements for installing the Cloud Pak for Data platform and services.
You need to have Terraform installed on your client.
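A quick way to confirm that the client has the tools this guide uses is to look them up on the PATH; terraform version then reports which Terraform release is installed. The helper below is a hypothetical sketch, and its default tool list (terraform, plus oc, the OpenShift CLI used in the later steps) is an assumption you can override by passing tool names as arguments.

```shell
# Hypothetical helper: report which required CLIs are found on PATH.
# The default list (terraform, oc) is an assumption; pass names to override it.
check_prereqs() {
  if [ "$#" -eq 0 ]; then
    set -- terraform oc
  fi
  rc=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Running check_prereqs with no arguments checks terraform and oc and returns non-zero if either is missing.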
See AWS deployment documentation for AWS deployment.
See Azure deployment documentation for Azure deployment.
The number of compute nodes in the cluster is controlled by MachineSets.
To scale up or scale down the cluster:
- Find the MachineSet for the nodes in the availability zone that you want to scale:
oc get machineset -n openshift-machine-api
- To manually increase or decrease the nodes in a zone, set the replicas to the desired count:
oc scale --replicas=<number of nodes for the machineset> machineset <machineset> -n openshift-machine-api
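The two steps above can be combined into a script that scales one MachineSet and then polls until the requested number of machines report ready. This is a minimal sketch that assumes oc is already logged in with cluster-admin rights; scale_machineset_and_wait is a hypothetical helper, and the 30-minute default timeout and 15-second poll interval are arbitrary choices. It reads the MachineSet's .status.readyReplicas field, which is omitted while no machines are ready, so an empty value is treated as 0.

```shell
# Hypothetical helper: scale one MachineSet, then poll until the requested
# number of machines is ready. Assumes oc is logged in as a cluster admin.
scale_machineset_and_wait() {
  ms="$1"
  want="$2"
  limit="${3:-1800}"   # seconds to wait before giving up (arbitrary default)
  ns=openshift-machine-api
  if [ -z "$ms" ] || [ -z "$want" ]; then
    echo "usage: scale_machineset_and_wait <machineset> <replicas> [timeout]" >&2
    return 2
  fi

  oc scale --replicas="$want" machineset "$ms" -n "$ns" || return 1

  waited=0
  ready=0
  while [ "$waited" -lt "$limit" ]; do
    # readyReplicas is omitted from the status while it is zero
    ready=$(oc get machineset "$ms" -n "$ns" -o jsonpath='{.status.readyReplicas}')
    ready="${ready:-0}"
    if [ "$ready" -eq "$want" ]; then
      echo "$ms: $ready/$want ready"
      return 0
    fi
    sleep 15
    waited=$((waited + 15))
  done
  echo "$ms: timed out after ${limit}s with $ready/$want ready" >&2
  return 1
}
```

For example: scale_machineset_and_wait <machineset> 4. Polling the MachineSet status rather than the node list keeps the check scoped to the one zone you scaled.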
To browse the available services, navigate to the services catalog page in Cloud Pak for Data.
The following services can be enabled as part of the deployment:
- Watson Studio
- Watson Knowledge Catalog
- Watson Machine Learning
- Data Virtualization
- Watson OpenScale
- Analytics Engine Powered by Apache Spark
- Cognos Dashboards
- Db2 Warehouse
- DataStage Enterprise Plus
- Cognos Analytics
- Db2 Advanced Edition
- SPSS Modeler
- Planning Analytics
- Watson Discovery
- OpenPages
- Decision Optimization
- Big SQL
- Match 360
- Watson Assistant (on AWS only)
For information about other available services, see the Cloud Pak for Data Service Catalog.
- If you use Portworx, activate the Portworx license after the installation is complete:
PX_POD=$(oc get pods -l name=portworx -n kube-system -o jsonpath='{.items[0].metadata.name}')
oc exec "$PX_POD" -n kube-system -- /opt/pwx/bin/pxctl license activate <activation id>
- For more information, see Portworx Licensing.
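The activation commands above can be wrapped in a small script that fails early when no Portworx pod is found and then prints the license summary so you can confirm the result. This is a sketch under the same assumptions as those commands (Portworx pods labeled name=portworx in kube-system); activate_px_license is a hypothetical helper, and it uses pxctl license list to display the license now in effect.

```shell
# Hypothetical wrapper around the activation commands; assumes oc is logged in
# with rights to exec into pods in kube-system.
activate_px_license() {
  activation_id="$1"
  if [ -z "$activation_id" ]; then
    echo "usage: activate_px_license <activation id>" >&2
    return 2
  fi
  px_pod=$(oc get pods -l name=portworx -n kube-system \
    -o jsonpath='{.items[0].metadata.name}')
  if [ -z "$px_pod" ]; then
    echo "no Portworx pod found in kube-system" >&2
    return 1
  fi
  oc exec "$px_pod" -n kube-system -- /opt/pwx/bin/pxctl license activate "$activation_id" || return 1
  # Print the license now in effect so you can confirm the activation
  oc exec "$px_pod" -n kube-system -- /opt/pwx/bin/pxctl license list
}
```

For example: activate_px_license <activation id>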
When nodes are rebooted for the first time after the cluster is created, the Certificate Signing Requests (CSRs) for the nodes must be approved by a cluster administrator. Until they are approved, the oc client does not function. You can approve the CSRs by using the kubeconfig file created during installation.
- Change to the directory where you ran Terraform.
- Change to the installer-files directory:
cd installer-files
- List the CSRs that need approval:
oc --kubeconfig=auth/config get csr
- Approve all CSRs in a single step:
oc --kubeconfig=auth/config get csr -o name | xargs oc --kubeconfig=auth/config adm certificate approve
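The single-step command approves every CSR that oc get csr returns, including ones that were already handled. The OpenShift documentation instead selects only pending requests, those with an empty status, and approves those. A sketch of that approach follows; approve_pending_csrs is a hypothetical helper that defaults to the auth/config kubeconfig used above. After the client CSRs are approved, each node may submit a serving CSR, so you might need to run it more than once.

```shell
# Hypothetical helper; run it from the installer-files directory.
# Approves only CSRs that are still pending (no status yet).
approve_pending_csrs() {
  kubeconfig="${1:-auth/config}"
  oc --kubeconfig="$kubeconfig" get csr \
    -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' |
    while read -r csr; do
      [ -n "$csr" ] || continue
      oc --kubeconfig="$kubeconfig" adm certificate approve "$csr"
    done
}
```

For example, from the installer-files directory: approve_pending_csrs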