Merge pull request #82 from ibm-client-engineering/ross-updates
Ross updates
adamhayden-ibm authored Apr 11, 2024
2 parents 230ba8b + a505df8 commit fb00531
Showing 1 changed file with 65 additions and 0 deletions.
65 changes: 65 additions & 0 deletions flight-logs/2024-04-12-cocreate.mdx
@@ -0,0 +1,65 @@
---
title: Log 19 🛫
description: Flight Log of Co-Creation Activities
slug: flight-log-19
tags: [log]
---

## Objective
Deploy watsonx.ai on self-managed AWS infrastructure for customer software evaluation

```mermaid
flowchart LR
A(Deploy bootnode) --> B(Deploy infrastructure)
B -->C(Deploy OCP)
subgraph "You are here"
D(Prepare CP4D & watsonx.ai cartridge)
end
C -->D
D -->E(Install CP4D)
E -->F(Deploy watsonx.ai)
```


## Milestones
1. Deploy and configure the boot node to establish a beachhead in the customer AWS environment
   - Complete
2. Deploy OCP using the documented UPI installation steps
   - Complete
3. Install Cloud Pak for Data
   - In Progress
4. Deploy and configure watsonx.ai on self-managed AWS infrastructure in the reference environment, and document the process
   - In Progress

### Summary
- Awaiting entitlement key approval on customer side
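Once the entitlement key is approved, it has to be turned into registry credentials before images can be pulled from cp.icr.io. The sketch below shows one way to build the `auths` entry of a pull secret. It assumes the entitled registry accepts the fixed username `cp` with the entitlement key as the password; the function name and the placeholder key are hypothetical and not part of this log.

```python
import base64
import json

# Assumption: the IBM entitled registry accepts the fixed username "cp"
# with the entitlement key as the password.
REGISTRY = "cp.icr.io"
USERNAME = "cp"


def build_pull_secret_entry(entitlement_key: str) -> dict:
    """Return a dockerconfigjson-style 'auths' entry for the entitled registry."""
    if not entitlement_key:
        raise ValueError("entitlement key is empty; approval may still be pending")
    # Registry auth tokens are base64("username:password").
    token = base64.b64encode(f"{USERNAME}:{entitlement_key}".encode()).decode()
    return {"auths": {REGISTRY: {"auth": token}}}


if __name__ == "__main__":
    # "example-key" is a placeholder, not a real entitlement key.
    print(json.dumps(build_pull_secret_entry("example-key"), indent=2))
```

The resulting entry would still need to be merged into the cluster's existing pull secret rather than replacing it.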

## Decisions and Action Items (DAI)
- Software evaluation is awaiting the customer's approval process, which blocks our ability to download software from cp.icr.io
- Customer to provide by EOD Monday
- Worker nodes are shut down until approval comes through
- Drafted and sent instructions for the customer to resize the worker node disks for when the cluster is brought back online
- Drafted and sent instructions for the customer to order a GPU Node
- GPU node to be added to the cluster and then cordoned, drained, and shut down
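The cordon and drain steps for the GPU node can be sketched as a pair of `oc adm` commands. This is a minimal illustration, not the instructions that were sent to the customer: the node name is hypothetical, and the drain flags shown are a common choice rather than a confirmed requirement. Shutting the instance down afterwards happens on the AWS side and is not covered here.

```python
import shlex


def gpu_node_park_commands(node_name: str) -> list[list[str]]:
    """Build the oc commands that cordon and then drain a node."""
    return [
        # Mark the node unschedulable so no new pods land on it.
        ["oc", "adm", "cordon", node_name],
        # Evict running pods; DaemonSet pods are skipped and emptyDir data is discarded.
        ["oc", "adm", "drain", node_name, "--ignore-daemonsets", "--delete-emptydir-data"],
    ]


if __name__ == "__main__":
    # "gpu-node-1" is a hypothetical node name.
    for cmd in gpu_node_park_commands("gpu-node-1"):
        print(shlex.join(cmd))
```

Building the commands as argument lists keeps them easy to review before they are run, for example with `subprocess.run(cmd, check=True)`.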

## Lessons Learned
- Cluster sizing for Cloud Pak for Data on OpenShift had to be adjusted because CPU resources were under-provisioned
- The watsonx.ai service requires larger local disks on worker nodes (500 GB)
- The GPU node required for watsonx.ai appears to be in limited supply
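The disk lesson above lends itself to a quick pre-flight check before the cluster is brought back online. The sketch below flags worker nodes whose local disk is below the 500 GB noted in this log. The node names and sizes are made up for illustration, and how the sizes are collected (AWS console, CLI, or inventory) is left open.

```python
REQUIRED_DISK_GB = 500  # taken from the lesson above


def undersized_workers(
    disk_gb_by_node: dict[str, int], required_gb: int = REQUIRED_DISK_GB
) -> list[str]:
    """Return the names of worker nodes whose local disk is below the requirement."""
    return sorted(
        name for name, size_gb in disk_gb_by_node.items() if size_gb < required_gb
    )


if __name__ == "__main__":
    # Hypothetical inventory; real values would come from the customer's environment.
    inventory = {"worker-0": 120, "worker-1": 500, "worker-2": 300}
    print(undersized_workers(inventory))
```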

## Next Steps
- License and configure Cloud Pak for Data
- Cloud Pak Considerations
  - Security scans needed on container images
  - Customer requires on-prem, offline install
  - Customer uses their own container registry, which might introduce extra effort or compatibility issues
  - Version compatibility with OpenShift (e.g. 4.10 required and customer has 4.11)
  - Supported storage not available
  - Multiple Cloud Paks on the same cluster
  - Custom connections to data sources not supported out of the box
  - AWS-specific: IAM users are required for install/deploy but are not allowed
  - OpenShift-specific: CoreOS requirement for control nodes
  - Automatic updating of Cloud Pak can interrupt engagements (the solution is to always remove update polling from operators)
- Resize local disks for worker nodes
- Customer to order a GPU node and attach it to the cluster
- Deploy watsonx.ai
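The OpenShift version compatibility consideration above can be checked mechanically before an install is attempted. The sketch below compares a cluster's major.minor version against a set of supported versions. The values used are the illustrative ones from this log (4.10 required, customer on 4.11), not a real compatibility matrix, and the function names are hypothetical.

```python
def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '4.11.3'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_supported(cluster_version: str, supported_versions: set[str]) -> bool:
    """Return True if the cluster's major.minor is in the supported set."""
    supported = {parse_major_minor(v) for v in supported_versions}
    return parse_major_minor(cluster_version) in supported


if __name__ == "__main__":
    # Illustrative values from the log, not an official support matrix.
    print(is_supported("4.11.3", {"4.10"}))  # False
    print(is_supported("4.10.7", {"4.10"}))  # True
```

Comparing integer tuples instead of raw strings avoids mistakes such as treating 4.9 as newer than 4.10.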
