diff --git a/docs/install.md b/docs/install.md
index 9c99cf07..1bb9af66 100644
--- a/docs/install.md
+++ b/docs/install.md
@@ -1,3 +1,8 @@
+
+
 # Installing NIM Operator for Kubernetes using Helm
 
-### Pre-requisites
+## Prerequisites
diff --git a/docs/nimcache.md b/docs/nimcache.md
index 7237a715..f34cb36e 100644
--- a/docs/nimcache.md
+++ b/docs/nimcache.md
@@ -1,22 +1,31 @@
-# Caching NIM models
-Follow these steps to cache NIM models into a persistent storage (PVC)
+
-### Pre-requisites
+# Caching NIM Models
-* NVIDIA GPU Operator have to be installed
-* NVIDIA NIM Operator for K8s have to be installed
-* Access to following NGC repositories required
-  - nvcr.io/nvstaging/cloud-native
-  - nvcr.io/nvidian/nim-llm-dev
-* Local Path Provisioner for creating a Persistent Volume (PV)
+Follow these steps to cache NIM models in a persistent volume.
-### 1. Create a Namespace for running NIM services
+## Prerequisites
+
+* NVIDIA GPU Operator is installed.
+* NVIDIA NIM Operator is installed.
+* You must have an active subscription to an NVIDIA AI Enterprise product or be an NVIDIA Developer Program
+  [member](https://build.nvidia.com/explore/discover?integrate_nim=true&developer_enroll=true&self_hosted_api=true&signin=true).
+  Access to the containers and models for NVIDIA NIM microservices is restricted.
+
+* A persistent volume provisioner is installed.
+
+  The Local Path Provisioner from Rancher is acceptable for development on a single-node cluster.
+
+## 1. Create a Namespace for Running NIM Microservices
 
 ```sh
 kubectl create ns nim-service
 ```
 
-### 2. Create an Image Pull Secret for the NIM container
+## 2. Create an Image Pull Secret for the NIM Container
 
 Replace `<ngc-cli-api-key>` with your NGC CLI API key.
 
@@ -27,12 +36,13 @@ kubectl create secret -n nim-service docker-registry ngc-secret \
   --docker-password=<ngc-cli-api-key>
 ```
 
-### 3. Create the `NIMCache` instance with auto-selection of models enabled
+## 3. Create the NIM Cache Instance and Enable Model Auto-Detection
+
 Update the `NIMCache` custom resource (CR) with appropriate values for model selection.
 These include `model.precision`, `model.engine`, `model.qosProfile`, `model.gpu.product` and `model.gpu.ids`.
-With these, the NIM operator will extract supported profiles and use that for caching.
+With these, the NIM Operator can extract the supported profiles and use them for caching.
-Alternatively if `model.profiles` are specified, then that particular model profile will be downloaded.
+Alternatively, if you specify `model.profiles`, then the model puller downloads and caches that particular model profile.
 
 ```yaml
 apiVersion: apps.nvidia.com/v1alpha1
 kind: NIMCache
 metadata:
@@ -73,23 +83,26 @@ spec:
 kubectl create -f nimcache.yaml -n nim-service
 ```
 
-### 5. Verify the progress of NIM model caching
+## 5. Verify the Progress of NIM Model Caching
+
 Verify that the NIM Operator has initiated the caching job and track status via the CR.
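+
+While the model downloads, you can also watch the caching job that the operator
+creates. A minimal sketch, assuming the job and its puller pod run in the same
+`nim-service` namespace:
+
+```sh
+# Watch the caching job and its pod until the download completes (Ctrl-C to stop)
+kubectl get jobs,pods -n nim-service -w
+```
+
+To track status through the CR, list the NIM cache resources: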
 
 ```sh
 kubectl get nimcache -n nim-service -o wide
 ```
 
-```console
+```output
 NAME                      STATUS   PVC                           AGE
 meta-llama3-8b-instruct   ready    meta-llama3-8b-instruct-pvc   2024-07-04T23:22:13Z
 ```
 
+Get the NIM cache so you can view the detailed status:
+
 ```sh
 kubectl get nimcache -n nim-service -o yaml
 ```
 
-```console
+```output
 apiVersion: apps.nvidia.com/v1alpha1
 kind: NIMCache
 metadata:
@@ -164,4 +177,4 @@ status:
       tp: "2"
   pvc: meta-llama3-8b-instruct-pvc
   state: ready
-```
\ No newline at end of file
+```
diff --git a/docs/nimservice.md b/docs/nimservice.md
index 593ff3a3..a930e6b1 100644
--- a/docs/nimservice.md
+++ b/docs/nimservice.md
@@ -1,13 +1,17 @@
+
+
 # Create a NIM Service
 
-### Pre-requisites
+## Prerequisites
 
-* Create a namespace e.g. `nim-service`
-* Create a `NIMCache` instance in the namespace `nim-service` following the guide [here](https://gitlab-master.nvidia.com/dl/container-dev/k8s-nim-operator/-/blob/51e9727929b16982a2dba6d7fccbd0474f566bf8/docs/nimcache.md).
+* A `NIMCache` instance exists in the `nim-service` namespace. Refer to [Caching NIM Models](nimcache.md).
 
-### 1. Create the CR for NIMService
+## 1. Create the NIM Service Instance
 
-nimservice.yaml:
+Create a file, such as `nimservice.yaml`, with contents like the following example:
 
 ```yaml
 apiVersion: apps.nvidia.com/v1alpha1
@@ -39,42 +43,43 @@ spec:
     openaiPort: 8000
 ```
 
+Apply the manifest:
+
 ```sh
 kubectl create -f nimservice.yaml -n nim-service
 ```
 
-### 2. Check the status of NIMService deployment
+## 2. Check the Status of the NIM Service Deployment
 
 ```sh
 kubectl get nimservice -n nim-service
 ```
 
-```console
-kubectl get nimservice -n nim-service
-NAME                             STATUS   AGE
-meta-llama3-8b-instruct-latest   ready    115m
+```output
+NAME                      STATUS   AGE
+meta-llama3-8b-instruct   Ready    115m
 ```
 
 ```sh
 kubectl get pods -n nim-service
 ```
 
-```console
-NAME                                             READY   STATUS      RESTARTS   AGE
-meta-llama3-8b-instruct-latest-db9d899fd-mfmq2   1/1     Running     0          108m
-meta-llama3-8b-instruct-latest-job-xktnk         0/1     Completed   0          4m38s
+```output
+NAME                                      READY   STATUS      RESTARTS   AGE
+meta-llama3-8b-instruct-db9d899fd-mfmq2   1/1     Running     0          108m
+meta-llama3-8b-instruct-job-xktnk         0/1     Completed   0          4m38s
 ```
 
-### 3. Verify with a sample pod
+## 3. Verify That the Microservice Is Running
 
-test-pod.yaml:
+Create a file, `verify-pod.yaml`, with contents like the following example:
 
 ```yaml
 ---
 apiVersion: v1
 kind: Pod
 metadata:
-  name: test-streaming-chat
+  name: verify-streaming-chat
 spec:
   containers:
   - name: curl
@@ -118,10 +123,14 @@ spec:
   restartPolicy: Never
 ```
 
+Apply the manifest:
+
 ```sh
-kubectl create -f test-pod.yaml -n nim-service
+kubectl create -f verify-pod.yaml -n nim-service
 ```
 
+Confirm the verification pod ran to completion:
+
 ```sh
 kubectl get pods -n nim-service
 ```
 
@@ -130,6 +139,6 @@ kubectl get pods -n nim-service
 NAME                                      READY   STATUS      RESTARTS   AGE
-meta-llama3-8b-instruct-latest-db9d899fd-mfmq2   1/1     Running     0          112m
-meta-llama3-8b-instruct-latest-job-xktnk         0/1     Completed   0          8m8s
-test-streaming-chat                              0/1     Completed   0          99m
+meta-llama3-8b-instruct-db9d899fd-mfmq2   1/1     Running     0          112m
+meta-llama3-8b-instruct-job-xktnk         0/1     Completed   0          8m8s
+verify-streaming-chat                     0/1     Completed   0          99m
 ```
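+
+To see the streamed chat completion that the verification pod received, read its
+logs. This assumes the pod's `curl` container writes the API response to stdout:
+
+```sh
+kubectl logs verify-streaming-chat -n nim-service
+```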
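+
+You can also query the microservice directly from your workstation. A minimal
+sketch, assuming the operator exposes a service named `meta-llama3-8b-instruct`
+on the configured `openaiPort` (8000) and that the NIM serves the
+OpenAI-compatible `/v1/chat/completions` endpoint:
+
+```sh
+# Forward the service port locally, then request a short chat completion.
+# The service and model names below are assumptions based on the examples above.
+kubectl port-forward -n nim-service service/meta-llama3-8b-instruct 8000:8000 &
+
+curl http://localhost:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+        "model": "meta/llama3-8b-instruct",
+        "messages": [{"role": "user", "content": "Say hello."}],
+        "max_tokens": 32
+      }'
+```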