Add example for collocation of transformer and predictor (#257)

* Add collocate transformer and predictor example

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>

* Resolve comments

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>

---------

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
sivanantha321 authored Jul 1, 2023
1 parent 19e8656 commit 53948bd
Showing 2 changed files with 117 additions and 0 deletions.
116 changes: 116 additions & 0 deletions docs/modelserving/v1beta1/transformer/collocation/README.md
@@ -0,0 +1,116 @@
# Collocate transformer and predictor in the same pod

KServe by default deploys the Transformer and Predictor as separate services, allowing you to deploy them on different devices and scale them independently.
<br/>Nevertheless, there are certain situations where you might prefer to collocate the transformer and predictor within the same pod. Here are a few scenarios:

1. If your transformer is tightly coupled with the predictor and you want to canary deploy them together.
2. If you want to reduce the sidecar resource overhead, since only one pod is deployed instead of two.
3. If you want to reduce networking latency by avoiding an extra network hop between the two services.

## Before you begin

1. Your ~/.kube/config should point to a cluster with [KServe installed](../../../../get_started/README.md#install-the-kserve-quickstart-environment).
2. Your cluster's Istio Ingress gateway must be [network accessible](https://istio.io/latest/docs/tasks/traffic-management/ingress/ingress-control/).
3. You can find the [code samples](https://github.com/kserve/kserve/tree/master/docs/samples/v1beta1/transformer/collocation) in the kserve repository.

## Deploy the InferenceService

Since the predictor and the transformer run in the same pod, they need to listen on different ports to avoid conflicts. The `Transformer` is configured to listen on ports 8000 (HTTP) and 8081 (gRPC),
while the `Predictor` listens on ports 8080 (HTTP) and 8082 (gRPC). The `Transformer` calls the `Predictor` on port 8082 via a local socket.
Deploy the `InferenceService` using the following command:

```bash
cat <<EOF | kubectl apply -f -
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: custom-transformer-collocation
spec:
  predictor:
    containers:
      - name: kserve-container
        image: kserve/custom-model-grpc:latest
        args:
          - --model_name=custom-model
          - --grpc_port=8082
          - --http_port=8080
      - image: kserve/image-transformer:latest
        name: transformer-container # Do not change the container name
        args:
          - --model_name=custom-model
          - --protocol=grpc-v2
          - --http_port=8000
          - --grpc_port=8081
          - --predictor_host=localhost:8082
        ports:
          - containerPort: 8000
            protocol: TCP
EOF
```
!!! success "Expected output"
    ```{ .bash .no-copy }
    inferenceservice.serving.kserve.io/custom-transformer-collocation created
    ```

!!! warning
    Always name the transformer container `transformer-container`. Otherwise, the model volume is not mounted to the transformer
    container, which may result in an error.

!!! note
    Currently, collocation support is limited to the custom container spec for the KServe model container.
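
To confirm that the transformer and predictor were collocated, you can list the containers of the deployed pod. This is an optional sanity check, a sketch assuming the `InferenceService` runs in the `default` namespace:

```bash
# List the container names of the InferenceService pod; expect to see
# kserve-container and transformer-container (plus the queue-proxy sidecar
# injected by Knative in serverless mode).
kubectl get pods -l serving.kserve.io/inferenceservice=custom-transformer-collocation \
  -o jsonpath='{.items[0].spec.containers[*].name}'
```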

## Check InferenceService status
```bash
kubectl get isvc custom-transformer-collocation
```
!!! success "Expected output"
    ```{ .bash .no-copy }
    NAME                             URL                                                          READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION                               AGE
    custom-transformer-collocation   http://custom-transformer-collocation.default.example.com   True           100                              custom-transformer-collocation-predictor-00001   133m
    ```

!!! note
    If the `InferenceService` URL contains `svc.cluster.local`, the service is not exposed through the ingress. You need to [configure DNS](https://knative.dev/docs/install/yaml-install/serving/install-serving-with-yaml/#configure-dns)
    or [use a custom domain](https://knative.dev/docs/serving/using-a-custom-domain/) in order to expose the `isvc`.
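
For example, with a serverless (Knative) installation you can register a custom domain by patching Knative's `config-domain` ConfigMap. This is a minimal sketch; replace `example.com` with a domain you actually control:

```bash
# Use example.com as the default domain for Knative services
# (replace it with a domain you own).
kubectl patch configmap/config-domain \
  --namespace knative-serving \
  --type merge \
  --patch '{"data":{"example.com":""}}'
```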

## Run a prediction
Prepare the [inputs](https://github.com/kserve/kserve/blob/master/docs/samples/v1beta1/transformer/collocation/input.json) for the inference request and save them to a file named `input.json`.

Now, [determine the ingress IP and ports](../../../../get_started/first_isvc.md#4-determine-the-ingress-ip-and-ports) and set `INGRESS_HOST` and `INGRESS_PORT`.

```bash
SERVICE_NAME=custom-transformer-collocation
MODEL_NAME=custom-model
INPUT_PATH=@./input.json
# Extract the service hostname from the InferenceService status URL.
SERVICE_HOSTNAME=$(kubectl get inferenceservice $SERVICE_NAME -o jsonpath='{.status.url}' | cut -d "/" -f 3)
```
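
If you do not have a load balancer in front of the ingress gateway, one option, sketched here assuming Istio's `istio-ingressgateway` service in the `istio-system` namespace, is to port-forward it and target `localhost`:

```bash
# Forward the Istio ingress gateway to localhost
# (keep this running in a separate terminal).
kubectl port-forward --namespace istio-system svc/istio-ingressgateway 8080:80

# With the port-forward in place, the ingress is reachable at localhost:8080.
INGRESS_HOST=localhost
INGRESS_PORT=8080
```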
You can use `curl` to send the inference request:
```bash
curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" -d $INPUT_PATH http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/$MODEL_NAME/infer
```

!!! success "Expected output"
    ```{ .bash .no-copy }
    * Trying 127.0.0.1:8080...
    * Connected to localhost (127.0.0.1) port 8080 (#0)
    > POST /v2/models/custom-model/infer HTTP/1.1
    > Host: custom-transformer-collocation.default.example.com
    > User-Agent: curl/7.85.0
    > Accept: */*
    > Content-Type: application/json
    > Content-Length: 105396
    >
    * We are completely uploaded and fine
    * Mark bundle as not supporting multiuse
    < HTTP/1.1 200 OK
    < content-length: 298
    < content-type: application/json
    < date: Thu, 04 May 2023 10:35:30 GMT
    < server: istio-envoy
    < x-envoy-upstream-service-time: 1273
    <
    * Connection #0 to host localhost left intact
    {"model_name":"custom-model","model_version":null,"id":"d685805f-a310-4690-9c71-a2dc38085d6f","parameters":null,"outputs":[{"name":"output-0","shape":[1,5],"datatype":"FP32","parameters":null,"data":[14.975618362426758,14.036808967590332,13.966032028198242,12.252279281616211,12.086268424987793]}]}
    ```
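
The response body arrives as a single line of JSON. Assuming `jq` is installed, you can pretty-print it by piping the (silenced) curl output through it:

```bash
# Same request as above, with the response pretty-printed by jq.
curl -s -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" \
  -d $INPUT_PATH http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/$MODEL_NAME/infer | jq .
```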
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -47,6 +47,7 @@ nav:
- Transformers:
- Feast: modelserving/v1beta1/transformer/feast/README.md
- How to write a custom transformer: modelserving/v1beta1/transformer/torchserve_image_transformer/README.md
- Collocate transformer and predictor: modelserving/v1beta1/transformer/collocation/README.md
- Inference Graph:
- Concept: modelserving/inference_graph/README.md
- Image classification inference graph: modelserving/inference_graph/image_pipeline/README.md
