diff --git a/docs/modelserving/v1beta1/transformer/collocation/README.md b/docs/modelserving/v1beta1/transformer/collocation/README.md
new file mode 100644
index 000000000..74245ea8b
--- /dev/null
+++ b/docs/modelserving/v1beta1/transformer/collocation/README.md
@@ -0,0 +1,116 @@
+# Collocate transformer and predictor in same pod
+
+KServe by default deploys the Transformer and Predictor as separate services, allowing you to deploy them on different devices and scale them independently.
+Nevertheless, there are certain situations where you might prefer to collocate the transformer and predictor within the same pod. Here are a few scenarios:
+
+1. Your transformer is tightly coupled with the predictor and you want to perform canary deployments together.
+2. You want to reduce sidecar resources.
+3. You want to reduce networking latency.
+
+## Before you begin
+
+1. Your ~/.kube/config should point to a cluster with [KServe installed](../../../../get_started/README.md#install-the-kserve-quickstart-environment).
+2. Your cluster's Istio Ingress gateway must be [network accessible](https://istio.io/latest/docs/tasks/traffic-management/ingress/ingress-control/).
+3. You can find the [code samples](https://github.com/kserve/kserve/tree/master/docs/samples/v1beta1/transformer/collocation) in the kserve repository.
+
+## Deploy the InferenceService
+
+Since the predictor and the transformer run in the same pod, they need to listen on different ports to avoid conflicts. The `Transformer` is configured to listen on ports 8000 and 8081,
+while the `Predictor` listens on ports 8080 and 8082. The `Transformer` calls the `Predictor` on port 8082 via a local socket.
+Deploy the `InferenceService` using the command below.
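+The manifest below is a minimal sketch of a collocated deployment: both containers are declared under `spec.predictor.containers`, and KServe identifies the predictor container by the name `kserve-container`. The images (`kserve/custom-model-grpc`, `kserve/image-transformer`) and the container arguments are illustrative assumptions and may differ from the sample in the kserve repository, but the port layout follows the description above.
+
+```bash
+cat <<EOF | kubectl apply -f -
+apiVersion: serving.kserve.io/v1beta1
+kind: InferenceService
+metadata:
+  name: custom-transformer-collocation
+spec:
+  predictor:
+    containers:
+      # Predictor container; KServe expects this container to be named kserve-container.
+      # It listens on ports 8080 and 8082 as described above.
+      - name: kserve-container
+        image: kserve/custom-model-grpc:latest   # illustrative image
+        args:
+          - --model_name=custom-model
+          - --http_port=8080
+          - --grpc_port=8082
+      # Transformer container collocated in the same pod; it listens on ports 8000 and 8081
+      # and forwards requests to the predictor over localhost:8082.
+      - name: transformer-container
+        image: kserve/image-transformer:latest   # illustrative image
+        args:
+          - --model_name=custom-model
+          - --protocol=grpc-v2
+          - --http_port=8000
+          - --grpc_port=8081
+          - --predictor_host=localhost:8082
+        ports:
+          - containerPort: 8000
+            protocol: TCP
+EOF
+```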
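+Before sending traffic, wait for the `InferenceService` to become ready. A typical check (the exact columns depend on your KServe version) is:
+
+```bash
+kubectl get inferenceservice custom-transformer-collocation
+```
+
+The `URL` column shows the external hostname that is used as the `Host` header in the request below.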
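+## Run a prediction
+
+You can now send an inference request through the Istio ingress gateway. The command below is a sketch that assumes the `INGRESS_HOST` and `INGRESS_PORT` variables from the KServe quick start and an `input.json` payload from the code samples; the service and model names match the expected output that follows.
+
+```bash
+SERVICE_NAME=custom-transformer-collocation
+MODEL_NAME=custom-model
+INPUT_PATH=@./input.json
+# Resolve the external hostname of the InferenceService.
+SERVICE_HOSTNAME=$(kubectl get inferenceservice $SERVICE_NAME -o jsonpath='{.status.url}' | cut -d "/" -f 3)
+
+curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" \
+  "http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/${MODEL_NAME}/infer" \
+  -d $INPUT_PATH
+```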
+A successful request returns a V2 inference response similar to the following:
+
+```bash
+> POST /v2/models/custom-model/infer HTTP/1.1
+> Host: custom-transformer-collocation.default.example.com
+> User-Agent: curl/7.85.0
+> Accept: */*
+> Content-Type: application/json
+> Content-Length: 105396
+>
+* We are completely uploaded and fine
+* Mark bundle as not supporting multiuse
+< HTTP/1.1 200 OK
+< content-length: 298
+< content-type: application/json
+< date: Thu, 04 May 2023 10:35:30 GMT
+< server: istio-envoy
+< x-envoy-upstream-service-time: 1273
+<
+* Connection #0 to host localhost left intact
+{"model_name":"custom-model","model_version":null,"id":"d685805f-a310-4690-9c71-a2dc38085d6f","parameters":null,"outputs":[{"name":"output-0","shape":[1,5],"datatype":"FP32","parameters":null,"data":[14.975618362426758,14.036808967590332,13.966032028198242,12.252279281616211,12.086268424987793]}]}
+```
diff --git a/mkdocs.yml b/mkdocs.yml
index 8eb6c71a0..51bb8bce0 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -47,6 +47,7 @@ nav:
     - Transformers:
       - Feast: modelserving/v1beta1/transformer/feast/README.md
      - How to write a custom transformer: modelserving/v1beta1/transformer/torchserve_image_transformer/README.md
+      - Collocate transformer and predictor: modelserving/v1beta1/transformer/collocation/README.md
     - Inference Graph:
       - Concept: modelserving/inference_graph/README.md
      - Image classification inference graph: modelserving/inference_graph/image_pipeline/README.md