NOTE: additional documentation here
Warning Note that this repository is not actively maintained anymore. The functionality around the wasmcloud based inference server has moved to wamli.
This repository provides a wasmCloud capability provider and actors to perform inference using machine learning models for ONNX and Tensorflow.
We recommand using bindle version v0.7.1 The latest version in github HEAD (as of March 2022) has not been released, and includes signature checks, and are not compatible with the scripts and models in this repo.
Make sure your Docker install has Compose v2.
Download a wasmcloud host binary release for your platform from
Releases
and unpack it. The path to the download folder should be set as
WASMCLOUD_HOST_HOME
in deploy/env
From the top-level directory build with make
. This should complete
without errors.
Models (in bindle/models
) must be loaded into the bindle server.
If you are using your own model, you will need to create a "bindle
invoice", a .toml
file listing the bindle artifacts. Each artifact
has a sha256 hash and file size of each artifact. See the
existing toml files in bindle/models
for examples.
Update paths in file deploy/env
to match your development environment.
Be sure to set BINDLE and BINDLE_SERVER in env
to the paths to the bindle cli
and bindle server executables, respectively. If they are in your $PATH,
you can just set these to bindle
and bindle-server
. If you built
bindle from git, use the 0.7.1 tag, run cargo build
, and set
BINDLE_HOME to the path to the git repo.
The script deploy/run.sh
contains commands to run everything. In the
deploy
folder, run run.sh
to see a list of available subcommands.
Start the bindle server and load the models.
./run.sh bindle-start
./run.sh bindle-create
Start the local registry server, nats server, wasmcloud host,
actors, and providers. If this is your first time running running this
app, add --console
to the end of the following command to open a new
terminal window with the host logs. The logs may be useful for
diagnosing any problems.
./run.sh all
# or, to open a $TERMINAL window with host logs
./run.sh all --console
After a successful startup the washboard should look similar to the following screenshot:
If everything started correctly, try sending an image to be classified:
(try any of the images in images/
, or try one of your own!
curl -T images/cat.jpg http://localhost:8078/mobilenetv27/matches | jq
To stop the host and providers,
/run.sh wipe
The above command stops everything except the bindle server.
To stop the bindle server,
./run.sh bindle-stop
Once the application is up and running, start to issue requests. Currently, the repository comprises the following pre-configured models:
- identity of ONNX format
- plus3 of Tensorflow format
- mobilenet of ONNX format
- squeezenet of ONNX format
Apart from the underlying inference engine, e.g. ONNX vs. Tensorflow, the pre-configured models differ in a further aspect: concerning the trivial models, one may request processing upon arbitrary shapes of one-dimensional data, [1, n]
. Mobilenet and Squeezenet, however, have more requirements regarding their respective input tensor. To fulfill these, the respective input tensor of an arbitrary image can be preprocessed before being routed to the inference engine.
The application provides three endpoints. The first endpoint routes the input tensor to the related inference engine without any pre-processing. The second endpoint pre-processes the input tensor and routes it to the related inference engine thereafter. The third performs a pre-processing before the prediction step and a post-processinging afterwards.
0.0.0.0:<port>/<model>
, e.g.0.0.0.0:7078/identity
0.0.0.0:<port>/<model>/preprocess
, e.g.0.0.0.0:7078/squeezenetv117/preprocess
0.0.0.0:<port>/<model>/matches
, e.g.0.0.0.0:7078/squeezenetv117/matches
To trigger a request against the identity model, type the following:
curl -v POST 0.0.0.0:8078/identity -d '{"dimensions":[1,4],"valueTypes":["ValueF32"],"flags":0,"data":[0,0,128,63,0,0,0,64,0,0,64,64,0,0,128,64]}'
The response should comprise HTTP/1.1 200 OK
as well as {"result":"Success","tensor":{"dimensions":[1,4],"valueTypes":["ValueF32"],"flags":0,"data":[0,0,128,63,0,0,0,64,0,0,64,64,0,0,128,64]}}
The following happens:
- The http POST sends a request for a model with name "challenger", index
0
and somedata
. data
is vector[1.0f32, 2.0, 3.0, 4.0]
converted to a vector of bytes.- A response is computed. The result is sent back.
- The
data
in the request equalsdata
in the response because the pre-loaded model "challenger" is one that yields as output what it got as input.
To trigger a request against the plus3 model, type the following:
curl -v POST 0.0.0.0:8078/plus3 -d '{"dimensions":[1,4],"valueTypes":["ValueF32"],"flags":0,"data":[0,0,128,63,0,0,0,64,0,0,64,64,0,0,128,64]}'
The response is
{"result":"Success","tensor":{"dimensions":[1,4],"valueTypes":["ValueF32"],"flags":0,"data":[0,0,128,64,0,0,160,64,0,0,192,64,0,0,224,64]}}
Note that in contrast to the identity model, the answer from plus3 is not at all identical to the request. Converting the vector of bytes [0,0,128,64,0,0,160,64,0,0,192,64,0,0,224,64]
back to a vector of f32
yields [4.0, 5.0, 6.0, 7.0]
. This was expected: each element from the input is incremented by three [1.0, 2.0, 3.0, 4.0]
→ [4.0, 5.0, 6.0, 7.0]
, hence the name of the model: plus3.
# in order for the relative path to match call from directory 'deploy'
curl -v POST 0.0.0.0:8078/mobilenetv27/preprocess --data-binary @../providers/mlinference/tests/testdata/images/n04350905.jpg
Note that the output tensor is of shape [1,1000]
and needs to be post-processed by an evaluation of the softmax over the outputs. In case the softmax shall be evaluated as well use the third endpoint, for example like the following:
# in order for the relative path to match call from directory 'deploy'
curl -v POST 0.0.0.0:8078/mobilenetv27/matches --data-binary @../providers/mlinference/tests/testdata/images/n04350905.jpg
# in order for the relative path to match call from directory 'deploy'
curl -v POST 0.0.0.0:8078/squeezenetv117/preprocess --data-binary @../providers/mlinference/tests/testdata/images/n04350905.jpg
Note that the output tensor is of shape [1,1000]
and needs to be post-processed where the post-processing is currently not part of the application. Or, including pos-processing as follows:
# in order for the relative path to match call from directory 'deploy'
curl -v POST 0.0.0.0:8078/squeezenetv117/matches --data-binary @../providers/mlinference/tests/testdata/images/n04350905.jpg
The answer should comprise
[{"label":"n02883205 bow tie, bow-tie, bowtie","probability":0.16806115},{"label":"n04350905 suit, suit of clothes","probability":0.14194612},{"label":"n03763968 military uniform","probability":0.11412828},{"label":"n02669723 academic gown, academic robe, judge's robe","probability":0.09906072},{"label":"n03787032 mortarboard","probability":0.09620707}]
The capability provider assumes a bindle to comprise two parcels where each parcel is assigned one of the following two groups:
- model
- metadata
The first, model
, is assumed to comprise model data, e.g. an ONNX model. The second, metadata
, is currently assumed to be json containing the metadata of the model. In case you create new bindles, make sure to assign these two groups.
The capability provider uses the amazing inference toolkit tract and currently supports the following inference engines
Concerning ONNX, see tract's documentation for a detailed discussion of ONNX format coverage.
Concerning Tensorflow, only TensorFlow 1.x is supported, not Tensorflow 2. However, models of format Tensorflow 2 may be converted to Tensorflow 1.x. For a more detailled discussion, see the following resources:
https://www.tensorflow.org/guide/migrate/tf1_vs_tf2
https://stackoverflow.com/questions/59112527/primer-on-tensorflow-and-keras-the-past-tf1-the-present-tf2#:~:text=In%20terms%20of%20the%20behavior,full%20list%20of%20data%20types.
Currently, there is no support of any accelerators like GPUs or TPUs. On the one hand, there is a range of coral devices like the Dev board supporting Tensorflow for TPU based inference. However, they only support the Tensorflow Lite derivative. For more information see Coral's Edge TPU inferencing overview.