diff --git a/docs/develop/rust/wasinn/llm-inference.md b/docs/develop/rust/wasinn/llm-inference.md
index b7cf3739..6871f89e 100644
--- a/docs/develop/rust/wasinn/llm-inference.md
+++ b/docs/develop/rust/wasinn/llm-inference.md
@@ -96,9 +96,8 @@ You can use environment variables to configure the model execution.
 
 | Option |Default |Function |
 | -------|-----------|----- |
-| |
-LLAMA_LOG| 0 |The backend will print diagnostic information when this value is set to 1|
-|LLAMA_N_CTX |512| The context length is the max number of tokens in the entire conversation|
+| LLAMA_LOG | 0 |The backend will print diagnostic information when this value is set to 1|
+|LLAMA_N_CTX |512| The context length is the max number of tokens in the entire conversation|
 |LLAMA_N_PREDICT |512|The number of tokens to generate in each response from the model|
 
 For example, the following command specifies a context length of 4k tokens, which is standard for llama2, and the max number of tokens in each response to be 1k. It also tells WasmEdge to print out logs and statistics of the model at runtime.
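
For context, a minimal sketch of how these options might be passed on the command line. It assumes the `llama-chat.wasm` example app and a `llama-2-7b-chat-q5_k_m.gguf` model file (both placeholders here, not taken from this hunk) and uses the WasmEdge CLI's `--env`, `--dir`, and `--nn-preload` flags; the authoritative command lives in the surrounding docs page.

```bash
# Illustrative only: pass the tuning options as WASI environment variables.
#   LLAMA_N_CTX=4096     -> 4k-token context window (standard for llama2)
#   LLAMA_N_PREDICT=1024 -> cap each response at 1k tokens
#   LLAMA_LOG=1          -> print backend logs and statistics at runtime
wasmedge --dir .:. \
  --env LLAMA_LOG=1 \
  --env LLAMA_N_CTX=4096 \
  --env LLAMA_N_PREDICT=1024 \
  --nn-preload default:GGML:AUTO:llama-2-7b-chat-q5_k_m.gguf \
  llama-chat.wasm default
```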