Given a training set of Watson Assistant utterances mapped to intents, with an optional set of entities, the user wants to test the performance of that training set by using a blind set.
Unlike the k-fold process, no separate folds are created. Only one workspace is trained, using the entire training set. After testing, both the test output and the previous_blind_out file (if provided) are fed into createPrecisionCurve.py for plotting curves.
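The idea behind a precision curve can be sketched as follows: for each confidence threshold, keep only the predictions at or above that threshold and measure how many of them are correct. This is a minimal illustration and not the logic of createPrecisionCurve.py; the function name precision_curve and the (golden, predicted, confidence) tuple layout are assumptions made for the example.

```python
def precision_curve(results, thresholds):
    """Return (threshold, answered_fraction, precision) for each threshold.

    results: list of (golden_intent, predicted_intent, confidence) tuples.
    """
    points = []
    total = len(results)
    for t in thresholds:
        # Only predictions at or above the threshold count as "answered".
        answered = [(g, p) for g, p, c in results if c >= t]
        correct = sum(1 for g, p in answered if g == p)
        answered_fraction = len(answered) / total if total else 0.0
        precision = correct / len(answered) if answered else 0.0
        points.append((t, answered_fraction, precision))
    return points


# Hypothetical blind-test results: (golden, predicted, confidence)
sample = [
    ("greeting", "greeting", 0.95),
    ("goodbye", "greeting", 0.40),
    ("order_status", "order_status", 0.80),
    ("order_status", "goodbye", 0.60),
]
for point in precision_curve(sample, [0.0, 0.5, 0.9]):
    print(point)
```

Running the same function on the current and the previous test output gives two sets of points that can be plotted against each other.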
Further, reports are generated for an intent metrics summary and a confusion matrix. These include additional summaries and visualizations that help determine the strengths and weaknesses of the training set.
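To make the two reports concrete, here is a small sketch of how a confusion matrix and per-intent precision and recall can be derived from (golden, predicted) pairs. It is not the tool's own report code; the function names and the sample pairs are hypothetical.

```python
from collections import Counter


def confusion_counts(pairs):
    """Count how often each (golden, predicted) pair occurs."""
    return Counter(pairs)


def intent_metrics(pairs):
    """Return {intent: (precision, recall)} from (golden, predicted) pairs."""
    matrix = confusion_counts(pairs)
    intents = {g for g, _ in pairs} | {p for _, p in pairs}
    metrics = {}
    for intent in sorted(intents):
        true_pos = matrix[(intent, intent)]
        # Column sum: everything predicted as this intent.
        predicted = sum(n for (_, p), n in matrix.items() if p == intent)
        # Row sum: everything whose golden label is this intent.
        actual = sum(n for (g, _), n in matrix.items() if g == intent)
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        metrics[intent] = (precision, recall)
    return metrics


# Hypothetical (golden, predicted) pairs from a blind test
pairs = [
    ("greeting", "greeting"),
    ("greeting", "goodbye"),
    ("goodbye", "goodbye"),
    ("goodbye", "goodbye"),
]
print(confusion_counts(pairs))
print(intent_metrics(pairs))
```

An intent with low recall is often under-represented in the training set, while low precision usually points to overlap with another intent.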
config.ini
Fill in your iam_apikey, url, and one of workspace_id or environment_id at minimum. (Older instances use a url like "https://gateway-wdc.watsonplatform.net/assistant/api".)
[ASSISTANT CREDENTIALS]
iam_apikey = <wa iam apikey>
url = https://gateway-wdc.watsonplatform.net/assistant/api
version=2021-06-14
[DEFAULT]
mode = blind
workspace_id = 01234567-9ABC-DEF0-1234-56789ABCDEF0
;environment_id=
; Provide when comparing to a previous blind test result
;previous_blind_out = ./data/previous_blind_out.csv
; optional - defaults shown here
;output_directory = ./data
;test_input_file = ./data/input.csv
;blind_figure_title = "<figure name>"
;test_output_path = ./data/blind-out.csv
;out_figure_path= ./data/blind.png
;keep_workspace_after_test = no
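
The snippet below is one way to check a config.ini for the minimum settings before starting a run. It is a sketch that uses Python's standard configparser module; the helper name missing_settings is hypothetical and not part of the tool, and the sample text is a shortened copy of the configuration above.

```python
import configparser

SAMPLE = """
[ASSISTANT CREDENTIALS]
iam_apikey = <wa iam apikey>
url = https://gateway-wdc.watsonplatform.net/assistant/api
version=2021-06-14

[DEFAULT]
mode = blind
workspace_id = 01234567-9ABC-DEF0-1234-56789ABCDEF0
;environment_id=
"""


def missing_settings(config):
    """Return the names of required settings that are absent or empty."""
    missing = []
    section = "ASSISTANT CREDENTIALS"
    for key in ("iam_apikey", "url"):
        if not config.get(section, key, fallback="").strip():
            missing.append(key)
    # workspace_id / environment_id live in [DEFAULT] in the sample above.
    defaults = config.defaults()
    workspace = defaults.get("workspace_id", "").strip()
    environment = defaults.get("environment_id", "").strip()
    if not (workspace or environment):
        missing.append("workspace_id or environment_id")
    return missing


config = configparser.ConfigParser()
config.read_string(SAMPLE)  # use config.read("config.ini") for a real file
print(missing_settings(config))
```

Lines starting with ; are treated as comments by configparser, so the commented-out defaults are ignored.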
test_input_file - Blind test set. Defaults to ./data/input.csv. Should have the columns utterance and golden intent.
previous_blind_out - (Optional) Test output from a previous blind test result.
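
As an illustration of the expected input layout, the following sketch checks that a blind test set has the utterance and golden intent columns. It is not part of the tool; the helper name check_blind_set and the sample rows are made up for the example.

```python
import csv
import io

REQUIRED_COLUMNS = {"utterance", "golden intent"}


def check_blind_set(handle):
    """Return the rows of a blind test CSV, or raise if a column is missing."""
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError("missing columns: " + ", ".join(sorted(missing)))
    return list(reader)


# Hypothetical blind set held in memory; a real run would open the file
# named by test_input_file instead.
sample = io.StringIO(
    "utterance,golden intent\n"
    "hello there,greeting\n"
    "where is my order,order_status\n"
)
rows = check_blind_set(sample)
print(len(rows), rows[0]["golden intent"])
```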
Note: For Watson Assistant v2, the intent names returned are intent IDs rather than intent names. These IDs can be found in the JSON source of the assistant.
Partial credit scoring may be configured when intents are intentionally similar or overlapping, especially in the case of multi-intent statements.
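
One way to picture partial credit scoring is a lookup table that awards a fraction of a point when a specific wrong intent is predicted for a given golden intent. The sketch below is only an illustration of that idea; the table layout, the intent names, and the score function are assumptions and do not describe how the tool is actually configured.

```python
def score(golden, predicted, partial_credit):
    """Return 1.0 for an exact match, a table value for a listed pair, else 0.0."""
    if golden == predicted:
        return 1.0
    return partial_credit.get((golden, predicted), 0.0)


# Hypothetical table: predicting return_order for cancel_order earns half credit.
partial_credit = {("cancel_order", "return_order"): 0.5}

# Hypothetical (golden, predicted) pairs from a blind test
results = [
    ("cancel_order", "cancel_order"),
    ("cancel_order", "return_order"),
    ("greeting", "goodbye"),
]
total = sum(score(g, p, partial_credit) for g, p in results)
print(total / len(results))
```

With an empty table the same function reduces to plain exact-match accuracy.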