`KernelShap` redundant/incorrect API for OHE of the categorical features - grouping vs aggregation by summation #879
There are multiple ways to deal with one-hot encoding (OHE) of the categorical features when using the `KernelShap` explainer.

- The first method is to incorporate the preprocessor (the OHE step) into the model being explained.
- The second method is to use grouping on the one-hot-encoding representation. This approach can group multiple columns and treat them as one. For example, if we have a categorical column `fc` with 3 categories, its OHE representation will result in 3 columns: `fc_1`, `fc_2`, `fc_3`. Without grouping, `KernelShap` will treat each `fc_i` as an individual player/feature and compute 3 values instead of one. With grouping, we can tell `KernelShap` to treat all 3 columns as one, `[fc_1, fc_2, fc_3]`, i.e., as a single player. Note that the Shap values should match the ones from the first approach (i.e., incorporating the preprocessor).
- `alibi` exposes another method to compute the Shap values, using aggregation by summation. Following the same example from the second bullet point, given the OHE representation as input, `KernelShap` computes the Shap values for each `fc_i` and then aggregates them into a single value by summing them up (i.e., `fc = fc_1 + fc_2 + fc_3`). Unfortunately, this method is not correct: unlike the first two bullet points, the results obtained by summation won't converge to the true Shap values. This is probably a heuristic borrowed from `TreeShap`, which cannot use the first two approaches.

That being said, we should consider removing this approach since it is redundant and incorrect: the same result can be achieved in the correct way via bullet 2.

Furthermore, `cat_vars_start_idx` and `cat_vars_enc_dim`, which do the aggregation for `KernelShap` and `TreeShap`, are parameters of the `explain` method. We should consider moving those parameters into the `fit` or `__init__` method for the following reasons:

- [...] `KernelShap` we would have some symmetry between `TreeShap` and `KernelShap` in terms of dealing with categorical features (for `KernelShap`, `groups` and `group_names` are arguments in the `fit` method).
- [...] `_build_explanation` method.
- [...] `__init__`) or the explainer should be refitted.
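
To make the grouping vs summation difference concrete, here is a minimal sketch that does not use `alibi` at all. It assumes a hypothetical toy model and a single background row, and it computes exact Shapley values by enumerating player orderings rather than using the KernelShap sampling scheme. The names `model`, `value` and `shapley` are made up for illustration.

```python
from fractions import Fraction
from itertools import permutations

# Columns: fc_1, fc_2, fc_3 (OHE of `fc`) and one numerical feature x.
instance = (1, 0, 0, 1)    # fc is category 1, x = 1
background = (0, 1, 0, 0)  # single reference row: fc is category 2, x = 0


def model(z):
    """Hypothetical toy model: returns x when the OHE block is exactly [1, 0, 0], else 0."""
    fc_1, fc_2, fc_3, x = z
    return fc_1 * (1 - fc_2) * (1 - fc_3) * x


def value(columns):
    """Model output when `columns` take the instance value and the rest the background value."""
    z = [instance[i] if i in columns else background[i] for i in range(len(instance))]
    return model(z)


def shapley(players):
    """Exact Shapley values; each player is a tuple of column indices that move together."""
    phi = {p: Fraction(0) for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        present = set()
        for p in order:
            before = value(present)
            present |= set(p)
            # Marginal contribution of p, averaged over all orderings.
            phi[p] += Fraction(value(present) - before, len(orderings))
    return phi


# Aggregation by summation: every OHE column is its own player, then sum the three values.
per_column = shapley([(0,), (1,), (2,), (3,)])
fc_by_summation = per_column[(0,)] + per_column[(1,)] + per_column[(2,)]

# Grouping: the whole OHE block is a single player.
grouped = shapley([(0, 1, 2), (3,)])
fc_by_grouping = grouped[(0, 1, 2)]

print(fc_by_summation, fc_by_grouping)  # prints: 2/3 1/2
```

In this toy setting summation attributes 2/3 to `fc` while grouping attributes 1/2. Both sets of values still add up to the difference between the model output on the instance and on the background row, so the discrepancy is easy to miss.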