-
Notifications
You must be signed in to change notification settings - Fork 2
6. Introduction to logical cost in DBSim
DBSim uses logical cost to determine the best plan when the optimizer generates multiple plans after transformation based on some rules. In such case, the plan with the lowest logical cost will be the best plan.
-
In DBSim, the query plan nodes (i.e., the operators) are grouped into two categories: predicate and relational operators. Predicate operators are the operators in predicates, like
And
,Or
,GtOp
('>'),LtOp
('<'), etc, while relational operators are those directly manipulating relations, likeSelectionOp
,JoinOp
,ProjectionOp
, etc. -
Each plan node / operator (including predicate and relational operators) has an attribute
cost_factor
, used to be a multiplication factor of #processed_rows when computing logical cost.cost_factor
is 1.0 by default, i.e., the logical cost of each plan node is by default the number of its processed rows. -
The initial value of
cost_factor
can be set as needed, normally depending on the computation workload of the operator. DBSim roughly considers the workload of one logical operation (like AND, OR, <, ==, >) or one basic arithmetic operation (+, -, *, /) to be 1, then the workload of N-dim vector dot product is about 2N (or more accurate, 2N-1, since there are N multiplications and (N-1) additions). So the initial value ofcost_factor
forAnd
operator is 1.0,that ofToOp
(see the section "Extension examples") is 2N or 2N-1 when the metric is dot product. -
When parsing a query into AST, the initial values of
cost_factor
will be set for each AST node. To compute the logical cost, those initial values need to be further refined (by calling methodLogicalCost.refineCostFactors
). The refining process is as follows:-
For each predicate operator,
- its refined cost_factor = its initial cost_factor + sum(refined cost_factors of all its children)
- The computation above is executed recursively until the leaf nodes are reached which have no children. For a leaf node, the refined cost_factor is its initial cost_factor.
-
For each relational operator,
- its refined cost_factor = its initial cost_factor + the refined cost_factor of the root node of its predicate
-
-
Based on the refined cost_factors, the logical cost of a query plan will be computed as follows:
-
For each relational operator,
- its logical cost = the number of processed rows * its refined cost_factor
- for operator with single child, like
SelectionOp
, the number of processed rows is that of its input rows; while for operator with two children, likeJoinOp
orUnionAllOp
, the number of processed rows is determined case by case. For example,- for
JoinOp
, we roughly set the number of processed rows = (#input_rows from left child) * (#input_rows from right child) - for
UnionAllOp
, the number of processed rows = (#input_rows from left child) + (#input_rows from right child)
- for
-
For the whole query plan,
- total logical cost = sum(logical costs of all the relational operators)
-
-
Note:
- Computing logical cost does not take the physical implementation into consideration. For example, whether a join is a hash join or a nested-loop join does not affect the logical cost. And that is why we name it "logical cost".