Skip to content

6. Introduction to logical cost in DBSim

wyfunique edited this page Apr 22, 2022 · 5 revisions

Overview

DBSim uses logical cost to determine the best plan when the optimizer generates multiple plans after transformation based on some rules. In such case, the plan with the lowest logical cost will be the best plan.

Logical cost model and computing formula

  1. In DBSim, the query plan nodes (i.e., the operators) are grouped into two categories: predicate and relational operators. Predicate operators are the operators in predicates, like And, Or, GtOp('>'), LtOp('<'), etc, while relational operators are those directly manipulating relations, like SelectionOp, JoinOp, ProjectionOp, etc.

  2. Each plan node / operator (including predicate and relational operators) has an attribute cost_factor, used to be a multiplication factor of #processed_rows when computing logical cost. cost_factor is 1.0 by default, i.e., the logical cost of each plan node is by default the number of its processed rows.

  3. The initial value of cost_factor can be set as needed, normally depending on the computation workload of the operator. DBSim roughly considers the workload of one logical operation (like AND, OR, <, ==, >) or one basic arithmetic operation (+, -, *, /) to be 1, then the workload of N-dim vector dot product is about 2N (or more accurate, 2N-1, since there are N multiplications and (N-1) additions). So the initial value of cost_factor for And operator is 1.0,that of ToOp(see the section "Extension examples") is 2N or 2N-1 when the metric is dot product.

  4. When parsing a query into AST, the initial values of cost_factor will be set for each AST node. To compute the logical cost, those initial values need to be further refined (by calling method LogicalCost.refineCostFactors). The refining process is as follows:

    • For each predicate operator,

      • its refined cost_factor = its initial cost_factor + sum(refined cost_factors of all its children)
      • The computation above is executed recursively until the leaf nodes are reached which have no children. For a leaf node, the refined cost_factor is its initial cost_factor.
    • For each relational operator,

      • its refined cost_factor = its initial cost_factor + the refined cost_factor of the root node of its predicate
  5. Based on the refined cost_factors, the logical cost of a query plan will be computed as follows:

    • For each relational operator,

      • its logical cost = the number of processed rows * its refined cost_factor
      • for operator with single child, like SelectionOp, the number of processed rows is that of its input rows; while for operator with two children, like JoinOp or UnionAllOp, the number of processed rows is determined case by case. For example,
        • for JoinOp , we roughly set the number of processed rows = (#input_rows from left child) * (#input_rows from right child)
        • for UnionAllOp, the number of processed rows = (#input_rows from left child) + (#input_rows from right child)
    • For the whole query plan,

      • total logical cost = sum(logical costs of all the relational operators)
  6. Note:

  • Computing logical cost does not take the physical implementation into consideration. For example, whether a join is a hash join or a nested-loop join does not affect the logical cost. And that is why we name it "logical cost".
Clone this wiki locally