Skip to content

Tutorial 6.4: Expressions

Federico Tomassetti edited this page Feb 20, 2014 · 3 revisions

Classical Parsing Expression Grammar approach to define expressions with infix/postfix/prefix operators like (1 + 2 * 3) / -4 offers to use rules like this:

Value      <- atomicOperand | '(' Expression ')'
Product    <- Expression (('*' | '/') Expression )*
Sum        <- Expression (('+' | '-') Expression )*
Expression <- Product | Sum | Value

In simple case described in the pseudo-code above it looks pretty simple. But for real programming languages with numbers of various operators the grammar could be much more complicated. Appropriate rules should be implemented in accordance with operator precedence and associativity. Moreover usually compiler needs to have Nodes in form of Binary Tree. So additional step to reformat resulting tree to binary form will be required.

To reduce this complexity Papa Carlo provide's builder of operator-precedence parsing rule with easy to use API. And the output node result of this rule form Binary Tree by default.

From the Calculator example:

...

    mainRule("expression") {
      val rule =
        expression(branch("operand", recover(number, "operand required")))

      group(rule, "(", ")")
      postfix(rule, "%", 1)
      prefix(rule, "+", 2)
      prefix(rule, "-", 2)
      infix(rule, "*", 3)
      infix(rule, "/", 3, rightAssociativity = true)
      infix(rule, "+", 4)
      infix(rule, "-", 4)

      rule
    }
...

Here the .expression(tag: String, operand: Rule) method defines expression rule parser. tag is optional parameter that is a "result" by default. It's meaning is the same to tag parameter in .branch() method. operand is atomic operand of the expression. In this example of the math expression parser it is a numeric literal.

Four functions are used to configure the parser

Operator constructor Expression example Description
group(expression, open, close) ( x + y ) * 2 Defines grouping operators.
infix(expression, operator, precedence) x + y + z Left associative infix binary operator with specific operator's precedence.
infix(expression, operator, precedence, rightAssociativity = true) x = y = z Right associative infix binary operator with specific operator's precedence.
prefix(expression, operator, precedence) ( - x) + y Prefix unary operator with specific operator's precedence.
postfix(expression, operator, precedence) x = i ++ Postfix unary operator with specific operator's precedence.

Order of the constructor applications does not make sense. They can be applied in any order. So to define expression grammar of the programming language developer can directly translate operator precedence tables like this: Python operator precedence table.

Internally parser uses Pratt algorithm. So one who are familiar with this parsing technique may implement his/her own extensions.

From the Postfix operator constructor source code:

...
  def postfix(rule: ExpressionRule, operator: String, precedence: Int) {
    rule.parselet(operator)
      .leftBindingPower(Int.MaxValue - precedence * 10)
      .leftDenotation {
        (expression, left, operatorReference) =>
          val node = new Node(operator, operatorReference, operatorReference)
          node.branches += "operand" -> List(left)
          node
      }
  }
...

Here is a brief low-level API description:

  • .parselet() method defines a new "token" in term's of Pratt's algorithm for the specific operator.
  • .leftBindingPower() defines ldp parameter of the token.
  • .leftDenotation() defines led handler of the token.
  • .nullDenotation() defines nud handler of the token.
  • expression.parseRight(rdp) to apply parser in the body of the led or nud handler to the right part of the expression with specific rdp right-binding power.
  • expression.consume(tokenKind) to force parser read specific token in the current position.