A typescript fork of the awesome rd-parse library by Dmitry Maevsky
- Typescript Support
- Helper function
LeftBinaryOperator
to construct binary operators with associativity - Return all the ignore capture groups or nodes as comments
- Example prettier parser plugin
A generic minimalist zero dependencies recursive-descent parser generator in Javascript.
You define your grammar directly in Javascript, in an EBNF fashion: see the examples. The parser produces an AST according to the specified grammar.
To witness the sheer power of this tool, please check out the ES2015 expression grammar: Javascript expression grammar, defined in Javascript, and parsed by Javascript !
- rd-parse is now an ES6 module. package.json is not yet compliant with the recently added Node's ESM support, but will be once it's stable. In the meantime use it with
node -r esm
in Node environment - Rule constructors are now purely functional independent exports, as is the Parser generator itself: check out the examples
- No more upfront tokenization. The parser is tokenizing the input as it proceeds, using a stack of lexer contexts (used to easily exclude chunks of input such as whitespace or comments)
Usage:
import Parser from '@ikerin/rd-parse';
import Grammar from 'rd-parse-jsexpr';
const parser = Parser(Grammar);
const ast = parser('[1,2,3].map(a => a*a).reduce((a,b) => a + b)');
console.log(JSON.stringify(ast, null, 2)); // Pretty print your AST
With rd-parse you can define your grammar directly in Javascript in a very expressive fashion, using constructs borrowed form EBNF. The package exports the elementary combinators that you can use to construct your grammar from ground up, starting from Lexer tokens and working your way up, applying EBNF rules.
rd-parse is using regexes to tokenize the input, so an elementary rule (token) would be just a string (matched exactly) or a regex.
IMPORTANT: the regex rules must start with ^
in order to match the input at the current parsing position!
The package exports a lexer helper Ignore
. If you wrap a rule with Ignore(regex, rule)
the parser will create a lexer context for the rule
being matched that ignores chunks of input matching the provided regex
(e.g. whitespace or comments). The lexer context is pushed on a stack inside the parsing state, so the the rule
's children rules may use different lexer contexts (i.e. in string interpolation).
All captures in regex rules are put on the semantic stack inside the parsing state, and can be used by higher order rules' reducers to build the AST.
The package exports the standard combinators (All, Any, Plus, Optional, Star)
: the production rules allowing you to build more complicated matching rules. This is best illustrated by some examples.
const StringToken = Any(
/^('[^'\\]*(?:\\.[^'\\]*)*')/, // single-quoted
/^("[^"\\]*(?:\\.[^"\\]*)*")/, // double-quoted
);
const Identifier = /^([a-zA-Z_$][a-zA-Z0-9_$]*)/;
const SumExpression = All(Identifier, '=', Identifier, '+', Identifier);
The SumExpression
rule will match inputs like a = b + c
.
All
combinator matches all argument rules in order.
Any
tries argument rules from left to right and reports a match once successful.
Plus
matches the argument rule one or more times.
Optional
matches the argument rule one or zero times.
Star
matches the argument rule zero or more times: Star = rule => Optional(Plus(rule))
.
If your grammar is recursive (which is often the case), you might need to wrap it in a Y-combinator, as shown in the examples. Y
is exported from the package as a matter of convenience.
Use the Node
helper to define how to build your AST. It has two arguments: the rule to match, and a reducer callback: see below.
Each regex matching rule will dump all capturing groups matches from its regex to a stack in the matched order.
If a rule is wrapped in a Node
, the parser will call the provided reducer passing an Array, containing everything that the matched rule has put onto the stack. The reducer returns an object to be put back onto the stack for the parent nodes to pick up later.
For example we can wrap the SumExpression
above in a Node
:
const SumExpression = Node(All(Identifier, '=', Identifier, '+', Identifier), ([result, left, right]) => ({
type: 'Assignment',
result,
left,
right,
}));
using the power of ES6 arrow functions, destructuring arguments, and shorthand notation.
The AST for the input 'a = b + c'
would be { type: 'Assignment', result: 'a', left: 'b', right: 'c' }
The ES2015 expression grammar illustrates many useful constructions.
Originally inspired by @oleganza's blog post.