Mage is a tool for performing text analysis. It does so by generating a lexer, parser and parse tree for you. Whether it is a piece of programming code or some tabular data in a fringe format, Mage has got you covered!
- Full support for Python typings. Avoid runtime errors while building your language!
- Add your own languages through the use of a powerful template engine!
Mage is written in itself. Check out the generated code of part of our Python generator!
Here is the status of the various languages supported by Mage:
Python
Name | Description | Status |
---|---|---|
CST | Create a parse tree from a grammar | ✅ |
AST | Create an AST that is derived from a CST | ⏳ |
Lexer | Create a fully functioning lexer from a grammar | 🚧 |
Parser | Create a fully functioning parser from a grammar | ⏳ |
Rust
Name | Description | Status |
---|---|---|
CST | Create a parse tree from a grammar | ⏳ |
AST | Create an AST that is derived from a CST | ⏳ |
Lexer | Create a fully functioning lexer from a grammar | ⏳ |
Parser | Create a fully functioning parser from a grammar | ⏳ |
C++
Name | Description | Status |
---|---|---|
CST | Create a parse tree from a grammar | ⏳ |
AST | Create an AST that is derived from a CST | ⏳ |
Lexer | Create a fully functioning lexer from a grammar | ⏳ |
Parser | Create a fully functioning parser from a grammar | ⏳ |
$ pip3 install --user -U magelang
Currently requires at least Python version 3.12 to run.
Generate a parser for the given grammar in a language that you specify.
Example
mage generate python foo.mage --prefix foo --out-dir src/foolang
Warning
This command is under construction.
Run all tests inside the documentation of the given grammar.
Define a new node or token that must be parsed according to the given expression.
You can use both inline rules and other node rules inside `expr`. When referring to another node, that node will become a field in the node that referred to it. Nodes that have no fields are converted to a special token type that is more efficient to represent.
pub var_decl = 'var' name:ident '=' type_expr
Define a new inline rule that can be used inside other rules.
As the name suggests, this type of rule is merely syntactic sugar and gets inlined whenever it is referred to inside another rule.
digits = [0-9]+
Defines a new parsing rule that is defined somewhere else, possibly in a different language.
Defines a new lexing rule that is defined somewhere else, possibly in a different language.
Like `pub <name> = <expr>` but forces the rule to be a token.
Mage will show an error when the rule could not be converted to a token rule.
This usually means that the rule references another rule that is `pub`.
pub token float_expression
= digits? '.' digits
First parse `expr1` and continue to parse `expr2` immediately after it.
pub two_column_csv_line
= text ',' text '\n'
First try to parse `expr1`. If that fails, try to parse `expr2`. If none of the expressions matched, the parser fails.
pub declaration
= function_declaration
| let_declaration
| const_declaration
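To make the semantics of sequencing and choice concrete, here is a minimal sketch written as plain Python functions. It only illustrates the behaviour described above and is not how Mage implements its parsers. The names `lit`, `seq` and `choice` are hypothetical helpers, and the keyword strings stand in for full declaration rules.

```python
from typing import Callable, Optional

# A parser takes (text, position) and returns the new position, or None on failure.
Parser = Callable[[str, int], Optional[int]]


def lit(s: str) -> Parser:
    """Match the literal string `s`."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return parse


def seq(*parsers: Parser) -> Parser:
    """expr1 expr2: every parser continues where the previous one stopped."""
    def parse(text: str, pos: int) -> Optional[int]:
        current: Optional[int] = pos
        for p in parsers:
            current = p(text, current)
            if current is None:
                return None
        return current
    return parse


def choice(*parsers: Parser) -> Parser:
    """expr1 | expr2: try the alternatives in order, fail if none matches."""
    def parse(text: str, pos: int) -> Optional[int]:
        for p in parsers:
            end = p(text, pos)
            if end is not None:
                return end
        return None
    return parse


# Stand-ins for function_declaration | let_declaration | const_declaration
declaration = choice(lit("fn"), lit("let"), lit("const"))
let_x = seq(lit("let"), lit(" "), lit("x"))
```

Here `declaration("let x", 0)` returns the position just past `let`, while `declaration("var x", 0)` returns `None` because no alternative matched.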
Parse or skip the given expression, depending on whether the expression can be parsed.
pub singleton_or_pair
= value (',' value)?
Parse the given expression as many times as possible, possibly zero times.
skip = (multiline_comment | whitespace)*
Parse the given expression one or more times.
For example, in Python, there must always be at least one statement in the body of a class or function:
body = stmt+
Escape an expression by making it hidden. The expression will be parsed, but not be visible in the resulting CST/AST.
Parse the expression at least `n` times and at most `m` times.
unicode_char = 'U+' hex_digit{4,4}
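The optional and repetition operators described above can all be seen as one bounded, greedy repetition with different limits. The following sketch shows that idea in plain Python. It is an illustration only, not Mage's generated code, and `char_in`, `repeat`, `optional`, `many` and `some` are hypothetical names.

```python
from typing import Callable, Optional

# A parser takes (text, position) and returns the new position, or None on failure.
Parser = Callable[[str, int], Optional[int]]


def char_in(chars: str) -> Parser:
    """Match a single character out of `chars`."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + 1 if pos < len(text) and text[pos] in chars else None
    return parse


def repeat(p: Parser, min_count: int, max_count: Optional[int] = None) -> Parser:
    """Greedily apply `p` at least `min_count` and at most `max_count` times.

    Assumes `p` consumes at least one character whenever it succeeds.
    """
    def parse(text: str, pos: int) -> Optional[int]:
        count = 0
        while max_count is None or count < max_count:
            end = p(text, pos)
            if end is None:
                break
            pos, count = end, count + 1
        return pos if count >= min_count else None
    return parse


def optional(p: Parser) -> Parser:   # expr?
    return repeat(p, 0, 1)


def many(p: Parser) -> Parser:       # expr*
    return repeat(p, 0)


def some(p: Parser) -> Parser:       # expr+
    return repeat(p, 1)


hex_digit = char_in("0123456789abcdefABCDEF")
four_hex = repeat(hex_digit, 4, 4)   # hex_digit{4,4}
```

With these definitions, `four_hex` accepts exactly four hexadecimal digits and stops there, even when more digits follow.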
Treat the given rule as being a potential source for keywords.
String literals matching this rule will get the special `_keyword` suffix during transformation. The lexer will also take into account that the rule conflicts with keywords and generate code accordingly.
@keyword
pub token ident
= [a-zA-Z_] [a-zA-Z_0-9]*
Register the chosen rule as a special rule that the lexer uses to lex 'gibberish'.
The rule will still be available in other rules, e.g. when `@noskip` was added.
@skip
whitespace = [\n\r\t ]*
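As a rough illustration of what a skip rule does, the sketch below discards whitespace before and after every token. The `TOKEN` pattern is a made-up placeholder for the real token rules of a grammar, and the function is not taken from Mage's generated lexer.

```python
import re

# The @skip rule from above.
SKIP = re.compile(r"[\n\r\t ]*")

# Hypothetical token pattern: an identifier, a number, or any single character.
TOKEN = re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*|[0-9]+|.", re.DOTALL)


def tokenize(text: str) -> list[str]:
    """Split `text` into tokens, silently dropping anything the skip rule matches."""
    tokens: list[str] = []
    # SKIP can match the empty string, so it always succeeds.
    pos = SKIP.match(text, 0).end()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        tokens.append(m.group(0))
        pos = SKIP.match(text, m.end()).end()
    return tokens
```

The skipped text never reaches the parser, which is why a rule that cares about whitespace needs a way to opt out.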
Warning
This decorator is under construction.
Disable automatic injection of the `@skip` rule for the chosen rule.
This can be useful for e.g. parsing indentation in a context where whitespace is normally discarded.
@skip
__ = [\n\r\t ]*
@noskip
pub body
= ':' __ stmt
| ':' \indent stmt* \dedent
Adding this decorator to a rule ensures that a real CST node is emitted for that rule, instead of possibly a variant.
This decorator makes the CST heavier, but this might be warranted in the name of robustness and forward compatibility. Use this decorator if you plan to add more fields to the rule.
@wrap
pub lit_expr
= literal:(string | integer | boolean)
A special rule that matches any keyword present in the grammar.
The generated CST will contain predicates to check for a keyword:
print_bold = False
if is_py_keyword(token):
    print_bold = True
A rule that matches any token in the grammar.
pub macro_call
= name:ident '{' token* '}'
A special rule that matches any parseable node in the grammar, excluding tokens.
A special rule that matches any rule in the grammar, including tokens.
This section documents the API that is generated by taking a Mage grammar as input and specifying `python` as the output language.
In what follows, `Node` is the name of an arbitrary CST node (such as `PyReturnStmt` or `MageRepeatExpr`) and `foo` and `bar` are the names of fields of such a node. Examples of field names are `expr`, `return_keyword`, `min`, `max`, and so on.
Construct a node with the fields specified in the `...` part of the expression.
Required elements come first, i.e. those that were not suffixed with `?` or `*` (or something similar) in the grammar. They may be specified as positional arguments or as keyword arguments.
Optional arguments come next. They must be specified as keyword arguments. When omitted, the corresponding fields are either set to `None` or a new empty token/node is created.
Creating a new CST node by providing positional arguments for required fields:
PyInfixExpr(
    PyNamedExpr('value'),
    PyIsKeyword(),
    PyNamedExpr('None')
)
The same example but now with keyword arguments:
PyInfixExpr(
    left=PyNamedExpr('value'),
    op=PyIsKeyword(),
    right=PyNamedExpr('None')
)
Omitting fields that are trivial to construct:
# Note that `return_keyword` is not specified
stmt = PyReturnStmt(expr=PyConstExpr(42))
# stmt.return_keyword was automatically created
assert isinstance(stmt.return_keyword, PyReturnKeyword)
This member is generated when field `foo` contains a repetition, such as the Mage expression `'.'+`.
It returns the number of elements that are actually present in the CST node.
This is probably due to this feature in the Python type checker, which prevents subclasses from being assigned to a more general type.
For small lists, we recommend making a copy of the list, like so:
defn = PyFuncDef(body=list([ ... ]))
See also this issue in the Pyright repository.
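The underlying typing rule is that `list` is invariant in Python's type system, so a `list[Sub]` is not accepted where a `list[Base]` is expected, whereas the read-only `Sequence` is covariant. The sketch below reproduces the situation with two hypothetical classes, `Stmt` and `ReturnStmt`, which are not Mage's generated classes. The comments describe what a type checker such as Pyright typically reports. At runtime all of these calls work.

```python
from typing import Sequence


class Stmt: ...


class ReturnStmt(Stmt): ...


def count_mutable(body: list[Stmt]) -> int:
    return len(body)


def count_readonly(body: Sequence[Stmt]) -> int:
    return len(body)


returns: list[ReturnStmt] = [ReturnStmt(), ReturnStmt()]

# A type checker rejects `count_mutable(returns)`: list is invariant, so
# list[ReturnStmt] is not a list[Stmt], even though ReturnStmt is a Stmt.

# Copying into a new list lets the checker infer the element type Stmt.
stmts: list[Stmt] = list(returns)
n_copy = count_mutable(stmts)

# Sequence is covariant, so no copy is needed for read-only access.
n_view = count_readonly(returns)
```

The copy is shallow, so it costs one pass over the list, which is why this workaround is best suited to small lists.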
Run the following command in a terminal to link the `mage` command to your checkout:
pip3 install -e '.[dev]'
This code is generously licensed under the MIT license.