Python backend #485

Open · wants to merge 5 commits into base: master
87 changes: 87 additions & 0 deletions docs/user_guide.rst
@@ -284,3 +284,90 @@ BNFC adds the grammar name as a file extension. So if the grammar file is
named ``Calc.cf``, the lexer will be associated to the file extension
``.calc``. To associate other file extensions to a generated lexer, you need to
modify (or subclass) the lexer.

Python Backend
===============

The BNF Converter's Python Backend generates a Python frontend that uses
`PLY <https://www.dabeaz.com/ply/ply.html>`_ (Python Lex Yacc) to parse
input into an abstract syntax tree.

Python 3.10 or higher is needed.

Example usage::

bnfc --python Calc.cf


.. list-table:: The result is a set of files:
:widths: 25 25
:header-rows: 1

* - Filename
- Description
* - bnfcPyGenCalc/Absyn.py
- Provides the classes for the abstract syntax.
* - bnfcPyGenCalc/LexTokens.py
- Provides PLY with the information needed to build the lexer.
* - bnfcPyGenCalc/ParserDefs.py
- Provides PLY with the information needed to build the parser.
* - bnfcPyGenCalc/PrettyPrinter.py
- Provides printing for both the AST and the linearized tree.
* - genTest.py
- A ready-made test file that uses the generated frontend to convert input into an AST.
* - skele.py
- Provides skeleton code to deconstruct an AST, using structural pattern matching.

Optionally, with ``-m``, a Makefile is also generated; it contains the
target "distclean" to remove the generated files.

Testing the frontend
....................

Input can be piped to the test file, for example::

echo "(1 + 2) * 3" | python3 genTest.py

or::

python3 genTest.py < file.txt

and it's possible to pass a file as an argument::

python3 genTest.py file.txt


Caveats
.......

Presentation of conflicts in a grammar:

A symbol-to-unicode transformation is made for the terminals in the grammar,
for example from "++" to "S_43_43". This however obfuscates the grammar
information that PLY generates in the "parser.out" file. Users are hence
encouraged to use the Haskell backend to debug grammars and identify
conflicts.

Several entrypoints:

At the top of the ParserDefs.py file an additional rule is added that has
every defined entrypoint as a possible production. This may create warnings
for conflicts, as it may introduce ambiguity. The added parsing rule is
therefore removed by default with the statement "del p__Start" directly
beneath the function, and can be enabled by commenting out that removal.
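As a rough illustration, the generated rule has the following shape (the
alternatives ``Exp`` and ``Stm`` are assumed here for a hypothetical grammar
with two entrypoints):

```python
# Hypothetical sketch of the generated "_Start" rule in ParserDefs.py:
# one PLY rule function whose alternatives are all defined entrypoints.
def p__Start(p):
    '''_Start : Exp
              | Stm'''
    p[0] = p[1]

# BNFC emits this removal by default, so PLY never sees the rule;
# comment it out to enable parsing from any entrypoint.
del p__Start
```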

Special cases for special characters:

Using ordinary characters instead of, say, parentheses when defining rules
may not yield the expected behaviour. With the rule below, an expression
such as "a1+2a" cannot be parsed, since the a's are classified as reserved
keywords (like "int") rather than as symbols (like "+")::

_. Exp1 ::= "a" Exp "a" ;

Results from the parameterized tests:

While the Python backend generates working frontends for the example
grammars, the regression tests report four "failures" and six "errors".
218 changes: 218 additions & 0 deletions document/BNF_Converter_Python_Mode.html
@@ -0,0 +1,218 @@
<!DOCTYPE html>
<head>
<meta http-equiv="content-type"
content="text/html; charset=ISO-8859-1">
<title>BNF Converter Python Mode</title>
</head>
<style>
table {
font-family: arial, sans-serif;
border-collapse: collapse;
width: 100%;
}

td, th {
text-align: left;
padding: 4px;
}

</style>
<body>
<div style="text-align: center;">
<h2>BNF Converter</h2>
<h2>Python Mode</h2>
</div>
<h3>By Björn Werner</h3>

<h3>2024</h3>
<p>
The BNF Converter's Python Backend generates a Python frontend that uses
PLY (Python Lex Yacc) to parse input into an AST (abstract syntax tree).
</p>
<p>
BNFC on Github:<br>
<a href="https://github.com/BNFC/bnfc">https://github.com/BNFC/bnfc</a>
</p>
<p>
PLY homepage:<br>
<a href="https://www.dabeaz.com/ply/ply.html">https://www.dabeaz.com/ply/ply.html</a>
</p>
<p>
Python 3.10 or higher is needed.
</p>
<h3>Usage</h3>
<div style="margin-left: 40px; "><big><span style="font-family: monospace; ">
bnfc --python NAME.cf</span></big><br style="font-family: monospace; ">
</div>
<p>
The result is a set of files:
</p>
<table style="padding: 1cm;">
<tr>
<th>Filename:</th><th>Description:</th>
</tr>
<tr>
<td>bnfcGenNAME/LexTokens.py</td><td>Provides PLY with the information needed to build the lexer.</td>
</tr>
<tr>
<td>bnfcGenNAME/Absyn.py</td><td>Provides the classes for the abstract syntax.</td>
</tr>
<tr>
<td>bnfcGenNAME/ParserDefs.py</td><td>Provides PLY with the information needed to build the parser.</td>
</tr>
<tr>
<td>bnfcGenNAME/PrettyPrinter.py</td><td>Provides printing for both the AST and the linearized tree.</td>
</tr>
<tr>
<td>genTest.py</td><td>A ready-made test file that uses the generated frontend to convert input into an AST.</td>
</tr>
<tr>
<td>skele.py</td><td>Provides skeleton code to deconstruct an AST, using structural pattern matching.</td>
</tr>
</table>

<h3>Testing the frontend</h3>
<p>
The following example uses a frontend that is generated from a C-like grammar.
</p>
<p style="font-family: monospace;">
$ python3 genTest.py < hello.c
</p>
<p style="font-family: monospace;">
Generating LALR tables<br>
Parse Successful!<br>
<br>
[Abstract Syntax]<br>
(PDefs [(DFun Type_int "main" [] [(SExp (EApp "printString" [(EString "Hello world")])), (SReturn (EInt 0))])])<br>
<br>
[Linearized Tree]<br>
int main ()<br>
{<br>
&nbsp;printString ("Hello world");<br>
&nbsp;return 0;<br>
}<br>
</p>
<p>
The LALR tables are cached in a file called "parsetab.py", and a description by PLY of the grammar is stored in a file called "parser.out".
</p>
<h3>The Abstract Syntax Tree</h3>
<p>
The AST is built from instances of Python classes defined with the dataclass decorator, such as:
</p>
<p style="font-family: monospace;">
@dataclass<br>
class EAdd:<br>
&nbsp;exp_1: Exp<br>
&nbsp;exp_2: Exp<br>
&nbsp;_ann_type: _AnnType = field(default_factory=_AnnType)
</p>
<p>
The "_ann_type" field is a placeholder that can be used to store useful information,
for example type information in order to create a type-annotated AST.
</p>
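<p>
As a self-contained sketch of this pattern (the fields of the generated
"_AnnType" class are assumed here), the annotation slot can be used like this:
</p>

```python
from dataclasses import dataclass, field

# Assumed stand-in for the generated _AnnType class: a single mutable slot.
@dataclass
class _AnnType:
    value: object = None

# Mirrors the generated node shape shown above; each node gets its own
# fresh _AnnType instance via default_factory.
@dataclass
class EAdd:
    exp_1: object
    exp_2: object
    _ann_type: _AnnType = field(default_factory=_AnnType)

node = EAdd(1, 2)
node._ann_type.value = 'int'  # e.g. record an inferred type on the node
```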
<h3>Using the skeleton file</h3>
<p>
The skeleton file serves as a template, for example for writing an
interpreter. Two kinds of matchers are generated: the first handles all
value categories together, while in the second kind each matcher handles a
single value category, as in the example below:
</p>
<p style="font-family: monospace;">
def matcherExp(exp_: Exp):<br>
&nbsp;match exp_:<br>
&nbsp;&nbsp;case EAdd(exp_1, exp_2, _ann_type):<br>
&nbsp;&nbsp;&nbsp;# Exp "+" Exp1<br>
&nbsp;&nbsp;&nbsp;raise Exception('EAdd not implemented')<br>
&nbsp;&nbsp;case ESub(exp_1, exp_2, _ann_type):<br>
&nbsp;&nbsp;&nbsp;...
</p>
<p>
This can be modified so that it returns the sum of the two evaluated
arguments:
</p>
<p style="font-family: monospace;">
def matcherExp(exp_: Exp):<br>
&nbsp;match exp_:<br>
&nbsp;&nbsp;case EAdd(exp_1, exp_2, _ann_type):<br>
&nbsp;&nbsp;&nbsp;# Exp "+" Exp1<br>
&nbsp;&nbsp;&nbsp;return matcherExp(exp_1) + matcherExp(exp_2)<br>
&nbsp;&nbsp;case ESub(exp_1, exp_2, _ann_type):<br>
&nbsp;&nbsp;&nbsp;...
</p>
<p>
The function can now be imported and used in the generated test file
(similarly to how the pretty printer is imported and used):
</p>
<p style="font-family: monospace;">
from skele import matcherExp<br>
...<br>
print(matcherExp(ast))
</p>

<h3>Known issues</h3>
<h4>
Presentation of conflicts in a grammar:
</h4>
<p>
A symbol-to-unicode transformation is made for the terminals in the grammar,
for example from "++" to "S_43_43". This however obfuscates the grammar
information that PLY generates in the "parser.out" file. Users are hence
encouraged to use, for example, the Haskell backend to debug their
grammars and identify conflicts.
</p>
<h4>
Several entrypoints:
</h4>
<p>
At the top of the ParserDefs.py file an additional rule is added that has
every defined entrypoint as a possible production. This may create warnings
for conflicts if it introduces ambiguity, and warnings for unused rules if
the "_Start" category is not used as the entrypoint. The added parsing rule
is therefore removed by default with the statement "del p__Start" directly
beneath the function, and can be enabled by commenting out that removal.
</p>
<h4>
Skeleton code for using lists as entrypoints:
</h4>
<p>
Matchers for list categories, such as [Exp], are not generated in the
skeleton code, as they may confuse users if the grammar uses several
different list categories. Users are instead encouraged to use a non-list
entrypoint.
</p>
<p>
The improper way to iterate over lists, as the value category is unknown:
</p>
<p style="font-family: monospace;">
&nbsp;case list():<br>
&nbsp;&nbsp;for ele in ast:<br>
&nbsp;&nbsp;&nbsp;...
</p>
<p>
The proper way to deconstruct lists, where we know the value category:
</p>
<p style="font-family: monospace;">
&nbsp;case RuleName(listexp_):<br>
&nbsp;&nbsp;for exp in listexp_:<br>
&nbsp;&nbsp;&nbsp;...
</p>
<h4>
Special cases for special characters
</h4>
<p>
Using ordinary characters instead of, say, parentheses when defining rules
may not yield the expected behaviour. With the rule below, an expression
such as "a1+2a" cannot be parsed.
</p>
<p style="font-family: monospace;">
_. Exp1 ::= "a" Exp "a" ;
</p>
<h4>
Using multiple separators
</h4>
<p>
Using multiple separators for the same category, such as below, generates
Python functions with overlapping names, causing runtime errors.
</p>
<p style="font-family: monospace;">
separator Exp1 "," ;<br>
separator Exp1 ";" ;
</p>
9 changes: 9 additions & 0 deletions source/BNFC.cabal
@@ -280,6 +280,15 @@ library
BNFC.Backend.TreeSitter.CFtoTreeSitter
BNFC.Backend.TreeSitter.RegToJSReg

-- Python backend
BNFC.Backend.Python
BNFC.Backend.Python.CFtoPyAbs
BNFC.Backend.Python.CFtoPyLex
BNFC.Backend.Python.CFtoPyPrettyPrinter
BNFC.Backend.Python.RegToFlex
BNFC.Backend.Python.PyHelpers
BNFC.Backend.Python.CFtoPySkele

----- Testing --------------------------------------------------------------

test-suite unit-tests
3 changes: 3 additions & 0 deletions source/main/Main.hs
@@ -26,6 +26,7 @@ import BNFC.Backend.Latex
import BNFC.Backend.OCaml
import BNFC.Backend.Pygments
import BNFC.Backend.TreeSitter
import BNFC.Backend.Python
import BNFC.CF (CF)
import BNFC.GetCF
import BNFC.Options hiding (make, Backend)
@@ -83,3 +84,5 @@ maketarget = \case
TargetPygments -> makePygments
TargetCheck -> error "impossible"
TargetTreeSitter -> makeTreeSitter
TargetPython -> makePython
