pygl (for Python Grammar Language) is one of the projects exploring ideas to switch Python’s parsing tech to something more powerful than LL(1).
The main objective of the project is to produce a PEG grammar for Python, so that different PEG parser generators can be tested as parsing technologies for Python. The tool used to bootstrap the process is TatSu (currently, pygl requires the unreleased master version).
The strategy used in pygl is explained in this topic on Python's Discourse site.
Currently, the TatSu PEG grammar for Python is being debugged against the Python source code in the CPython Git repository (~787 KLOC).
These are the steps of the plan:
- ✓ Create a TatSu parser to parse `Grammar/Grammar`
- ✓ Parse the `Grammar/Grammar` using the above parser
- ✓ Generate a draft PEG grammar for Python from the above using TatSu
- ✓ Debug the PEG grammar using TatSu (PEG semantics require rule-choice ordering, etc.). The grammar is currently good for parsing 126 KLOC of Python in `cpython/**/*.py`.
- ✓ Integrate the Python tokenizer using the `token` and `tokenize` modules.
- Generate AST from Python source using the above (at this point, the grammar is debugged and the parser complete)
- ✓ Measure parser performance (it should be within the expected Python vs C range). Pass, or abort
- ✓ Automatically generate a `peg` grammar for Python from the above
- ⇒ Debug the `peg` grammar
- Integrate the Python C tokenizer using the `token.h` and `tokenizer.h` modules.
- Instrument the `peg` grammar to generate AST (like TatSu, `peg` allows naming parse subexpressions)
- Measure, and pass or abort
- Customize `peg` and the `peg` grammar so it is `libpython` compatible (`peg` provides for this).
- Add a node visitor to translate the `peg` grammar to "documentation grammar".
- The current Python parser can be replaced by a PEG parser that is easy to maintain and covers source->AST.
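Two of the steps above, debugging rule-choice ordering and feeding the parser from the `token`/`tokenize` modules, can be illustrated with a small standard-library sketch. This is not pygl's or TatSu's code: the helpers `tokens`, `seq`, `choice`, and `matches` are hypothetical names invented for the example. It only shows why a PEG ordered choice has to list the longer alternative first.

```python
import io
import token
import tokenize


def tokens(source):
    """Tokenize with the stdlib tokenizer, keeping only significant tokens."""
    skip = {token.NEWLINE, token.NL, token.COMMENT,
            token.INDENT, token.DEDENT, token.ENDMARKER}
    readline = io.StringIO(source).readline
    return [t for t in tokenize.generate_tokens(readline) if t.type not in skip]


# Each parser takes (toks, pos) and returns the new position, or None on failure.
def name(toks, pos):
    ok = pos < len(toks) and toks[pos].type == token.NAME
    return pos + 1 if ok else None


def number(toks, pos):
    ok = pos < len(toks) and toks[pos].type == token.NUMBER
    return pos + 1 if ok else None


def op(text):
    def parse(toks, pos):
        ok = (pos < len(toks) and toks[pos].type == token.OP
              and toks[pos].string == text)
        return pos + 1 if ok else None
    return parse


def seq(*parsers):
    """Match every parser in order, threading the position through."""
    def parse(toks, pos):
        for p in parsers:
            pos = p(toks, pos)
            if pos is None:
                return None
        return pos
    return parse


def choice(*parsers):
    """PEG ordered choice: commit to the first alternative that matches."""
    def parse(toks, pos):
        for p in parsers:
            end = p(toks, pos)
            if end is not None:
                return end
        return None
    return parse


def matches(rule, source):
    """True if the rule consumes the whole token stream."""
    toks = tokens(source)
    return rule(toks, 0) == len(toks)


assignment = seq(name, op("="), number)
good = choice(assignment, name)  # longer alternative first
bad = choice(name, assignment)   # shorter alternative shadows the longer one

print(matches(good, "x = 1"))  # True
print(matches(bad, "x = 1"))   # False: `name` wins and leaves "= 1" unconsumed
```

In a context-free grammar both orderings describe the same language. Under PEG semantics they do not, which is presumably why a grammar translated mechanically from `Grammar/Grammar` needs a debugging pass.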
To run the current state of things:
$ pip install -r requirements-dev.pip
$ pytest
To run the tests over `~/cpython/**/*.py` using all CPU cores, just type:
$ make
See the `Makefile`, or type this for options:
$ python -m test.parse -h
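For readers who want a feel for what "parsing N KLOC of `cpython/**/*.py`" involves, here is a minimal sketch of such a check. It is not what the Makefile or `test.parse` do: `check_tree` is a hypothetical helper, it runs sequentially rather than on all cores, and it uses the standard library's `ast.parse` as a stand-in for the generated parser.

```python
import ast
from pathlib import Path


def check_tree(root, parse=ast.parse):
    """Try to parse every *.py file under root.

    Returns (parsed_lines, failed_paths). `parse` is any callable that
    raises SyntaxError on source it rejects (an assumption of this sketch).
    """
    parsed_lines = 0
    failed = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            source = path.read_text(encoding="utf-8")
            parse(source)
        except (SyntaxError, UnicodeDecodeError, ValueError):
            # Unreadable or rejected files count as failures.
            failed.append(path)
        else:
            parsed_lines += len(source.splitlines())
    return parsed_lines, failed
```

Calling `check_tree(Path.home() / "cpython")` would return the number of lines successfully parsed, which divided by 1000 gives a KLOC figure, along with the files that still need grammar work.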