-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bytecode optimizer #429
Bytecode optimizer #429
Conversation
lib/peg.d.ts
Outdated
type OutputType = | ||
| "ast" | ||
| "parser" | ||
| "source-and-map" | ||
| "source-with-inline-map" | ||
| "source" | ||
; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I missed something here... I should probably be using SourceBuildOptions rather than Options. Although SourceOutputs doesn't include "ast"
. I'll take a look...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed a change that reverts this, and uses SourceBuildOptions<SourceOutputs>
instead
00eb942
to
821912a
Compare
I only have one concern: the bytecode was introduced by David (creator of pegjs) to be an abstraction over piece of JS code, that can be tested easily and have simple semantic. This change turns the bytecode into a more complicated thing, which probably could be simplified by using a more low-level bytecode. So probably we should create that low-level bytecode and generate it directly? And then use it to generate JS. Roughly speaking, that means we should just renumber our bytecode instructions. It seems to me that it will be more in the spirit of the library, didn't it? |
I definitely want to rework code generation at some point, because the current mechanism of generating source maps isn't really sustainable, particularly as we want to add other output formats. |
So definitely having a lower level bytecode would make it possible to optimize more. Some of the bytecodes just do too much currently. But changing the bytecode presumably breaks all existing plugins - so I was trying to avoid doing that. If you don't think this is something you want to take, I can redo it as a plugin, and maintain it myself until the big bytecode rewrite, when it will hopefully become redundant... |
I think you'd still want to keep code generation simple, and then optimize the bytecode. I mean, without changing the bytecode at all, the code generator could emit optimized bytecode similar to what I'm producing here. But I think it would be harder to do... |
So I've taken this, and some more work I did on top, and converted it to a plugin. Since this depends on some non-exported functionality, and since I also need a few updates to I still think this (plus the additional work Ive done) is worth taking; the generated code with the plugin is much cleaner - and much closer to what a person might write (although loops are still awkward). |
I'm going to close this in favor of the plugin for now. Might revisit after we look at code generation. |
This stack implements a framework for performing optimizations to the bytecode (file
interp-state.js
), and then implements a number of fairly simple transformations via a new pass,optimize-bytecode.js
.After the initial commits of
interp-state.js
and its tests, I've committed each optimization separately, with an "artifacts" diff to show how it affectslib/parser.js
, to make it easy to review/discuss. I plan to remove all the artifacts diffs, and add a single "Build Artifacts" commit once the pull request settles down.Most of what I've implemented so far is pretty local, and depends on recognizing simple patterns (like PUSH_NULL, POP). If this framework is accepted I plan to add some more global analysis, so that eg an unused PUSH_CURR_POS can be removed, even when the POP is hidden behind a lot of more complex code; or immediately dropping unused elements from a plucked sequence, rather than generating them all, then ignoring the ones that aren't plucked (obviously their side effects still have to be executed); or not bothering to build an array for a sequence wrapped in a
text
node.