discussion: pluggy speedup by using 'codegen' #449
Let me explain the idea by walking you through its evolution from the beginning:
```python
import dataclasses
from typing import Any, Mapping

from pluggy import PluginManager


class NewerPluginManager(PluginManager):
    def do_compile(self):
        """When you're ready to enter !!turbo!! mode, run this. The hook attribute will be replaced and can't be restored."""
        # NOTE: this is not the real code. I don't have a copy of that, so this is
        # approximate, for explanation purposes.
        plugins = []
        code = ""
        for i, impl in enumerate(reversed(self.hook.fun.get_hookimpls())):
            plugins.append(impl.function)
            code += f"result.append(plugins[{i}](hooks, nesting))\n"

        @dataclasses.dataclass
        class Caller:
            locs: Mapping
            pyc: Any

            def fun(self, **kwargs):
                result = []
                # The generated code sees 'result', the call kwargs, and 'plugins'
                # as local names, so no per-call dict lookups remain here.
                exec(self.pyc, None, {'result': result} | kwargs | self.locs)
                return result

        self.hook = Caller({'plugins': plugins}, compile(code, "", "exec"))
    ...
...
```
```python
# And its usage in the test case below:
for i in range(plugins):
    pm.register(Plugin(i), name=f"plug_{i}")
for i in range(wrappers):
    pm.register(PluginWrap(i), name=f"wrap_plug_{i}")
if hasattr(pm, 'do_compile'):
    # This step is TEMPORARY, just for the POC, don't worry :)
    pm.do_compile()
benchmark(pm.hook.fun, hooks=pm.hook, nesting=nesting)
```
I hope that the above example illustrates the concept. The above runs and works as well as the version from step 2. It is a bit slower than step 2 but a lot faster than the baseline pluggy code.
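For reference, this is roughly what the generated `code` string would contain for three registered hookimpls (my reconstruction from the loop above, not output captured from the branch):
```python
result.append(plugins[0](hooks, nesting))
result.append(plugins[1](hooks, nesting))
result.append(plugins[2](hooks, nesting))
```
The per-hookimpl loop is fully unrolled at compile() time, so at call time the only remaining work is the plugin calls themselves.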
The code resulting after step 5 (which is where I've stopped for now) is now in a temporary branch in my fork.
I've started working on a general purpose Python package to facilitate this type of optimization. Perhaps it will be worth revisiting this idea in Pluggy after I've had a chance to analyze and streamline the concept in a context completely external to pluggy.
I intend to prepare some refactorings that move more responsibility into the hook callers, to ensure a simpler inner loop. My personal intent is to eventually move those loops from the Python level to the C level, as the interpreter overhead seems to cost more than, say, a simple loop. Right now I'm under the impression that removing call overhead and hook-parameter overhead is a bigger win than dynamically unrolling loops. That being said, I also don't quite follow the optimization tooling you are preparing; some docs that pull together the research and benchmarks and explain the niche/structure where this technique manifests its wins would be highly appreciated. With my current understanding of the technique you proposed, my impression is that the potential gains would melt away as the tooling moves from POC to feature parity, and I'd love to be wrong there.
Well said. I agree that performance gains often "melt away" in the move to feature parity. I cannot pave the way into C extensions, as I've never done it before. I'd love to see how it is done though and will be happy to help with details in C once the outline is in place. (I do have quite a lot of C experience and love the language in general). Since you asked about specific gains, I will highlight one particular line. As I described in my earliest comment (in the other ticket), I started out with some experiments where I hard coded some values to see where speedups happen. There were two lines in particular that I found to be particularly costly in the original code:
It's that first one there which led me to develop the rather convoluted solution that I am now fine-tuning. Edit: I also wanted to underscore that the performance wins do not come from loop unrolling or from dropping a bit of unused code (those were mentioned more as simple examples of optimizations that also happen to be occurring). The main win is in the way that the call to the plugin is built: instead of assembling an argument list by iterating and doing dict lookups, it uses dynamically hard-coded names against symbols that are injected into the exec namespace.
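To make the contrast concrete, here is a minimal sketch of the two call styles; `Impl`, `call_baseline`, and `build_codegen_caller` are illustrative names I've made up here, not pluggy's actual internals:
```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Impl:
    # Stand-in for a hookimpl: the function plus the argument names it accepts.
    function: Callable
    argnames: Sequence[str]


def call_baseline(impls, caller_kwargs):
    # Baseline style: iterate the impls and assemble each call's arguments
    # through dict lookups on every single invocation.
    results = []
    for impl in impls:
        args = [caller_kwargs[name] for name in impl.argnames]  # per-call lookups
        results.append(impl.function(*args))
    return results


def build_codegen_caller(impls):
    # Codegen style: bake the argument names into generated source once, so at
    # call time each argument is simply a local name in the exec() namespace.
    lines = [
        f"result.append(plugins[{i}].function({', '.join(impl.argnames)}))"
        for i, impl in enumerate(impls)
    ]
    code = compile("\n".join(lines), "<codegen>", "exec")

    def call(**caller_kwargs):
        result = []
        exec(code, None, {"result": result, "plugins": impls} | caller_kwargs)
        return result

    return call


# Both callers produce the same results for the same inputs:
impls = [Impl(lambda x, y: x + y, ["x", "y"]), Impl(lambda x: x * 10, ["x"])]
print(call_baseline(impls, {"x": 2, "y": 3}))  # [5, 20]
print(build_codegen_caller(impls)(x=2, y=3))   # [5, 20]
```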
The cast is just for type annotations. There are a number of potential techniques to simplify certain details: the loop over arguments, for example, can be replaced with a pre-allocated vector call in the C API, and hook calling vs. hook-wrapper calling can be split into more concrete APIs. I still strongly doubt that a Python codegen will bring consistent enhancement to pytest, where calls with different hook sets and additional hooks are common.
I understand that it is for type annotations, but there is a comment in the code that implies pluggy is also counting on the function to effectively provide a type check. In any case, I was just trying to answer your question about the specific areas where I'd observed (empirically) a possible performance impact.
I think this answer resolves my initial question (in the other ticket) about whether a "codegen" approach would be a good fit here. It sounds like the sequence of hooks is too dynamic, and it sounds like there are simpler solutions for performance improvements which are already in the pipeline. I'll close this ticket since I think the idea has been thoroughly discussed!
I wanted to pull this conversation into a new ticket so that I don't clutter up @RonnyPfannschmidt's project ticket. This discussion relates to issue #322.
The idea is not new. The technique is described here, for example: https://medium.com/@yonatanzunger/advanced-python-achieving-high-performance-with-code-generation-796b177ec79
I am shortening the concept to the word 'codegen' for the sake of this discussion.
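As a minimal, self-contained illustration of the technique from that article (the `make_adder` example is mine, not from pluggy):
```python
# Generic 'codegen': instead of re-interpreting a description of the work on
# every call, generate specialized Python source once, compile it, and call
# the resulting function directly.
def make_adder(field_names):
    body = " + ".join(f"record['{name}']" for name in field_names)
    src = f"def adder(record):\n    return {body}\n"
    namespace = {}
    exec(compile(src, "<generated>", "exec"), namespace)
    return namespace["adder"]


add_xy = make_adder(["x", "y"])
print(add_xy({"x": 2, "y": 3}))  # 5
```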
I'm hoping this discussion can allow project leaders to ask and answer these questions:
Some key takeaways: