Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-126220: Adapt _lsprof to Argument Clinic #126233

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

sobolevn
Copy link
Member

@sobolevn sobolevn commented Oct 31, 2024

I decided to go with the AC, because:

  1. It is supported in this module
  2. It solves the problem
  3. It fixes multiple signatures

But, I believe it will be slower. If we really want to preserve speed in this case, I can add manual size checks for args.

Refs #103534

Copy link
Member

@ZeroIntensity ZeroIntensity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I'm always happy with fixes via AC, because they're much more scalable.

@gaogaotiantian
Copy link
Member

What's the perf impact here?

@sobolevn
Copy link
Member Author

@gaogaotiantian I've never had experience with profiling cProfile. Do you have any links / suggestions on how to do that?

@gaogaotiantian
Copy link
Member

Just do a quick fib() with cprofile and compare it with the previous implementation would do. Maybe also with one without cprofile.

@sobolevn
Copy link
Member Author

@gaogaotiantian maybe I did something wrong, but looks like we also got a speedup from my numbers 😅 (I have little idea about how cProfile works).

# Test code: ex.py
def test():
    def fib():
        start, nextn = 0, 1
        while True:
            yield start
            start, nextn = nextn, start + nextn

    gen = fib()
    for _ in range(100000):
        next(gen)

import cProfile
cProfile.run(test.__code__)

Running it on main:

» ./python.exe ex.py
         200003 function calls in 0.188 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.017    0.017    0.188    0.188 ex.py:1(test)
   100000    0.147    0.000    0.147    0.000 ex.py:2(fib)
        1    0.000    0.000    0.188    0.188 {built-in method builtins.exec}
   100000    0.024    0.000    0.171    0.000 {built-in method builtins.next}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Running on this PR:

» ./python.exe ex.py                                                     
         200003 function calls in 0.185 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.016    0.016    0.185    0.185 ex.py:1(test)
   100000    0.145    0.000    0.145    0.000 ex.py:2(fib)
        1    0.000    0.000    0.185    0.185 {built-in method builtins.exec}
   100000    0.024    0.000    0.168    0.000 {built-in method builtins.next}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Am I missing something?

@gaogaotiantian
Copy link
Member

gaogaotiantian commented Oct 31, 2024

Sorry I meant a recursive fib().

import time
def fib(n):
    if n <= 1:
        return 1
    return fib(n - 1) + fib(n - 2)
start = time.time()
# enable profiler
fib(24)
# disable profiler
print(time.time() - start)

We need to consider the worst case for cprofile, which is when the code has a lot of function calls. We need to know the overhead of the profiler, and how that is compared with before.

You can also refer to #103533, there's a profiling code listed. That might give some extra information.

Moving to sys.monitoring gave us a 20%+ improvement on overhead, just want to check what's the regression with clinic and whether it's acceptable. Profiler is a bit sensitive to performance because the more overhead there is, the less useful it is.

@sobolevn
Copy link
Member Author

It still gives me a perf boost for some reason 🤔

import time
import cProfile

profiler = cProfile.Profile()

def fib(n):
    if n <= 1:
        return 1
    return fib(n - 1) + fib(n - 2)
start = time.perf_counter()
profiler.enable()
fib(30)  # 100th
profiler.disable()
print(time.perf_counter() - start)
# Old: 0.9349823750017094
# New: 0.9253051670020795
# 1% faster 

@gaogaotiantian
Copy link
Member

Is the boost stable(reproducible)?

@sobolevn
Copy link
Member Author

sobolevn commented Oct 31, 2024

Yes, it is pretty much the same on m2 macos:

(.venv) ~/Desktop/cpython2  main ✗                                                        
» ./python.exe ex.py                                                     
0.9349823750017094
                                                                                           
(.venv) ~/Desktop/cpython2  main ✗                                                        
» ./python.exe ex.py
0.9357958749969839
                                                                                           
(.venv) ~/Desktop/cpython2  main ✗                                                        
» ./python.exe ex.py
0.9352227920026053
                                                                                           
(.venv) ~/Desktop/cpython2  main ✗                                                        
» ./python.exe ex.py
0.924946416002058
                                                                                           
(.venv) ~/Desktop/cpython2  main ✗                                                        
» ./python.exe ex.py
0.926337541997782
                                                                                           
(.venv) ~/Desktop/cpython2  main ✗                                                        
» ./python.exe ex.py
0.9347994170000311
(.venv) ~/Desktop/cpython2  issue-126220 ✗  
» ./python.exe ex.py
0.9370929580036318
                                                                                           
(.venv) ~/Desktop/cpython2  issue-126220 ✗                                                
» ./python.exe ex.py
0.9262957079990883
                                                                                           
(.venv) ~/Desktop/cpython2  issue-126220 ✗                                                
» ./python.exe ex.py
0.9328266249940498
                                                                                           
(.venv) ~/Desktop/cpython2  issue-126220 ✗                                                
» ./python.exe ex.py
0.9382220000011148
                                                                                           
(.venv) ~/Desktop/cpython2  issue-126220 ✗                                                
» ./python.exe ex.py
0.9285237919984502
                                                                                           
(.venv) ~/Desktop/cpython2  issue-126220 ✗                                                
» ./python.exe ex.py
0.9296539580027456

@gaogaotiantian
Copy link
Member

Yeah from the result I don't think there's an observable boost. On the other hand, that's good, because no observable regression either.

@gaogaotiantian gaogaotiantian added needs backport to 3.12 bug and security fixes needs backport to 3.13 bugs and security fixes labels Oct 31, 2024
Copy link
Member

@gaogaotiantian gaogaotiantian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a side note - this needs to be backported to 3.12 as well :) I added the tags so you should be able to just merge it.

@sobolevn
Copy link
Member Author

@gaogaotiantian sorry, I forgot to highlight it initially. Please, double check params names that I introduced. Are they correct?

@gaogaotiantian
Copy link
Member

Actually, I have some doubts about the changes for functions that are not crashing. This is a bug fix, which would be backported. I don't think we should mix in the code polish to the PR. Even though I think changing this in main would be fine, I would prefer having only the crash fix (could be with AC) in one PR, and the rest in the other. One thing that bothers me immediately is that I can't quickly convince myself the changes to __init__ is purely equivalent.

As for the argument name, the obj in the callbacks should be either unused or instruction_offset(which is what it is).

@gaogaotiantian
Copy link
Member

Can we split this PR into 2? One with the changes to only the 4 callbacks, and the other with all the other methods?

@sobolevn
Copy link
Member Author

Fair enough, I would split this PR later! 👍
Thank you!

Copy link
Member

@gaogaotiantian gaogaotiantian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@sobolevn
Copy link
Member Author

@erlend-aasland maybe you would be interested in double checking the AC part? :)

@erlend-aasland
Copy link
Contributor

We generally do not backport Argument Clinic adaptions; would it be possible to solve the issue without Argument Clinic first, and backport that bugfix; then apply Argument Clinic?

@sobolevn
Copy link
Member Author

sobolevn commented Nov 1, 2024

@erlend-aasland I think that this case is rather special. This is not just "convert X to use AC" type of PR. This PR solves a crash that was caused by incorrect function args handling. Basically, I can use PyArg_ParseTuple* to do the same, but I don't see a point in doing that and then converting it to AC in 3.14 only.

I can go with simple:

if (size < REQUIRED_ARGS) {
    PyErr_Format(PyExc_TypeError, "got %d arguments, expected %d", ...);
    return NULL;
}

This will kinda solve this case, but will be rather ugly.

@sobolevn
Copy link
Member Author

sobolevn commented Nov 1, 2024

@erlend-aasland sorry for bothering, but I cannot decode your 👍 reaction :)
Does it mean:

  1. Let's go with if (size)?
  2. Let's keep it as is?

:)

@erlend-aasland
Copy link
Contributor

This will kinda solve this case, but will be rather ugly.

I see, and thanks for the explanation. I think I still would feel better with such workarounds1 for 3.13 and 3.12, and then introduce the Argument Clinic adaptions in 3.14. Mixing features and bugfixes is seldom a good thing.

Footnotes

  1. IMO, they are not that ugly

@sobolevn
Copy link
Member Author

sobolevn commented Nov 1, 2024

Ok, I will open a new PR with the fix we can backport and then rebase this one and apply all AC fixes from the initial version. Thanks for the feedback! 👍

@erlend-aasland
Copy link
Contributor

Please amend the title to reflect the actual change :)

@sobolevn sobolevn changed the title gh-126220: Fix crash on calls to _lsprof.Profiler methods with 0 args gh-126220: Convert _lsprof to AC Nov 1, 2024
Modules/_lsprof.c Outdated Show resolved Hide resolved
Modules/_lsprof.c Outdated Show resolved Hide resolved
Modules/_lsprof.c Outdated Show resolved Hide resolved
Modules/_lsprof.c Outdated Show resolved Hide resolved
@erlend-aasland erlend-aasland changed the title gh-126220: Convert _lsprof to AC gh-126220: Adapt _lsprof to Argument Clinic Nov 1, 2024
sobolevn and others added 3 commits November 2, 2024 08:30
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants