-
-
Notifications
You must be signed in to change notification settings - Fork 30.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-107137: Add _PyTupleBuilder API to the internal C API #107139
Conversation
I chose to use unsigned cc @serhiy-storchaka @methane @pablogsal @erlend-aasland @corona10 |
HPy API to build a tuples and lists: https://docs.hpyproject.org/en/latest/api-reference/builder.html HPy API seems to only support creating tuple of a size known in advance: it's not possible to extend or shrink the tuple. |
I'm considering to add a "Set" function later, and/or maybe a "GetUnsafe" function. But I prefer to start with a minimum API :-) |
-1. This feels unnecessarily elaborate. |
This API is a fix for this old issue: #59313 "Incomplete tuple created by PyTuple_New() and accessed via the GC can trigged a crash" which was closed as "not a bug" in 2021 by @pablogsal. |
Do you mean that capi-workgroup/problems#56 is not an issue, or that this API is too complicated to use? |
I believe we should review the problem before jumping to a solution. And if this is the solution, I'm not sure that it's worth fixing the problem. So please hold your horses here. |
The root issue is that PyTuple_New() tracks directly the tuple by the GC. Moreover, _PyTuple_Resize() tracks also the tuple by the GC. My PR adds _PyTuple_NewNoTrack() and _PyTuple_ResizeNoTrack() to avoid this issue. The _PyTupleBuilder is built on top of it to wrap the memory allocations. |
Add _PyTupleBuilder structure and functions: * _PyTupleBuilder_Init() * _PyTupleBuilder_Alloc() * _PyTupleBuilder_Append() * _PyTupleBuilder_AppendUnsafe() * _PyTupleBuilder_Finish() * _PyTupleBuilder_Dealloc() The builder tracks the size of the tuple and resize it in _PyTupleBuilder_Finish() if needed. Don't allocate empty tuple. Allocate an array of 16 objects on the stack to avoid allocating small tuple. _PyTupleBuilder_Append() overallocates the tuple by 25% to reduce the number of _PyTuple_Resize() calls. Do no track the temporary internal tuple by the GC before _PyTupleBuilder_Finish() creates the final complete and consistent tuple object. Use _PyTupleBuilder API in itertools batched_traverse(), PySequence_Tuple() and initialize_structseq_dict(). Add also helper functions: * _PyTuple_ResizeNoTrack() * _PyTuple_NewNoTrack()
PR rebased to fix a merge conflict. |
All I am asking is that you hold off for now. |
I marked this PR as a draft. |
Guido, even if there is a better way to solve this issue, adding the internal private API doesn't look harmful. |
@corona10 I worry that APIs are forever, even internal ones. So I recommend that we have a discussion somewhere so we're all on the same page about the problem we're trying to solve and then we can how to solve it, rather than jumping the gun. (I don't require unanimity, just more people having thought about it and come to roughly the same conclusion than just Victor.) |
Thank you for the answer. CPython seems to be on the brink of change in many topics. |
In the main Python branch, you can still crash PySequence_Tuple() because it creates a temporary tuple with PyTuple_New() which is immediately tracked by the GC: import gc
TAG = object()
def monitor():
lst = [x for x in gc.get_referrers(TAG)
if isinstance(x, tuple)]
t = lst[0] # this *is* the result tuple
print(t) # full of nulls !
return t # Keep it alive for some time
def my_iter():
yield TAG # 'tag' gets stored in the result tuple
t = monitor()
for x in range(10):
yield x # SystemError when the tuple needs to be resized
tuple(my_iter()) code from: #59313 (comment) Program in gdb:
The crash occurs in The problem is that there is a second strong reference to the tuple (created by |
This PR fix this bug. With this PR,
|
@vstinner Please stop pushing. We've lived with this forever. It doesn't have to be fixed today. |
I extracted the non-controversial part of this PR, only _PyTuple_NewNoTrack() and _PyTuple_ResizeNoTrack(), in a new PR fixing the PySequence_Tuple() crash: PR #107183. |
Can tuples be made robust enough to survive being called while partially filled? The If not, tools like |
Raymond's idea of filling with |
It seems like issues solved by the proposed _PyTupleBuilder API have different solutions discussed at PR #107183. This API doesn't seem to be the preferred option, so I prefer to close my PR and investigate other options first. |
Follow-up: I created issue #111489 to make |
Add _PyTupleBuilder structure and functions:
The builder tracks the size of the tuple and resize it in _PyTupleBuilder_Finish() if needed. Don't allocate empty tuple. Allocate an array of 16 objects on the stack to avoid allocating small tuple. _PyTupleBuilder_Append() overallocates the tuple by 25% to reduce the number of _PyTuple_Resize() calls.
Do no track the temporary internal tuple by the GC before _PyTupleBuilder_Finish() creates the final complete and consistent tuple object.
Use _PyTupleBuilder API in itertools batched_traverse(), PySequence_Tuple() and initialize_structseq_dict().
Add also helper functions: