-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disallow creation of incomplete/inconsistent objects #56
Comments
I just took a look at replacing |
If the tuple is short, you can use an array of strong references and allocate it on the heap. Problem: there is no public API to create a tuple from an array and... Steal references (it's bad, boooh!). |
[OT] At Pygolo (a bright new Python-Go API) I'm also thinking of dropping |
Would you please share a code example of this approach?
Yes I noticed this when comparing |
|
PyTuple_SetItem() should not exist: it modify an... immutable object! The problem is that Python introspection make obejcts visible while being initialized, whereas most developers are not aware of that. The Python str implementation uses heuristics to check if maybe an object was already accessed (visible in Python): check the reference count, check if the hash was computed, etc. Python immutable str objects can also be modified :-( Maybe an approach for that would be to provide an API to build an object in an efficient way. There is for example the _PyUnicodeWriter API to produce a Python str object. We could have a similar C API to build a Python of an unknown size. |
See also python/cpython#84323 "Modify _PyObject_GC_TRACK() to ensure that newly tracked object is valid". Extract:
|
FTR, Sergey Fedoseev added an internal API for safe and fast tuple creation some years ago: python/cpython#80211 |
From the issue, it's non-obvious to identify which function you're talking about. So it's the new PyObject *
_PyTuple_FromArray(PyObject *const *src, Py_ssize_t n)
{
if (n == 0) {
return tuple_get_empty();
}
PyTupleObject *tuple = tuple_alloc(n);
if (tuple == NULL) {
return NULL;
}
PyObject **dst = tuple->ob_item;
for (Py_ssize_t i = 0; i < n; i++) {
PyObject *item = src[i];
dst[i] = Py_NewRef(item);
}
_PyObject_GC_TRACK(tuple); // <==== the nice part
return (PyObject *)tuple;
} |
I proposed a new internal C API to build a tuple: python/cpython#107139 |
Correct. PyTuple_SetItem(), PyTuple_SET_ITEM() and _PyTuple_Resize() are only needed for tuples created by PyTuple_New(). These tuples should not be tracked by the GC, but currently PyTuple_New() tracks the tuple by the GC as soon as the tuple is created. |
Examples of crashes related to PyTuple_New():
Before I wrote the METH_FASTCALL calling convention, it was common to create an internal cached tuple to call PyObject_Call(), since the cost of creating/destroying the tuple can be significant in fast functions (around 100 ns). Sometimes, this tuple was tracked by the GC but was not always valid. property_descr_get() is an example of such issue. Other crashes related to GC introspection features: |
So, do I understand correctly that the only problem with I actually like the fact that CPython allows filling data directly into the object containers. |
Well, with that you also need a |
Proposed guideline issue: capi-workgroup/api-evolution#36 |
The specific issue of |
At what Python version does "now" refer to? |
Oh. Sadly, I remind badly. In fact, the change was not merged: python/cpython#117747 (comment) |
Wouldn't make sense to reopen this then? |
I proposed more APIs:
|
The C API has many ways to create incomplete and/or inconsistent Python objects. We should disallow that.
One big example is that
PyTuple_New()
API which creates an uninitialized Python tuple object and immediately tracks it in the Garbage Collector. This issue was discussed from 2012 to 2021. At the end, because this API is too widely used, it was decided to not fix this issue: python/cpython#59313 was closed as "not a bug" (!).Python objects must only be tracked by the GC once they are fully initialized. At least, when calling their "traverse" function will not crash. I added
_PyType_AllocNoTrack()
function and used it in a few types to fix this issue in modified types: delayPyObject_GC_Track()
call only after the object is initialized.PyUnicode_New()
is another example: it creates a Python str object with uninitialized characters.These API are written for performance: create a "scratch" object, populate it, and then expose it in Python. Since data is written directly into the object, there is no memory copy or conversion. The problem is that Python introspection gives access to these incomplete/inconsistent objects. It's common that exploring
gc.get_objects()
lead to crashes. In 2016, an optimization for function call in property_descr_get() leaded to crash: issue #70998. There were a bunch of similar bugs, so it motivated me to write the FASTCALL calling convention to avoids the need to create a tuple object to call a C function: issue #71001.Over the years, I also added many
CheckConsistency()
functions, used in assertions, to make sure that objects are consistent. Examples:_PyObject_CheckConsistency()
,_PyUnicode_CheckConsistency()
,_PyType_CheckConsistency()
,_PyDict_CheckConsistency()
.For example, unsafe
PyTuple_New()
can be replaced withPyTuple_Pack()
: the created tuple is only tracked by the GC once it's fully initialized and consistent.PyTuple_Pack()
is a good example to follow.The text was updated successfully, but these errors were encountered: