Disallow creation of incomplete/inconsistent objects #56

vstinner · 2023-07-03T22:13:06Z

The C API has many ways to create incomplete and/or inconsistent Python objects. We should disallow that.

One big example is that PyTuple_New() API which creates an uninitialized Python tuple object and immediately tracks it in the Garbage Collector. This issue was discussed from 2012 to 2021. At the end, because this API is too widely used, it was decided to not fix this issue: python/cpython#59313 was closed as "not a bug" (!).

Python objects must only be tracked by the GC once they are fully initialized. At least, when calling their "traverse" function will not crash. I added _PyType_AllocNoTrack() function and used it in a few types to fix this issue in modified types: delay PyObject_GC_Track() call only after the object is initialized.

PyUnicode_New() is another example: it creates a Python str object with uninitialized characters.

These API are written for performance: create a "scratch" object, populate it, and then expose it in Python. Since data is written directly into the object, there is no memory copy or conversion. The problem is that Python introspection gives access to these incomplete/inconsistent objects. It's common that exploring gc.get_objects() lead to crashes. In 2016, an optimization for function call in property_descr_get() leaded to crash: issue #70998. There were a bunch of similar bugs, so it motivated me to write the FASTCALL calling convention to avoids the need to create a tuple object to call a C function: issue #71001.

Over the years, I also added many CheckConsistency() functions, used in assertions, to make sure that objects are consistent. Examples: _PyObject_CheckConsistency(), _PyUnicode_CheckConsistency(), _PyType_CheckConsistency(), _PyDict_CheckConsistency().

For example, unsafe PyTuple_New() can be replaced with PyTuple_Pack(): the created tuple is only tracked by the GC once it's fully initialized and consistent. PyTuple_Pack() is a good example to follow.

The text was updated successfully, but these errors were encountered:

davidhewitt · 2023-07-04T21:51:09Z

I just took a look at replacing PyTuple_New() with PyTuple_Pack() in PyO3. One case where this doesn't seem to be possible is when the length is not known until runtime. If we wanted to avoid PyTuple_New() in these cases, I think we'd need to add a PyTuple_AllocNoTrack() function equivalent to the _PyType_AllocNoTrack() you describe above?

vstinner · 2023-07-05T06:15:07Z

If the tuple is short, you can use an array of strong references and allocate it on the heap. Problem: there is no public API to create a tuple from an array and... Steal references (it's bad, boooh!).

cavokz · 2023-07-05T06:41:51Z

[OT] At Pygolo (a bright new Python-Go API) I'm also thinking of dropping PyTuple_New and PyTuple_SetItem, I guess there is no use case for PyTuple_SetItem besides initializing a tuple created with PyTuple_New, right?

davidhewitt · 2023-07-05T07:24:15Z

If the tuple is short, you can use an array of strong references and allocate it on the heap.

Would you please share a code example of this approach?

Problem: there is no public API to create a tuple from an array and... Steal references (it's bad, boooh!).

Yes I noticed this when comparing PyTuple_New to PyTuple_Pack that I'm paying for an additional incref / decref cycle. Do you think this problem is widespread enough that people would be willing for PyTuple_PackOwned? (Or PyTuple_PackGive or whatever other naming convention comes out of #11.)

vstinner · 2023-07-05T12:03:43Z

Py_BuildValue("NNN", ref1, ref2, ref3) can be used to build a tuple of 3 items and steal references.

vstinner · 2023-07-09T08:52:34Z

PyTuple_SetItem() should not exist: it modify an... immutable object! The problem is that Python introspection make obejcts visible while being initialized, whereas most developers are not aware of that.

The Python str implementation uses heuristics to check if maybe an object was already accessed (visible in Python): check the reference count, check if the hash was computed, etc. Python immutable str objects can also be modified :-(

Maybe an approach for that would be to provide an API to build an object in an efficient way. There is for example the _PyUnicodeWriter API to produce a Python str object. We could have a similar C API to build a Python of an unknown size.

vstinner · 2023-07-11T00:43:30Z

See also python/cpython#84323 "Modify _PyObject_GC_TRACK() to ensure that newly tracked object is valid".

Extract:

PyType_GenericAlloc() allocates memory for a type allocated on the heap... and then immediately track it in the GC. Problem: this type is not initialized yet, all fields are set to zero. Calling type_traverse() at this point fails with an assertion error: (...)

erlend-aasland · 2023-07-11T19:45:09Z

PyTuple_SetItem() should not exist: it modify an... immutable object! The problem is that Python introspection make obejcts visible while being initialized, whereas most developers are not aware of that.

FTR, Sergey Fedoseev added an internal API for safe and fast tuple creation some years ago: python/cpython#80211

vstinner · 2023-07-12T00:23:36Z

FTR, Sergey Fedoseev added an internal API for safe and fast tuple creation some years ago: python/cpython#80211

From the issue, it's non-obvious to identify which function you're talking about. So it's the new _PyTuple_FromArray() function which copies objects by creating new strong references, and only track the tuple once all items are fully initialized. The code is quite short and fit here:

PyObject *
_PyTuple_FromArray(PyObject *const *src, Py_ssize_t n)
{
    if (n == 0) {
        return tuple_get_empty();
    }

    PyTupleObject *tuple = tuple_alloc(n);
    if (tuple == NULL) {
        return NULL; 
    }
    PyObject **dst = tuple->ob_item;
    for (Py_ssize_t i = 0; i < n; i++) {
        PyObject *item = src[i];
        dst[i] = Py_NewRef(item);
    }
    _PyObject_GC_TRACK(tuple);  // <==== the nice part
    return (PyObject *)tuple;
}

vstinner · 2023-07-23T19:43:43Z

I proposed a new internal C API to build a tuple: python/cpython#107139

vstinner · 2023-07-23T23:14:49Z

I guess there is no use case for PyTuple_SetItem besides initializing a tuple created with PyTuple_New, right?

Correct. PyTuple_SetItem(), PyTuple_SET_ITEM() and _PyTuple_Resize() are only needed for tuples created by PyTuple_New(). These tuples should not be tracked by the GC, but currently PyTuple_New() tracks the tuple by the GC as soon as the tuple is created.

vstinner · 2023-08-14T22:22:46Z

Examples of crashes related to PyTuple_New():

2003: tuple(generator) + gc.get_objects(): gc.get_referrers() is inherently dangerous python/cpython#39117
2016: property_descr_get() uses an internal tuple to optimize PyObject_Call(), tuple is tracked but sometimes invalid: segfault due to null pointer in tuple python/cpython#70998 -- see also issue #68464 (related to issue #58581)

Before I wrote the METH_FASTCALL calling convention, it was common to create an internal cached tuple to call PyObject_Call(), since the cost of creating/destroying the tuple can be significant in fast functions (around 100 ns). Sometimes, this tuple was tracked by the GC but was not always valid. property_descr_get() is an example of such issue.

Other crashes related to GC introspection features:

2021: There is a way to access an underlying mapping in MappingProxyType python/cpython#88004

scoder · 2023-10-18T07:29:43Z

So, do I understand correctly that the only problem with PyTuple_New() is that it immediately tracks the object in GC, rather than leaving it to the user to do that after populating the tuple? That could be solved with a PyTuple_FinishNew() function that enables the tracking, right? Similar for PyUnicode etc.

I actually like the fact that CPython allows filling data directly into the object containers.

encukou · 2023-10-18T09:00:25Z

That could be solved with a PyTuple_FinishNew() function that enables the tracking, right?

Well, with that you also need a PyTuple_NewUnfinished that allocates a tuple but doesn't enable tracking yet. Here's my draft of what this line of thinking leads to: capi-workgroup/api-evolution#20 (comment)
That draft also allows filling tuples in implementations where they aren't actually backed by a PyObject* array -- in a way that adds no cost on CPython.

encukou · 2023-10-24T13:24:29Z

Proposed guideline issue: capi-workgroup/api-evolution#36

vstinner · 2024-06-10T14:03:20Z

The specific issue of PyTuple_New() was solved differently: it's now initialized by setting all items to None, rather than NULL. I close the issue.

cavokz · 2024-06-10T15:06:09Z

The specific issue of PyTuple_New() was solved differently: it's now initialized by setting all items to None, rather than NULL. I close the issue.

At what Python version does "now" refer to?

vstinner · 2024-06-10T15:34:56Z

At what Python version does "now" refer to?

Oh. Sadly, I remind badly. In fact, the change was not merged: python/cpython#117747 (comment)

cavokz · 2024-06-10T19:49:30Z

At what Python version does "now" refer to?

Oh. Sadly, I remind badly. In fact, the change was not merged: python/cpython#117747 (comment)

Wouldn't make sense to reopen this then?

vstinner · 2024-09-06T14:29:56Z

I proposed a new internal C API to build a tuple: python/cpython#107139

I proposed more APIs:

PyUnicodeWriter API for str: added to Python 3.14
PyBytesWriter API for bytes: Add PyBytes_Writer() API decisions#39
PyUnicode_Import() API for str: PEP 756: Add PyUnicode_Export() and PyUnicode_Import() to the limited C API decisions#33
PyLongWriter_Create() API for int: PEP 757: Add import-export API for Python int objects decisions#35

davidhewitt mentioned this issue Jul 4, 2023

use PyTuple_Pack in fixed-size tuple conversions PyO3/pyo3#3296

Closed

vstinner mentioned this issue Jul 10, 2023

C API: Add PyObject_AsObjectArray() function: get tuple/list items as PyObject** array python/cpython#106593

Closed

iritkatriel added theme: type definition and object creation v labels Jul 21, 2023

vstinner mentioned this issue Jul 23, 2023

C API: Add internal C API to build a tuple: _PyTupleBuilder python/cpython#107137

Closed

vstinner mentioned this issue Jul 23, 2023

gh-107137: Add _PyTupleBuilder API to the internal C API python/cpython#107139

Closed

vstinner mentioned this issue Aug 14, 2023

gh-107137: Add _PyTuple_NewNoTrack() internal C API python/cpython#107183

Closed

vstinner mentioned this issue Sep 4, 2023

Use of features specific to the C language #35

Open

iritkatriel removed the v label Oct 23, 2023

encukou mentioned this issue Oct 24, 2023

Avoid creating incomplete/invalid objects capi-workgroup/api-evolution#36

Open

encukou added the evolution-proposed label Oct 24, 2023

vstinner closed this as completed Jun 10, 2024

vstinner reopened this Jun 11, 2024

skirpichev mentioned this issue Jun 16, 2024

Add public function PyLong_GetDigits() capi-workgroup/decisions#31

Closed

vstinner mentioned this issue Aug 5, 2024

Add PyBytes_Writer() API capi-workgroup/decisions#39

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Disallow creation of incomplete/inconsistent objects #56

Disallow creation of incomplete/inconsistent objects #56

vstinner commented Jul 3, 2023

davidhewitt commented Jul 4, 2023 •

edited

Loading

vstinner commented Jul 5, 2023

cavokz commented Jul 5, 2023

davidhewitt commented Jul 5, 2023 •

edited

Loading

vstinner commented Jul 5, 2023

vstinner commented Jul 9, 2023

vstinner commented Jul 11, 2023

erlend-aasland commented Jul 11, 2023

vstinner commented Jul 12, 2023

vstinner commented Jul 23, 2023

vstinner commented Jul 23, 2023

vstinner commented Aug 14, 2023

scoder commented Oct 18, 2023

encukou commented Oct 18, 2023

encukou commented Oct 24, 2023

vstinner commented Jun 10, 2024

cavokz commented Jun 10, 2024

vstinner commented Jun 10, 2024

cavokz commented Jun 10, 2024

vstinner commented Sep 6, 2024

Disallow creation of incomplete/inconsistent objects #56

Disallow creation of incomplete/inconsistent objects #56

Comments

vstinner commented Jul 3, 2023

davidhewitt commented Jul 4, 2023 • edited Loading

vstinner commented Jul 5, 2023

cavokz commented Jul 5, 2023

davidhewitt commented Jul 5, 2023 • edited Loading

vstinner commented Jul 5, 2023

vstinner commented Jul 9, 2023

vstinner commented Jul 11, 2023

erlend-aasland commented Jul 11, 2023

vstinner commented Jul 12, 2023

vstinner commented Jul 23, 2023

vstinner commented Jul 23, 2023

vstinner commented Aug 14, 2023

scoder commented Oct 18, 2023

encukou commented Oct 18, 2023

encukou commented Oct 24, 2023

vstinner commented Jun 10, 2024

cavokz commented Jun 10, 2024

vstinner commented Jun 10, 2024

cavokz commented Jun 10, 2024

vstinner commented Sep 6, 2024

davidhewitt commented Jul 4, 2023 •

edited

Loading

davidhewitt commented Jul 5, 2023 •

edited

Loading