Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Idea: Make Py_TYPE(obj) outlive obj #38

Open
encukou opened this issue Nov 7, 2023 · 11 comments
Open

Idea: Make Py_TYPE(obj) outlive obj #38

encukou opened this issue Nov 7, 2023 · 11 comments

Comments

@encukou
Copy link
Contributor

encukou commented Nov 7, 2023

Py_TYPE(obj) currently returns a borrowed reference, which is only valid until:

  • obj is GC'd, or
  • the type of obj changes

Changing a type is possible from Python, using assignment to __class__. That makes Py_TYPE(obj) unsafe, especially with the GIL removed.
Py_TYPE is ubiquitous, we can't really get rid of it (or change its semantics).
However, changing the type is a rare, specialized operation. Perhaps we can penalize it to make Py_TYPE safe.
How would this sound?

  • When an object's type is changed from A to B, then B takes a reference to A, which is only freed when B itself is freed.

Now, PyTYPE(obj) is guaranteed to outlive obj, at the cost of:

  • keeping old types around
  • extra bookkeeping (when an object's type is changed)
  • a lazily allocated mapping of “old types” (either per-type, stored with B with id(A) keys, or per-interpreter with (id(A), id(B)) keys)

... which seems reasonable for __class__ assignment.

@encukou encukou changed the title Make Py_TYPE(obj) outlive obj Idea: Make Py_TYPE(obj) outlive obj Nov 7, 2023
@pitrou
Copy link

pitrou commented Nov 23, 2023

I would suggest an other approach:

  1. review existing usage of Py_TYPE to look for the most common use cases
  2. add safe APIs dedicated to these use cases
  3. introduce a new Py_TypeRef inline function returning a new reference to replace remaining uses of Py_TYPE

As for 1., my intuition is that most use cases for Py_TYPE involve either type checking or formatting error messages.

@encukou
Copy link
Contributor Author

encukou commented Nov 23, 2023

That's also possible. In fact, it doesn't conflict with my proposal. The trouble is with the implied next step:

  1. remove Py_TYPE (somehow -- e.g. with a deprecation period, for nogil builds only, ...)

My intuition agrees with yours, but another common use would be for looking up slots.

@vstinner
Copy link
Contributor

vstinner commented Dec 1, 2023

introduce a new Py_TypeRef inline function returning a new reference to replace remaining uses of Py_TYPE

The existing PyObject_Type() function returns a new reference to the type.

remove Py_TYPE (somehow -- e.g. with a deprecation period, for nogil builds only, ...)

No all Py_TYPE() usage are unsafe. In the majority of cases, it's very unlikely to get a crash related to Py_TYPE() borrowed reference. Specially design object can trigger crashes, yes. But with regular code, it's unlikely. I don't think that it's worth it to deprecate/remove Py_TYPE().

@davidhewitt
Copy link

davidhewitt commented Dec 2, 2023

Is it possible to delay deletion of type objects until the garbage collector runs? I think in this case Py_TYPE(obj) returning a borrowed reference might be at worst racy in a nogil world for short-lived reads. If callers need to store the result of Py_TYPE(obj) I think it's clear they should already be using Py_NewRef or similar.

@vstinner
Copy link
Contributor

vstinner commented Dec 6, 2023

Is it possible to delay deletion of type objects until the garbage collector runs?

Since Python 3.12, Py_DECREF() no longer deletes an object immediately. The GC is now triggered by the ceval loop: between two opcodes, but not "in the middle" of arbitrary C code.

The problem is that if a function uses Py_TYPE() borrowed reference and calls arbitary Python code, the Python code can trigger a GC collection. If the Python code clears the last reference to the type, the borrowed reference becomes a dangling pointer, and the code is likely to crash.

Example of such crash in PEP 737 – Unify type name formatting: https://peps.python.org/pep-0737/#use-t-format-with-py-type-pass-a-type

@pitrou
Copy link

pitrou commented Dec 6, 2023

Since Python 3.12, Py_DECREF() no longer deletes an object immediately. The GC is now triggered by the ceval loop: between two opcodes, but not "in the middle" of arbitrary C code.

I'm trying to understand this sentence: is reference counting deferred to the eval loop, or is it just the cyclic garbage collector?

I would definitely expect Py_DECREF to delete an object immediately if its refcount falls to zero, especially as the eval loop might be suspended for a long time if in the middle of executing a heavy C/C++ workload.

@davidhewitt
Copy link

The problem is that if a function uses Py_TYPE() borrowed reference and calls arbitary Python code, the Python code can trigger a GC collection. If the Python code clears the last reference to the type, the borrowed reference becomes a dangling pointer, and the code is likely to crash.

Right, and this is essentially what I'm thinking by the difference between a "short-lived read" and "need to store the result", though I did not word it well.

My point is that in nogil world arbitrary python code can be running all the time on other threads. I think the best you can hope is that when you call into arbitrary python code the stop-the-world GC might halt you so that it can run (but I'm not too sure exactly on the process of this yet). So this is why I ask if it makes sense to delay type object deletion until the GC.

I'm trying to understand this sentence: is reference counting deferred to the eval loop, or is it just the cyclic garbage collector?

Me too :)

@vstinner
Copy link
Contributor

vstinner commented Dec 7, 2023

@pitrou:

I'm trying to understand this sentence: is reference counting deferred to the eval loop, or is it just the cyclic garbage collector?

What's New In Python 3.12 says (highlight is mine):

The Garbage Collector now runs only on the eval breaker mechanism of the Python bytecode evaluation loop instead of object allocations. The GC can also run when PyErr_CheckSignals() is called so C extensions that need to run for a long time without executing any Python code also have a chance to execute the GC periodically. (Contributed by Pablo Galindo in gh-97922.)

I would definitely expect Py_DECREF to delete an object immediately if its refcount falls to zero

Creating a heap type creates multiple reference cycles. For example, a type contains itself in its internal MRO tuple. And each method contains a strong reference to the type (which contains the methods). A garbage collection is needed to delete a heap type.

@davidhewitt
Copy link

Ah I see, so what I'm asking for is already true thanks to the cycles.

@vstinner
Copy link
Contributor

vstinner commented Dec 8, 2023

Ah I see, so what I'm asking for is already true thanks to the cycles.

In Python 3.11, when a object tracked by the GC is created (by _PyObject_GC_New() or _PyObject_GC_NewVar()), it can trigger a GC collection. If code "around" Py_TYPE() borrowed reference creates such object, the borrwed reference can become a dangling Python.

In Python 3.12, this case can happen if the code "around" the borrowed reference calls arbitrary code which can trigger a GC collection, such as "arbitrary Python code". For example, if an object is deallocaed and it's finalizer function is implemented as a Python __del__() method.

Well, in general, it's simpler to just say: borrowed references are dangerous :-)

@gvanrossum
Copy link
Contributor

Since Python 3.12, Py_DECREF() no longer deletes an object immediately. The GC is now triggered by the ceval loop: between two opcodes, but not "in the middle" of arbitrary C code.

This may already have been cleared up, but for the record, this is not true. We would like it to be true, but it isn't. What's delayed is calls to the cyclical GC, which used to be called directly by allocations. But Py_DECREF() still decrements ob_refcnt and calls _Py_Dealloc() when the count is zero, and that function calls the tp_dealloc for the object's type. (If the object being DECREF'ed is a type, the rest of the thread explains why this can't reach zero -- types involve refence cycles. But the statement above about Py_DECREF() is incorrect.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants