Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-105922: PyImport_AddModule() uses Py_DECREF() #105998

Closed
wants to merge 3 commits into from

Conversation

vstinner
Copy link
Member

@vstinner vstinner commented Jun 22, 2023

Rewrite PyImport_AddModule() to simply call Py_DECREF(), rather than creating a weak reference, to get a borrowed reference to the module.


📚 Documentation preview 📚: https://cpython-previews--105998.org.readthedocs.build/

@vstinner
Copy link
Member Author

@serhiy-storchaka: In gh-86160, you added this code using PyWeakref_New() + PyWeakref_GET_OBJECT() get a borrowed reference, whereas the module is known to be stored in the sys.modules dictionary. What's the rationale for using a weak reference, rather than just using Py_DECREF()?

Rewrite PyImport_AddModule() to simply call Py_DECREF(), rather than
creating a weak reference,  to get a borrowed reference to the
module.
@vstinner
Copy link
Member Author

I reverted the doc change.

@vstinner
Copy link
Member Author

Oh, ./python -m test -j1 -R 3:3 test_import leaks references ("test_import leaked [11, 13, 10] references, sum=34")... but it also leaks in the main branch. Ah! It's unrelated to this PR. Also, ./python -m test -R 3:3 test_import doesn't leak. It's inteesting :-)

@serhiy-storchaka
Copy link
Member

sys.modules can be not a dict, and it is not guaranteed that it keeps reference. It can be a custom mapping with __getitem__ which returns a new object or removes returned object or __setitem__ which does nothing. There is so much unknown. With weakref we can be sure that we get either a reference to live object or NULL. Please do not change this code until you find better solution.

@vstinner
Copy link
Member Author

Also, ./python -m test -j1 -R 3:3 test_import does crash, but again, it's unrelated to this PR. It does also crash in the main branch. It's a recent regression: #91095 (comment)

@vstinner vstinner marked this pull request as draft June 22, 2023 22:38
@vstinner
Copy link
Member Author

PyImport_AddModule() history, oldest to newest:

  • In 1990, commit 85a5fbb (first commit which added C code to CPython, so oldest C code tracked by the Git repository) added new_module() function which uses mtab = sysget("modules"); to get modules. sysget() is implemented as dictlookup(sysdict, name) which looks into static object *sysdict;. This version respected sys.modules :-)

  • In 1990, commit 3f5da24 added static object *modules; variable to Python/import.c, variable used by add_module() (old name of PyImport_AddModule()). sys.modules is no longer used by add_modules().

  • In 1997, commit 25ce566 added PyImport_GetModuleDict() implemented as PyThreadState_Get()->interp. PyImport_AddModules() uses PyObject *modules = PyImport_GetModuleDict(); and then works on this dictionary with PyDict_GetItemString() and PyDict_SetItemString().

  • In 2017, commit 3f9eee6 modified import_add_module() to "Support other mappings in PyInterpreterState.modules": issue [subinterpreters] Eliminate PyInterpreterState.modules. #72597.

  • In 2021, commit 4db8988 changes PyImport_AddModule() to use PyWeakref_New() to create a borrowed reference.

  • In 2023, commit b2fc549 changes Python/import.c to use a new MODULES(interp) macro which gets interp->imports.modules.

@vstinner
Copy link
Member Author

In 2017, commit 3f9eee6 modified import_add_module() to "Support other mappings in PyInterpreterState.modules": issue GH-72597.

I'm not convinced that GH-72597 was fully implemented. At least, PyImport_AddModule() doesn't support overriding sys.modules at runtime since 1990... so since the creation of Python :-)

I added a test to my PR to check this special case: custom sys.modules which doesn't hold a strong reference to the newly created module.

According to this test, PyImport_AddModule() always uses tstate->interp->imports.modules directly, it doesn't take in account that sys.modules was overriden. Moreover, setting sys.modules at runtime doesn't update this internal tstate->interp->imports.modules member: the sys module has no custom __setattr__() function to update it.

Calling PyImport_AddModule() writes into the "original" modules dict, even if sys.modules was overriden.

@vstinner
Copy link
Member Author

In the main branch, there is a single line which sets PyInterpreter.imports.modules: MODULES(interp) = PyDict_New(); of _PyImport_InitModules(). I don't see any C API, even internal C API, to changes this PyInterpreterState member. And the public C API PyImport_GetModuleDict() just returns this member.

It means that all C code using directly PyInterpreter.imports.modules can only get a dict, not a custom type.

Obviously, C code and Python code getting sys.modules attribute can get any type, not only dict, if sys.modules is overriden. For example, the _pickle extension (implemented in C) gets the sys.modules attribute and then... uses the PyDict C API, oops. It is likely to crash if sys.modules type is not dict or a subclass of dict.

@vstinner
Copy link
Member Author

Well, my goal here was to get rid of the PyWeakref_GET_OBJECT() since I would like to deprecate it. So I wrote a different change: PR #106001 uses _PyWeakref_GET_REF().

But maybe this whole code was always useless and so can be removed, since it seems impossible to get a type other than dict in this code path.

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is Py_REFCNT any better than PyWeakref_GetObject? I preferred to use later because the former is a macro and more tightly bound to internal representation.

@serhiy-storchaka
Copy link
Member

I have great doubts that it was worth to add support of non-dict as sys.modules. It complicated a code a lot, it is not supported everywhere in the C code, and causes issues with borrowed references, like in this code.

@vstinner
Copy link
Member Author

I have great doubts that it was worth to add support of non-dict as sys.modules. It complicated a code a lot, it is not supported everywhere in the C code, and causes issues with borrowed references, like in this code.

The problem is that modules have no __setattr__() method which could be used to prevent setting sys.modules to a non-sense value like sys.modules = 123 or sys.modules = ["my", "list"]. Maybe some developers made the wrong assumption that internal C functions really access sys.modules attributes, whereas the implementation access directly a PyInterpreterState member and ignores sys.modules.

@vstinner
Copy link
Member Author

I created issue #106016 "Support defining setattr() and delattr() in a module". It would allow to either disallow setting sys.modules to a type different than dict, or at least to update PyInterpreterState.imports.modules reference.

@@ -373,16 +373,20 @@ PyImport_AddModuleObject(PyObject *name)
return NULL;
}

// gh-86160: PyImport_AddModuleObject() returns a borrowed reference
PyObject *ref = PyWeakref_NewRef(mod, NULL);
// Convert to a borrowed reference
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Convert to a borrowed reference
// Convert to a borrowed reference.

@vstinner
Copy link
Member Author

I fixed the issue differently: commit ee52158

I don't think that PyImport_AddModuleObject() requires this complicated PyWeakref dance, since the code always gets a strong reference from a Python dict: it's not possible to override PyInterpreterState.imports.modules, and this function access directly PyInterpreterState.imports.modules. But I don't feel the need to change the code. I propose to continue the discussion in the issue #106016.

@vstinner vstinner closed this Jun 26, 2023
@vstinner vstinner deleted the add_module_obj branch June 26, 2023 12:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants