Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Naming convention for new C API functions #52

Closed
vstinner opened this issue Jun 23, 2023 · 43 comments
Closed

Naming convention for new C API functions #52

vstinner opened this issue Jun 23, 2023 · 43 comments

Comments

@vstinner
Copy link
Contributor

vstinner commented Jun 23, 2023

Are you ok with the naming convention of adding Ref suffix when adding a variant of a function which returns a strong reference instead of a borrowed reference? Do we need to go further in terms of standard?

See also issue #14: "PyLong and PyUnicode don't match the Python type names".

Point raised by @markshannon in my PR adding PyDict_GetItemRef().

(*) I would like to fix the C API to avoid functions returning borrewed references. For that, I add new variants of existing functions which return strong references, by adding a Ref suffix to their name:

  • PyModule_AddObject() => PyModule_AddObjectRef()
  • PyImport_AddModule() => PyImport_AddModuleRef()
  • PyWeakref_GetObject() => PyWeakref_GetRef()
  • PyDict_GetItemWithError() => PyDict_GetItemRef() -- it also replaces PyDict_GetItem()

In the past, other naming convention were used (other suffixes were added).

(*) To add support for Unicode filenames in the import machinery, I add variants of functions taking char* by adding Object suffix: function taking PyObject* argument. Examples:

  • PyRun_SimpleFileExFlags() => PyRun_SimpleFileObject()
  • PyRun_InteractiveOneFlags() => PyRun_InteractiveOneFlagsObject()
  • Py_CompileStringExFlags() => Py_CompileStringObject()
  • PyErr_WarnExplicit() = PyErr_WarnExplicitObject()
  • PyErr_SyntaxLocationEx() => PyErr_SyntaxLocationObject()

(*) Previously, another trend was simply to add Ex suffix. Windows API has a similar pattern: WaitForMultipleObjects() => WaitForMultipleObjectsEx(). Examples in Python:

  • PyErr_Print() => PyErr_PrintEx()
  • Py_Finalize() => Py_FinalizeEx()
  • PyErr_SyntaxLocationObject() => PyErr_SyntaxLocationObjectEx() -- this one now combines two suffix, Object and Ex :-)

(*) A fourth trend (!) is to use WithSomething suffix. Examples:

  • PyDict_GetItem() => PyDict_GetItemWithError()
  • PyImport_ExecCodeModuleEx() => PyImport_ExecCodeModuleWithPathnames()
  • PyCFunction => PyCFunctionWithKeywords
  • PyType_FromSpec() => PyType_FromSpecWithBases()

(*) I also saw another suffix, I don't know if it can be called a convention, a variant of the previous name: add Flags.

  • PyRun_AnyFileEx() => PyRun_AnyFileExFlags() -- combo! Ex and Flags suffixes :-)
  • PyRun_String() => PyRun_StringFlags()
  • _PyStructSequence_InitBuiltin() => _PyStructSequence_InitBuiltinWithFlags -- variant of the variant... With + Flags

So I saw the following conventions to add suffixes (oldest to the most recent):

  • Add Flags suffix
  • Add Ex suffix
  • Add WithSomething suffix
  • Add Object suffix
  • Add Ref suffix

The funny part is that some added functions get multiple suffixes like PyRun_AnyFileExFlags() (Ex + Flags).

@markshannon
Copy link

markshannon commented Jun 23, 2023

Given this for new API functions why not have a new prefix to clearly distinguish new API functions from old?

E.g.

PyAPI_DictGetItem()
PyCAPI_DictGetItem()
Python_DictGetItem()
PyStable_DictGetItem() /* To distinguish from the unstable API */

@vstinner
Copy link
Contributor Author

A side question of this issue is: How to shrink the size of the C API? See the issue #53 for that.

@vstinner
Copy link
Contributor Author

Given this for new API functions why not have a new prefix to clearly distinguish new API functions from old?

Over time, some functions accumulated many variants (versions):

  • Version 1: PyRun_AnyFile()
  • Version 2: PyRun_AnyFileFlags() and PyRun_AnyFileEx() -- yes, two for the price of one!
  • Version 3: PyRun_AnyFileExFlags() (sic)
  • Version 4 (private): _PyRun_AnyFileObject()

Total: 5 flavors of PyRun_AnyFile(). All of them are still available in Python 3.13!

Adding a prefix or a suffix doesn't make a big difference: at the end, we will always need to add a new version in future to fix yet another issue. Someone will come up with a new use case which requires to change an existing API which means adding yet another flavor.

So do we prefer accumulating suffixes, prefixes, or both? So far, the trend was to always keep the name short Py prefix (PyXXX or Py_XXX names) and only add suffixes.

@markshannon
Copy link

PyRun_XXX is one case where we do want a struct in the API.
Structs can be used as input to general, extensible functions, allowing named arguments in C.
E.g.

struct PythonRunConfig {
    int version; /* Necessary, so the API function doesn't read off the end */
    FILE *fp;
    const char *filename; 
    int closeit,
    PyCompilerFlags *flags; 
    /* Extra fields go here */
};

PyAPI_FUNC(int) Python_Run(const PythonRunConfig *config);

Used as follows:

PythonRunConfig config =
(PythonRunConfig) {
    .fp = the_file,
    .closeit = 1
};

int err = Python_Run(&config);

@vstinner
Copy link
Contributor Author

vstinner commented Jun 26, 2023

IMO changing the suffix should be a one time event and the whole legacy API would still be supported forever anyway: #54 Changing the suffix is an opportunity to reuse existing names for the "sane" behavior.

Even if the suffix is changed once, we will still have to add new variants tomorrow. Do you have to agree on a suffix?

@vstinner
Copy link
Contributor Author

@markshannon @erlend-aasland: are you ok with the Ref suffix convention for new functions returning a strong reference rather than a borrowed reference?

cc @serhiy-storchaka

@vstinner
Copy link
Contributor Author

PyRun_XXX is one case where we do want a struct in the API.

The struct API sounds appealing. But that would be a 6th version of the same function and so a new name should be found: completely different or add yet another suffix.

@erlend-aasland
Copy link

Data point: SQLite simply use a version suffix, increasing the version number for each API revision. Example:

  • sqlite3_prepare
  • sqlite3_prepare_v2
  • sqlite3_prepare_v3

@erlend-aasland
Copy link

@markshannon @erlend-aasland: are you ok with the Ref suffix convention for new functions returning a strong reference rather than a borrowed reference?

cc @serhiy-storchaka

What if we have to amend one of the *Ref() functions say in five years time from now. What do we call the new, amended API? *RefEx()? :)

I kind of like the version convention of SQLite; you never know how many variants of an API you'll need, and naming them similar only creates more confusion (in my opinion). For example PySomething_Fetch, PySomething_Get vs. PySomething_GetRef. It is not obvious which one is the newest? If they are named PySomething_Get, PySomething_Get_v2, and PySomething_Get_v3 it would be obvious which one is newest.¨

If _GetItem is the perfect name for an API that need amending, why not simply attach _v2 to it for the amended variant?

@erlend-aasland
Copy link

I think this issue is drifting to a discussion about possible solutions; suggesting to create an issue in the devguide, and continue discussing solutions (that is: guidelines) over there.

@erlend-aasland
Copy link

I created python/devguide#1126 for discussion solutions.

@vstinner
Copy link
Contributor Author

What if we have to amend one of the *Ref() functions say in five years time from now. What do we call the new, amended API? *RefEx()? :)

My experience says that yes, for sure, we will have to add a variant of at least one Ref function.

In that case, I suggest coming with a new suffix and replace Ref with that suffix.

@vstinner
Copy link
Contributor Author

naming them similar only creates more confusion (in my opinion). For example PySomething_Fetch, PySomething_Get vs. PySomething_GetRef. It is not obvious which one is the newest?

In think that Windows not only has Ex suffix but also Ex2. Python naming convention has more diversity in suffixes so it's harder to know which one is the "latest". One issue is that all variants are kept. Usually it's because the new API has a new feature only useful to a minority of users, so most users are fine with the existing API and there is no strong willingness to deprecate/remove old APIs and force a migration to the new ones.

The Ref variant is different since it fix the API which was "error prone". The Ref is safer. But. Borrow references are fine in most cases when you use CPython. Even PyPy is fine to return a borrow reference to an item of a container: the pointer remains valid until the container is destroyed. Still, I'm not very motivated to enforce a major migration from PyDict_GetItem() to PyDict_GetItemRef() just to reduce the risk of an unlikely issue.

@vstinner
Copy link
Contributor Author

If they are named PySomething_Get, PySomething_Get_v2, and PySomething_Get_v3 it would be obvious which one is newest.

An alternative is to just add a number: PyDict_GetItem(), PyDict_GetItem2(), PyDict_GetItem3(), ...

I dislike mixing SnakeCase with lowercase "v2".

@vstinner
Copy link
Contributor Author

The Linux kernel just adds a number suffix to syscalls:

  • dup(), dup2(), dup3() adds flags parameter
  • opentat(), openat2()
  • io_uring_enter(), io_uring_enter2()
  • epoll_create(), epoll_create1() -- ooops! 😬
  • pipe(), pipe2()
  • renameat(), renameat2()
  • mlock(), mlock2()
  • etc.

A competitor calling convention exists for Linux syscalls: number of arguments:

  • accept(), accept4() adds a 4th parameter
  • signalfd, signalfd4() adds a 4th flags parameter

There are also 64 bits variants like getdents64().

Linux syscalls: https://chromium.googlesource.com/chromiumos/docs/+/master/constants/syscalls.md

@erlend-aasland
Copy link

Yeah, just sticking a number to it is not clear enough, IMO. As you say, there are precedent for using number to imply the number of params, and stuff like 64-bit variants. If you add a v or V, there is a stronger indicator this is a new version of a specific API :)

(Perhaps we should take this discussion to the "solutions" repo (python/devguide)

@vstinner
Copy link
Contributor Author

vstinner commented Jun 26, 2023

By the way in my list of Python C API suffix, I forgot the number suffix. Some examples:

  • PyModule_Create(), PyModule_Create2()
  • PyModule_FromDefAndSpec(), PyModule_FromDefAndSpec2()
  • PyStructSequence_InitType(), PyStructSequence_InitType2()

@vstinner
Copy link
Contributor Author

Ok, can we now agree on the versioning of C API calling conventions? 😂

  • v1: no suffix
  • v2: add Flags suffix (ex: PyRun_StringFlags())
  • v3: Add Ex suffix (ex: Py_FinalizeEx2())
  • v3.5: Add FlagsSuffix suffix 😬
  • v4: Add number suffix (ex: PyModule_Create2())
  • v5: Add WithSomething suffix (ex: PyDict_GetItemWithError()(
  • v6: Add Object suffix (ex: PyErr_WarnExplicitObject())
  • v7: Add Ref suffix (ex: PyModule_AddObjectRef()))
  • v8: The new hype is the old hype, the number suffix strikes back (ex: PyDict_GetItem2())

@erlend-aasland
Copy link

Given this for new API functions why not have a new prefix to clearly distinguish new API functions from old?

It ends up being a variant of the same dilemma; if you need to amend the new API, you'll have to introduce yet another prefix.

@markshannon
Copy link

I get the need for version numbers, but given we are overhauling the whole API, wouldn't it make sense to distinguish the new, more principled API from the old, ad-hoc one?
Hopefully we won't need too many _v2 functions.

@vstinner
Copy link
Contributor Author

I think that we are talking past each others here. I see two questions:

  • In an existing API, how can we have a consistent naming convention when adding a new function
  • Introduce a new prefix to have a brand new API which follows the new guidelines and so only prefer "sane" APIs (unambigious return value, correct error reporting, no borrowed references, etc.).

I prefer to keep Py prefix for now. In short, I propose adding PyDict_GetItem2() which replaces PyDict_GetItem().

When tomorrow, a new API will be designed, PyDict_GetItem2() will become PyCAPI_PyDict_GetItem() (no suffix): replacing removed PyDict_GetItem() function.

In such new API, we will still have to add new functions, we can use the same naming convention (being discussed here).


Let's say that all new C API functions must now use a new PyCAPI_ prefix. Slowly, we will have a mixture of Py_ and PyCAPI_ functions in C extensions. For me, changing the prefix sounds "let's write a new API from scratch and make it correct this time" (such as HPy). But that's not what being discussed here. The question here is how to name new functions for an incremental change of the existing C API.

For a brand new C API, IMO it's more the domain of the issue #54 and I consider that they are unrelated.

If the prefix is changed, I would prefer that the whole C API is provided with this prefix. For example, I would feel bad to have to call PyLong_AsLong() after PyAPI_Dict_GetItem(). I would prefer to have PyCAPI_Long_AsLong() included.

The problem is that someone has to go through the 1500+ functions of the C API, check if API follows the new guideline, and if yes, introduce an alias:

#define PyCAPI_Long_AsLong PyLong_AsLong

hint: PyLong_AsLong() doesn't respect the guideline, -1 is ambigious :-)

@vstinner
Copy link
Contributor Author

Introduce a new prefix to have a brand new API which follows the new guidelines and so only prefer "sane" APIs (unambigious return value, correct error reporting, no borrowed references, etc.).

To clarify the discussion, I created issue capi-workgroup/api-revolution#9 to discuss this idea.

@gpshead
Copy link

gpshead commented Jun 26, 2023

fyi - pep-703 currently spells the _GetItem replacement as _FetchItem for a non-borrowing API.

The Linux kernel just adds a number suffix to syscalls:

ISTR that in many cases the trailing number was also the number of expected arguments or a similar calling convention notation. But that may have been more by chance (each new API naturally added one?) rather than design. This OS pattern started long before Linux. dup -> dup2 comes from BSD.

Regardless, with no clear thing to pick here and nobody wanting to think their "this time it's perfect" API is going to wind up revised in the coming decades... I don't think it matters which naming scheme we pick or even if it winds up wholly consistent. The more important aspect is documentation making it clear which APIs are ideal, old-ideal / reluctantly-supported, and please-never-use-this-legacy status. With the second most important aspect the ability for someone to request only a certain modern version of the API be available via their #include. (similar to how Py_LIMITED_API works?) so their code can't give into wrong API temptations.

@vstinner
Copy link
Contributor Author

@gpshead:

I don't think it matters which naming scheme we pick or even if it winds up wholly consistent.

Ok, but so far, nobody approved my PR python/cpython#106005 and @markshannon asked to get a consensus on naming convention before adding the function. So, are you ok with PyDict_GetItemRef() name? What about the others?

@vstinner
Copy link
Contributor Author

The more important aspect is documentation making it clear which APIs are ideal, old-ideal / reluctantly-supported, and please-never-use-this-legacy status.

I created python/peps#3182 proposing to formalize the concept of "soft deprecation". It may help to make old functions kept for backward compatibility without the willingness to schedule their removals, but they should not be used for new code.

With the second most important aspect the ability for someone to request only a certain modern version of the API be available via their #include. (similar to how Py_LIMITED_API works?) so their code can't give into wrong API temptations.

I proposed issue #54 for that. See also the more radical issue capi-workgroup/api-revolution#9 idea.

@markshannon
Copy link

@gpshead I agree that an API we come up with isn't going to perfect, but it will be a lot better than what we have now.
A new API that learns from our experience and is designed carefully will be a lot more consistent, usable and stable than the current ad-hoc API.

Sure, we need to allow for further changes, but version 2s should be rare. The C API should reflect the semantics of the language and features of the VM. While these get added to, they change very rarely.

For example, the semantic of getting an item from a dictionary haven't changed, it is just that we are fixing flaws in the current API which doesn't handle errors properly. We have learned that just because a function can't currently fail, we should allow for failure in the API, with a few exceptions: (e.g. PyLong_Check() cannot fail).

Having a clear marker (I prefer a new prefix) in the API that differentiates the old from the new, makes it easy to see which API functions may need updating.

@vstinner
Copy link
Contributor Author

@iritkatriel added _PyErr_ChainExceptions1(exc) to Python 3.12: function replacing _PyErr_ChainExceptions(exc_type, exc_value, exc_tb). Is the name related to the number of parameters: 3 => 1? Should it be renamed to ``_PyErr_ChainExceptions2()` before 3.12 final is released?

@vstinner
Copy link
Contributor Author

fyi - pep-703 currently spells the _GetItem replacement as _FetchItem for a non-borrowing API.

I'm not a fan of changing a function name when fixing an issue to avoid adding a suffix. IMO it makes the migration from existing API to the new one more complicated. For me, it's more obvious that PyModule_AddModuleRef() is a replacement for PyModule_AddModule(), and the fact that it has a suffix means that it's the newer API, and there should be there for a reason. If it would be called differently, like PyModule_AppendModule() or PyModule_ModuleAdd(), I would be more confused on what is the old API and what is the new one, which one should I pick?

The Ref suffix is a hint that: if you care about avoiding borrowed references, you should check for the list of all new Ref variants and use them in your code.

Obviously, if we change the suffix instead, the hint would be even stronger! If a function with new suffix exists, obviously it's a new API and it should be used. But I dislike the idea of changing the suffix since it smells too much like "designing a branch new API" and it's whole new can of worms: see issue capi-workgroup/api-revolution#9. Just one example, in my PR python/cpython#106005 I was asked to change PyObject* parameter type to PyDictObject*. It would make sense in a brand new API. It belongs less an "an incremental fix" of the existing C API.

Moreover, in a "brand new API", soon we will have for sure to deprecate API and add new ones. Again, the question of adding a suffix / using a different function name arises. IMO it would feel weird to have to change the suffix each time that we add a variant of a function.

@iritkatriel
Copy link
Member

@iritkatriel added _PyErr_ChainExceptions1(exc) to Python 3.12: function replacing _PyErr_ChainExceptions(exc_type, exc_value, exc_tb). Is the name related to the number of parameters: 3 => 1? Should it be renamed to ``_PyErr_ChainExceptions2()` before 3.12 final is released?

This is an internal function, so I didn't put too much thought into it. I think we should create a non-internal version at some point.

@vstinner
Copy link
Contributor Author

At the end, sorry, but I failed to get the preferences of all people involved in this discussion. Do you prefer to switch to the number suffix for new APIs? Do you prefer to keep the status quo: add a name suffix (Ex, Flags, Ref, Object, etc.: the chosen suffix depends on the change)?

Currently, the vast majority of new functions use a "name" suffix: only 3 functions use a number suffix (2 suffix for the 2nd version): #52 (comment) The status quo is on the name suffix approach.

@serhiy-storchaka
Copy link

serhiy-storchaka commented Jul 2, 2023

Plese note that adding a suffix means that function has almost the same signature (with possible changing the type of few parameters or returning type or adding new parameters). I think that using names which are only different by a suffix but have completely different signatures would be confusing. Especially if other functions with the same suffix have only only difference (or even only semantic difference) from the original functions. See python/cpython#106005. For functions with radically different signature we should use different naming scheme.

@vstinner
Copy link
Contributor Author

vstinner commented Sep 4, 2023

I created PR #108870 to rename the private _PyThreadState_UncheckedGet() to a new PyThreadState_GetUnsafe() function. Before, the name had Unchecked but not as a suffix. I propose to add Unsafe suffix instead.

Python 3.13 has two functions with an Unchecked suffix:

  • _PyFrame_PushUnchecked()
  • _PyFrame_PushTrampolineUnchecked()

And one function with Unsafe suffix:

  • _PyGILState_GetInterpreterStateUnsafe()

The following functions are also documented as being Unsafe in a comment:

  • _PyDict_GetItemWithError()
  • _PyUnicode_FastFill()
  • _PyUnicode_FastCopyCharacters()

Note: I chose the Fast marker in these PyUnicode function names, and it was a bad idea :-) Nothing proves that it's faster than the public function. It's like METH_FASTCALL, how long before another faster method will happen? :-)

@serhiy-storchaka
Copy link

Note: I chose the Fast marker in these PyUnicode function names, and it was a bad idea :-) Nothing provides that it's faster than the public function. It's like METH_FASTCALL, how long before another faster method will happen? :-)

At least it is better than METH_NEWCALL.

@vstinner
Copy link
Contributor Author

vstinner commented Sep 9, 2023

@serhiy-storchaka @erlend-aasland @encukou @markshannon: Do you have a preference between Unsafe and Unchecked suffix? Question related to python/cpython#108870 addition.

@gvanrossum
Copy link

The words "unchecked" and "unsafe" have different meanings and connotations. In the issue you link to, which is about getting the current thread state or returning NULL if there is no current threadstate (instead of calling Py_FatalError), I think there's nothing unsafe about the function -- it is just a different API. So I don't think it should be renamed to "Unsafe".

@vstinner
Copy link
Contributor Author

vstinner commented Sep 9, 2023

PyThreadState_Get() and PyInterpreterState_Get() are safe to use, since they are garanteed to be non-NULL. That's what I mean by "safe" vs "unsafe". Well, not sure if calling Py_FatalError() makes it really "safe" since this function aborts the process 😁

Do you have a better suggestion for the name?

@gvanrossum
Copy link

Stick with Unchecked. Just remove the underscore and be done.

@davidhewitt
Copy link

davidhewitt commented Sep 9, 2023

In Rust an unsafe function usually means the caller needs to uphold invariants which are documented rather than checked by the compiler. (Failing to do so typically leads to memory corruption.) If I understand correctly there aren't specific invariants for a PyThreadState_Get() alternative which can return NULL, so GetUnsafe also doesn't seem right to me (as someone normally reading Rust code).

PyThreadState_GetUnchecked seems to describe the difference between the two functions, so Unchecked would get my vote.

Well, not sure if calling Py_FatalError() makes it really "safe" since this function aborts the process 😁

With Rust's interpretation of unsafe, calling Py_FatalError() could definitely be a way to make an API safe by aborting when invariants are not correctly upheld. Aborting is safe :)

EDIT: Guido beat me to the reply while I was drafting this comment 😂

@vstinner
Copy link
Contributor Author

I would prefer the PyThreadState_GetUnchecked() name proposed by @davidhewitt, instead of the PyThreadState_UncheckedGet() name proposed by @gvanrossum.

I like adding suffix: when functions are sorted in their documentation, it's easier to spot the variants. Otherwise, they may be sorted in a "random" order in the doc (unless the alphabetical order is not respected on purpose). Also, IMO it's easier to discover function variants when a suffix is added* to the end of the existing name.

I think that I now get the different between "unsafe" and "unchecked". For this function, I agree that "unchecked" is a good fit.

@encukou
Copy link
Contributor

encukou commented Sep 11, 2023

Aborting is safe

As you mentioned, Rust's unsafe has a very specific meaning -- invariants normally checked by the Rust compiler aren't checked. And so, anything that compiler doesn't normally check for you is considered “safe”: aborting the process, memory leaks, deadlocks, infinite loops, and so on.
IMO, this definition of safe/unsafe doesn't make any sense outside Rust. It's fine if Python uses it for something else.


As for PyThreadState_*Get*: if we want Unsafe/Unchecked to mean “danger, read the docs!”, IMO it's fine to use it here.
If we want a term specifically for “returns NULL without setting an exception”, I don't think either Unsafe or Unchecked are it.

@vstinner
Copy link
Contributor Author

FYI I updated my PR python/cpython#108870 to rename the function to PyThreadState_GetUnchecked(). So far, nobody proposed a better name. I'm fine with this name.

@encukou
Copy link
Contributor

encukou commented Oct 24, 2023

Proposed guideline issues:

@vstinner
Copy link
Contributor Author

Let's continue the discussion in the api-evolution project, in issues listed in the previous comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants