Return value conventions #13

encukou · 2023-10-09T20:30:32Z

New API functions can return:

An integral value, where -1 is returned if and only if an exception was raised, and non-negative values signal an absence of exception. ~~Negative values other than -1 are never returned.~~ (edit: they are, for hashes -- see below)
A pointer, where NULL is returned if and only if an exception was raised.
(Or PyStatus for pre-initialization functions.)

The return values need to be documented.
Some existing functions don't follow this convention.

If NULL or a negative number are valid outputs, the function needs to reserve the return value for signaling errors, and take an extra *result argument that it fills.

XXX: Is *result set on failure?

"Errors" that you expect to be common, where you want to avoid the overhead of creating an exception object (like the AttributeError from a getattr), aren't treated as failures. In these cases, return 0 for the expected "error", and 1 for complete success.
XXX: is *result set in this case?

The text was updated successfully, but these errors were encountered:

gvanrossum · 2023-10-09T21:03:11Z

@encukou Can you edit this a bit? It looks like some kind of copy/paste error.

Also, I would avoid the loaded term "success", at least for int-returning functions -- for cases like PyDict_GetItemRef(), users might not consider 0 to be a success, but it's not an error. Maybe we can say "absence of exception"?

We might also want to call out that there are families of existing APIs that return 0 for success and 1 for error (following shell conventions), and others that return 0 for error and 1 for success. So we will have to continue documenting the return value in each case. (Less so for functions that return NULL for some success cases -- those are rather exceptional and unless explicitly documented, a NULL always means an error occurred.)

encukou · 2023-10-10T07:39:31Z

Edited, thanks.

IMO, we should always document return values -- at least have a summary with a link. Not everyone reads the section that applies to all of the API.

gvanrossum · 2023-10-10T07:49:39Z

XXX: Is *result set on failure?
[...]
XXX: is *result set in this case?

I vote for always setting the result to something, and document this. If the result is an object, in the error case the result should be NULL. If the result is something else, hopefully there's some useful "neutral" value (e.g. 0 or -1 for integers).

erlend-aasland · 2023-10-10T09:10:21Z

I vote for always setting the result to something, and document this.

See also https://github.com/python/devguide/pull/1128/files

encukou · 2023-10-10T10:38:33Z

That's better API.
I don't think we can count on core devs always remembering to do it though – we'll probably need some kind of mandatory checklist for new API...

vstinner · 2023-10-13T14:47:14Z

See also capi-workgroup/problems#1 "Ambiguous return values".

vstinner · 2023-10-13T14:56:51Z

See also python/devguide#1129 "Improve C API guidelines for return values".

vstinner · 2023-10-13T14:59:06Z

See also python/devguide#1125 "C API: Add guidelines for C APIs with output params".

zooba · 2023-11-13T17:51:37Z

Added a bit more related discussion on #40 (comment) but I think an additional key question is: how do we return both a value and some additional information about where that value came from?

An existing example would be PyUnicode_AsUTF8AndSize, which returns the value and passed back the length of the value in a pointer argument. It return NULL in case of error (what it sets the output argument to in an error case is a separate discussion).

Should we stick to this convention, e.g. char *PyUnicode_AsUTF8AndSize(PyObject *o, Py_ssize_t *len) where result == NULL indicates an exception

Or do we want to switch to only returning exception status and all values go through output parameters, e.g. int PyUnicode_AsUTF8AndSize(PyObject *o, char **str, Py_ssize_t *len)

Or combine the information about the value with the error status and use that as the result, e.g. Py_ssize_t PyUnicode_AsUTF8AndSize(PyObject *o, char **str) where result < 0 indicates an exception

(Note that I'm not proposing changing this particular API, just illustrating how it may have been done under different designs.)

vstinner · 2023-11-13T18:15:14Z

Or do we want to switch to only returning exception status and all values go through output parameters

IMO the API should return directly the value, unless there is an ambiguous case and it's common to have to distinguish them.

Ambiguous cases:

PyDict_GetItem() returns NULL if not found, and returns also NULL if an exception was set: the return value is not enough to distinguish two important cases.
PyLong_AsLong() can return -1 on success, and return -1 if an exception is set.

There is also a desire to avoid the need to call PyErr_Occurred().

Another concern is designing API to not make them error-prone: encourage the author to handle all cases, especially to handle the error case (if an exception is set).

There is another category: "optional parameters". Sometimes, you might want to provide additional output arguments to the caller, but the caller may not use them. In PR python/cpython#112028 I propose int PyDict_Pop(PyObject *p, PyObject *key, PyObject **result) API where result is optional. Sometimes, you want to get the removed value. Sometimes, having to deal with the removed value requires to write additional code which you may prefer to not have to write. See examples of proposed API to see advantages of making result optional.

In the PyUnicode_AsUTF8AndSize() case, IMO size can be seen as optional (and it can be set to NULL): Python has code which calls str=PyUnicode_AsUTF8AndSize(obj, &size) and then ignore size on purpose.

In the proposed PyImport_ImportOrAddModule() API, maybe the information "was the module created?" can be seen as optional, especially because currently the API does not provide the information: it's a new feature.

So there are different flavors of APIs:

(A) Return result directly (no output arguments): the result can be used directly to check if an exception is set, or the function cannot fail, with no ambiguous case. Calling PyErr_Occurred() is not needed.
(B) Return int + output arguments (pointers): when distinguishing "not found" and "found" is common, ex: PyDict_GetItemRef().
(C) Return result + optional output arguments (pointeres): when distinguishing different cases ("is created?") is uncommon. Example: PyDict_Pop(), PyImport_ImportOrAddModule().

The case (A) covers a large part of the C API. Except that some functions require calling PyErr_Occurred() to write correct code (to properly handle errors).

zooba · 2023-11-13T18:51:18Z

Yeah, I largely agree with Victor's comments. Particularly:

Return int + output arguments (pointers): when distinguishing "not found" and "found" is common

I'd frame this as "when distinguishing a valid/expected result from an error". We don't have a useful way to return "not found" as a PyObject * that can be distinguished from NULL and also easily detected in all places it may get passed to, and so we should use a different style here.

maybe the information "was the module created?" can be seen as optional, especially because currently the API does not provide the information: it's a new feature.

I'd argue that it can be seen as optional because the post-condition of the function is that a module exists and was returned. This is met in both cases, whether the module existed or is new, and so whether it was just created is optional.

Contrast this with PyLong_AsLong returning -1 for an error, which is not meeting the post-condition, and isn't distinguishable from a valid case without extra information. We shouldn't allow "was this actually an error" to be optional here (and should probably make the return value be status and the actual number go into an output parameter).

encukou · 2023-11-14T13:54:29Z

I'd prefer using (A) and (C) only -- i.e. switch to output parameters (and returning int status) whenever the return value isn't enough for all information the function provides.
That would mean we don't need to decide what's “common”, and the API would be more regular.

There is another category: "optional parameters". Sometimes, you might want to provide additional output arguments to the caller, but the caller may not use them.

I'd argue that for consistency, we should simply allow all output parameters to be NULL (meaning they aren't filled).

vstinner · 2023-11-14T14:38:22Z

I'd argue that for consistency, we should simply allow all output parameters to be NULL (meaning they aren't filled).

It's kind of surprising to call PyDict_GetItemRef(dict, key, NULL) and not get the requested item, no? Moreover, it makes the coder slower since it has to check if the result is NULL or not.

encukou · 2023-11-14T14:48:36Z

It's kind of surprising to call PyDict_GetItemRef(dict, key, NULL) and not get the requested item, no? Moreover, it makes the coder slower since it has to check if the result is NULL or not.

It's a shortcut for PyDict_Contains. Yes, I'm serious: I don't think we need separate Contains functions.
And I don't think it's that much more surprising than calling PyUnicode_AsUTF8AndSize(s, p, NULL) and not getting the AndSize part.

Moreover, it makes the coder slower since it has to check if the result is NULL or not.

I'd love to see the benchmark behind that claim. Especially with PGO.

vstinner · 2023-11-14T16:10:03Z

I'd love to see the benchmark behind that claim. Especially with PGO.

Sure, it should be measured. @serhiy-storchaka was worried about that in my python/cpython#112028 PR.

serhiy-storchaka · 2023-11-14T18:53:49Z

Or do we want to switch to only returning exception status and all values go through output parameters, e.g. int PyUnicode_AsUTF8AndSize(PyObject *o, char **str, Py_ssize_t *len)

It is exactly the interface of PyBytes_AsStringAndSize() (and its predecessor PyString_AsStringAndSize()). I do not know why the interface of PyUnicode_AsUTF8AndSize() is different. But I can guess that they wanted to implement PyUnicode_AsUTF8() as a macro, and it is the only variant that allows this.

encukou · 2023-11-15T10:53:57Z

(From the proposal in the first post:)

Negative values other than -1 are never returned.

Hash functions return -1 on error, but all other values, including negative ones, are valid.
IMO, that's a good argument for only reserving -1 itself for failure.

vstinner · 2023-11-15T12:26:14Z

Hash functions return -1 on error, but all other values, including negative ones, are valid.

Right, it's part of PyTypeObject.tp_hash API: https://docs.python.org/dev/c-api/typeobj.html#c.PyTypeObject.tp_hash

encukou · 2024-11-29T16:17:21Z

Added to the draft in #53, with a but of wording that might be ambiguous:

When it might be useful for users to call a function but ignore an output, allow passing NULL as the output argument.

That doesn't explicitly say if we want to allow functions like PyDict_GetItemRef(dict, key, NULL) vs. a separate PyDict_Contains function.

There's a current decision issue on the topic: capi-workgroup/decisions#48

serhiy-storchaka · 2024-11-29T16:38:03Z

Supporting NULL as an output argument can be too costly for fast functions, it can hit performance of the common case. Specialized function for checking for existence also can be more efficient. It may be impractical to support this as a strong general rule.

zooba · 2024-12-02T12:59:25Z

On the other hand, specialized functions for cases where "I don't care about this value" is perfectly clear, leads to API bloat and makes it harder to figure out how to write the correct code (time-of-check-time-of-use-etc.). We also end up maintaining more code on the implementation side - a null check before writing an output is one (short) line of code, whereas an entirely separate exported function is going to touch 2-3 files minimum.

I'm also skeptical that "indirected write" vs "null check and indirected write" is going to come out clearly in favour of either path. I'm sure either case can be proven with microbenchmarks, but I really can't see it mattering that much in the overall scheme of things. Especially since we should be doing null checks on output argument pointers anyway.

Macros and inline functions will optimise away literal null arguments just fine. I'd prefer to continue relying on macros and inline functions for fast functions, and optimise for code size and maintainability for exported ones.

encukou mentioned this issue Oct 10, 2023

Reference handling #16

Open

encukou added the guideline To be included in guidelines PEP label Oct 11, 2023

encukou mentioned this issue Oct 12, 2023

Ambiguous return values capi-workgroup/problems#1

Open

This was referenced Oct 18, 2023

Provide PyLong conversions with Py_(u)intptr_t python/cpython#110863

Open

Infallible functions capi-workgroup/problems#20

Open

Inconsistency in success/failure return values capi-workgroup/problems#25

Open

Functions must not suppress exceptions #35

Open

zooba mentioned this issue Nov 13, 2023

gh-106915: Add PyImport_ImportOrAddModule() function python/cpython#111559

Closed

vstinner mentioned this issue Nov 14, 2023

Provide a variant of PyDict_SetDefault that returns a new reference (instead of a borrowed reference) python/cpython#112066

Closed

vstinner mentioned this issue Nov 16, 2023

Functions which cannot fail #43

Closed

zooba mentioned this issue Dec 4, 2023

[PEP 741] gh-107954: Add PyConfig_Get() function python/cpython#112609

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Return value conventions #13

Return value conventions #13

encukou commented Oct 9, 2023 •

edited

Loading

gvanrossum commented Oct 9, 2023 •

edited

Loading

encukou commented Oct 10, 2023

gvanrossum commented Oct 10, 2023

erlend-aasland commented Oct 10, 2023

encukou commented Oct 10, 2023

vstinner commented Oct 13, 2023

vstinner commented Oct 13, 2023

vstinner commented Oct 13, 2023

zooba commented Nov 13, 2023 •

edited

Loading

vstinner commented Nov 13, 2023

zooba commented Nov 13, 2023

encukou commented Nov 14, 2023

vstinner commented Nov 14, 2023

encukou commented Nov 14, 2023

vstinner commented Nov 14, 2023 •

edited

Loading

serhiy-storchaka commented Nov 14, 2023

encukou commented Nov 15, 2023

vstinner commented Nov 15, 2023

encukou commented Nov 29, 2024

serhiy-storchaka commented Nov 29, 2024

zooba commented Dec 2, 2024

Return value conventions #13

Return value conventions #13

Comments

encukou commented Oct 9, 2023 • edited Loading

gvanrossum commented Oct 9, 2023 • edited Loading

encukou commented Oct 10, 2023

gvanrossum commented Oct 10, 2023

erlend-aasland commented Oct 10, 2023

encukou commented Oct 10, 2023

vstinner commented Oct 13, 2023

vstinner commented Oct 13, 2023

vstinner commented Oct 13, 2023

zooba commented Nov 13, 2023 • edited Loading

vstinner commented Nov 13, 2023

zooba commented Nov 13, 2023

encukou commented Nov 14, 2023

vstinner commented Nov 14, 2023

encukou commented Nov 14, 2023

vstinner commented Nov 14, 2023 • edited Loading

serhiy-storchaka commented Nov 14, 2023

encukou commented Nov 15, 2023

vstinner commented Nov 15, 2023

encukou commented Nov 29, 2024

serhiy-storchaka commented Nov 29, 2024

zooba commented Dec 2, 2024

encukou commented Oct 9, 2023 •

edited

Loading

gvanrossum commented Oct 9, 2023 •

edited

Loading

zooba commented Nov 13, 2023 •

edited

Loading

vstinner commented Nov 14, 2023 •

edited

Loading