Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return value conventions #13

Open
encukou opened this issue Oct 9, 2023 · 21 comments
Open

Return value conventions #13

encukou opened this issue Oct 9, 2023 · 21 comments
Labels
guideline To be included in guidelines PEP

Comments

@encukou
Copy link
Contributor

encukou commented Oct 9, 2023

New API functions can return:

  • An integral value, where -1 is returned if and only if an exception was raised, and non-negative values signal an absence of exception. Negative values other than -1 are never returned. (edit: they are, for hashes -- see below)
  • A pointer, where NULL is returned if and only if an exception was raised.
  • (Or PyStatus for pre-initialization functions.)

The return values need to be documented.
Some existing functions don't follow this convention.

If NULL or a negative number are valid outputs, the function needs to reserve the return value for signaling errors, and take an extra *result argument that it fills.

XXX: Is *result set on failure?

"Errors" that you expect to be common, where you want to avoid the overhead of creating an exception object (like the AttributeError from a getattr), aren't treated as failures. In these cases, return 0 for the expected "error", and 1 for complete success.
XXX: is *result set in this case?

@gvanrossum
Copy link
Contributor

gvanrossum commented Oct 9, 2023

@encukou Can you edit this a bit? It looks like some kind of copy/paste error.

Also, I would avoid the loaded term "success", at least for int-returning functions -- for cases like PyDict_GetItemRef(), users might not consider 0 to be a success, but it's not an error. Maybe we can say "absence of exception"?

We might also want to call out that there are families of existing APIs that return 0 for success and 1 for error (following shell conventions), and others that return 0 for error and 1 for success. So we will have to continue documenting the return value in each case. (Less so for functions that return NULL for some success cases -- those are rather exceptional and unless explicitly documented, a NULL always means an error occurred.)

@encukou
Copy link
Contributor Author

encukou commented Oct 10, 2023

Edited, thanks.

IMO, we should always document return values -- at least have a summary with a link. Not everyone reads the section that applies to all of the API.

@gvanrossum
Copy link
Contributor

XXX: Is *result set on failure?
[...]
XXX: is *result set in this case?

I vote for always setting the result to something, and document this. If the result is an object, in the error case the result should be NULL. If the result is something else, hopefully there's some useful "neutral" value (e.g. 0 or -1 for integers).

@erlend-aasland
Copy link
Contributor

I vote for always setting the result to something, and document this.

See also https://github.com/python/devguide/pull/1128/files

@encukou
Copy link
Contributor Author

encukou commented Oct 10, 2023

That's better API.
I don't think we can count on core devs always remembering to do it though – we'll probably need some kind of mandatory checklist for new API...

@encukou encukou added the guideline To be included in guidelines PEP label Oct 11, 2023
@vstinner
Copy link
Contributor

See also capi-workgroup/problems#1 "Ambiguous return values".

@vstinner
Copy link
Contributor

See also python/devguide#1129 "Improve C API guidelines for return values".

@vstinner
Copy link
Contributor

See also python/devguide#1125 "C API: Add guidelines for C APIs with output params".

@zooba
Copy link
Contributor

zooba commented Nov 13, 2023

Added a bit more related discussion on #40 (comment) but I think an additional key question is: how do we return both a value and some additional information about where that value came from?

An existing example would be PyUnicode_AsUTF8AndSize, which returns the value and passed back the length of the value in a pointer argument. It return NULL in case of error (what it sets the output argument to in an error case is a separate discussion).

Should we stick to this convention, e.g. char *PyUnicode_AsUTF8AndSize(PyObject *o, Py_ssize_t *len) where result == NULL indicates an exception

Or do we want to switch to only returning exception status and all values go through output parameters, e.g. int PyUnicode_AsUTF8AndSize(PyObject *o, char **str, Py_ssize_t *len)

Or combine the information about the value with the error status and use that as the result, e.g. Py_ssize_t PyUnicode_AsUTF8AndSize(PyObject *o, char **str) where result < 0 indicates an exception

(Note that I'm not proposing changing this particular API, just illustrating how it may have been done under different designs.)

@vstinner
Copy link
Contributor

Or do we want to switch to only returning exception status and all values go through output parameters

IMO the API should return directly the value, unless there is an ambiguous case and it's common to have to distinguish them.

Ambiguous cases:

  • PyDict_GetItem() returns NULL if not found, and returns also NULL if an exception was set: the return value is not enough to distinguish two important cases.
  • PyLong_AsLong() can return -1 on success, and return -1 if an exception is set.

There is also a desire to avoid the need to call PyErr_Occurred().

Another concern is designing API to not make them error-prone: encourage the author to handle all cases, especially to handle the error case (if an exception is set).

There is another category: "optional parameters". Sometimes, you might want to provide additional output arguments to the caller, but the caller may not use them. In PR python/cpython#112028 I propose int PyDict_Pop(PyObject *p, PyObject *key, PyObject **result) API where result is optional. Sometimes, you want to get the removed value. Sometimes, having to deal with the removed value requires to write additional code which you may prefer to not have to write. See examples of proposed API to see advantages of making result optional.

In the PyUnicode_AsUTF8AndSize() case, IMO size can be seen as optional (and it can be set to NULL): Python has code which calls str=PyUnicode_AsUTF8AndSize(obj, &size) and then ignore size on purpose.

In the proposed PyImport_ImportOrAddModule() API, maybe the information "was the module created?" can be seen as optional, especially because currently the API does not provide the information: it's a new feature.

So there are different flavors of APIs:

  • (A) Return result directly (no output arguments): the result can be used directly to check if an exception is set, or the function cannot fail, with no ambiguous case. Calling PyErr_Occurred() is not needed.
  • (B) Return int + output arguments (pointers): when distinguishing "not found" and "found" is common, ex: PyDict_GetItemRef().
  • (C) Return result + optional output arguments (pointeres): when distinguishing different cases ("is created?") is uncommon. Example: PyDict_Pop(), PyImport_ImportOrAddModule().

The case (A) covers a large part of the C API. Except that some functions require calling PyErr_Occurred() to write correct code (to properly handle errors).

@zooba
Copy link
Contributor

zooba commented Nov 13, 2023

Yeah, I largely agree with Victor's comments. Particularly:

  • Return int + output arguments (pointers): when distinguishing "not found" and "found" is common

I'd frame this as "when distinguishing a valid/expected result from an error". We don't have a useful way to return "not found" as a PyObject * that can be distinguished from NULL and also easily detected in all places it may get passed to, and so we should use a different style here.

maybe the information "was the module created?" can be seen as optional, especially because currently the API does not provide the information: it's a new feature.

I'd argue that it can be seen as optional because the post-condition of the function is that a module exists and was returned. This is met in both cases, whether the module existed or is new, and so whether it was just created is optional.

Contrast this with PyLong_AsLong returning -1 for an error, which is not meeting the post-condition, and isn't distinguishable from a valid case without extra information. We shouldn't allow "was this actually an error" to be optional here (and should probably make the return value be status and the actual number go into an output parameter).

@encukou
Copy link
Contributor Author

encukou commented Nov 14, 2023

I'd prefer using (A) and (C) only -- i.e. switch to output parameters (and returning int status) whenever the return value isn't enough for all information the function provides.
That would mean we don't need to decide what's “common”, and the API would be more regular.

There is another category: "optional parameters". Sometimes, you might want to provide additional output arguments to the caller, but the caller may not use them.

I'd argue that for consistency, we should simply allow all output parameters to be NULL (meaning they aren't filled).

@vstinner
Copy link
Contributor

I'd argue that for consistency, we should simply allow all output parameters to be NULL (meaning they aren't filled).

It's kind of surprising to call PyDict_GetItemRef(dict, key, NULL) and not get the requested item, no? Moreover, it makes the coder slower since it has to check if the result is NULL or not.

@encukou
Copy link
Contributor Author

encukou commented Nov 14, 2023

It's kind of surprising to call PyDict_GetItemRef(dict, key, NULL) and not get the requested item, no? Moreover, it makes the coder slower since it has to check if the result is NULL or not.

It's a shortcut for PyDict_Contains. Yes, I'm serious: I don't think we need separate Contains functions.
And I don't think it's that much more surprising than calling PyUnicode_AsUTF8AndSize(s, p, NULL) and not getting the AndSize part.

Moreover, it makes the coder slower since it has to check if the result is NULL or not.

I'd love to see the benchmark behind that claim. Especially with PGO.

@vstinner
Copy link
Contributor

vstinner commented Nov 14, 2023

I'd love to see the benchmark behind that claim. Especially with PGO.

Sure, it should be measured. @serhiy-storchaka was worried about that in my python/cpython#112028 PR.

@serhiy-storchaka
Copy link

Or do we want to switch to only returning exception status and all values go through output parameters, e.g. int PyUnicode_AsUTF8AndSize(PyObject *o, char **str, Py_ssize_t *len)

It is exactly the interface of PyBytes_AsStringAndSize() (and its predecessor PyString_AsStringAndSize()). I do not know why the interface of PyUnicode_AsUTF8AndSize() is different. But I can guess that they wanted to implement PyUnicode_AsUTF8() as a macro, and it is the only variant that allows this.

@encukou
Copy link
Contributor Author

encukou commented Nov 15, 2023

(From the proposal in the first post:)

Negative values other than -1 are never returned.

Hash functions return -1 on error, but all other values, including negative ones, are valid.
IMO, that's a good argument for only reserving -1 itself for failure.

@vstinner
Copy link
Contributor

Hash functions return -1 on error, but all other values, including negative ones, are valid.

Right, it's part of PyTypeObject.tp_hash API: https://docs.python.org/dev/c-api/typeobj.html#c.PyTypeObject.tp_hash

@encukou
Copy link
Contributor Author

encukou commented Nov 29, 2024

Added to the draft in #53, with a but of wording that might be ambiguous:

When it might be useful for users to call a function but ignore an output, allow passing NULL as the output argument.

That doesn't explicitly say if we want to allow functions like PyDict_GetItemRef(dict, key, NULL) vs. a separate PyDict_Contains function.

There's a current decision issue on the topic: capi-workgroup/decisions#48

@serhiy-storchaka
Copy link

Supporting NULL as an output argument can be too costly for fast functions, it can hit performance of the common case. Specialized function for checking for existence also can be more efficient. It may be impractical to support this as a strong general rule.

@zooba
Copy link
Contributor

zooba commented Dec 2, 2024

On the other hand, specialized functions for cases where "I don't care about this value" is perfectly clear, leads to API bloat and makes it harder to figure out how to write the correct code (time-of-check-time-of-use-etc.). We also end up maintaining more code on the implementation side - a null check before writing an output is one (short) line of code, whereas an entirely separate exported function is going to touch 2-3 files minimum.

I'm also skeptical that "indirected write" vs "null check and indirected write" is going to come out clearly in favour of either path. I'm sure either case can be proven with microbenchmarks, but I really can't see it mattering that much in the overall scheme of things. Especially since we should be doing null checks on output argument pointers anyway.

Macros and inline functions will optimise away literal null arguments just fine. I'd prefer to continue relying on macros and inline functions for fast functions, and optimise for code size and maintainability for exported ones.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
guideline To be included in guidelines PEP
Projects
None yet
Development

No branches or pull requests

6 participants