Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce replacement API for PyIter_Next with a non-ambiguous return value #21

Closed
erlend-aasland opened this issue Apr 10, 2024 · 28 comments

Comments

@erlend-aasland
Copy link

Previous discussion:

I would like to pursue this issue. IMO, PyIter_Next having an ambiguous return value is a good enough motivation for adding a replacement API. If you disagree, feel free to just close this issue. AFAICS, three APIs have been discussed; I'm in favour of Mark's proposal (item 3):

1: PyObject *PyIter_NextItem(PyObject* iter, int *err)

Return values: a NULL pointer (loop termination or error) or a PyObject pointer (item returned)
Mentioned in python/cpython#105201 (comment)

Example usage
PyObject *item;
int err;
while (item = PyIter_NextItem(iterator, &err)) {
    /* do something with item */
    ...
    /* release reference when done */
    Py_DECREF(item);
}
Py_DECREF(iterator);
if (err < 0) {
    /* error */
}
else {
    /* no error */
}

2: int PyIter_NextItem(PyObject* iter, PyObject **item)

Return values: -1 (error) or 0 (item returned or loop termination)
Mentioned in python/cpython#105201 (comment)

Example usage
PyObject *item;
int err;
while ((err = PyIter_NextItem(iterator, &item)) == 0 && *item != NULL) {
  /* do something with item */
  ...
  /* release reference when done */
  Py_DECREF(item);
}
Py_DECREF(iterator);
if (err < 0) {
  /* error */
}
else {
  /* no error */
}

3: int PyIter_NextItem(PyObject *iter, PyObject **next)

Return values: -1 (error), 0 (loop terminated), or 1 (item returned)
Mentioned in python/cpython#105201 (comment)

Example usage
PyObject *next;
int rc;
while ((rc = PyIter_NextItem(iterator, &next)) > 0) {
    use(next);
    Py_DECREF(next);
}
if (rc < 0) {
    /* cleanup */
    return -1;
}
/* done */
@gvanrossum
Copy link

In this case, isn't it a complicating factor that the underlying tp_ slot, tp_next, has the same ambiguous API? And that's not so easy to change. So this would be mostly cosmetic, right?

@erlend-aasland
Copy link
Author

In this case, isn't it a complicating factor that the underlying tp_ slot, tp_next, has the same ambiguous API? And that's not so easy to change. So this would be mostly cosmetic, right?

I'll quote @markshannon in capi-workgroup/problems#1 (comment):

  • Why is it bad? It is global state, with all the problems that entails
  • Is it slow? Yes. It forces us to save and restore it across calls, and to perform expensive checks after every call into C extension code.
  • Does it prevent optimizations? Yes, for both of the above reasons.

@erlend-aasland
Copy link
Author

The current API (PyIter_Next) requires you to call PyErr_Occurred ("global state") in order to find out if an error happened.

@gvanrossum
Copy link

The current API (PyIter_Next) requires you to call PyErr_Occurred ("global state") in order to find out if an error happened.

I get that. But changing PyIter_Next to have a less ambiguous signature just moves the problem -- it still has to call tp_next which itself has a similar ambiguity. So I don't think it addresses any of the three bullets from Mark that you quoted.

Well, it saves one call to PyErr_Occurred(), so it should be a little bit faster. But it's still a call to PyErr_Occurred(), which still has all the disadvantages Mark mentions.

@erlend-aasland
Copy link
Author

I get that. But changing PyIter_Next to have a less ambiguous signature just moves the problem -- it still has to call tp_next which itself has a similar ambiguity. So I don't think it addresses any of the three bullets from Mark that you quoted.

Ah, yes that is true. I guess it boils down to a cosmetic issue, for now. I'll let the WG decide if a cleaner API is worth it.

@gvanrossum
Copy link

My personal view is that we have bigger fish to fry.

@encukou
Copy link
Collaborator

encukou commented Apr 11, 2024

+1 for the third one, it matches where we've been going so far and I don't see a reason to change course. But there are some details that make this fish not worth frying right now, like:

  • on -1/0 return, should *next be set to NULL or left as it was?
  • can next itself be NULL (to advance the iterator without getting the value)?

@erlend-aasland
Copy link
Author

  • on -1/0 return, should *next be set to NULL or left as it was?

FWIW: yes, that would make for a saner public API.

  • can next itself be NULL (to advance the iterator without getting the value)?

What's the use-case? I could not find such a use-case in the code base.

@encukou
Copy link
Collaborator

encukou commented Apr 23, 2024

What's the use-case?

Consistency in whether NULL can be passed to an output parameter.

(Deciding whether we want that -- and how to treat output parameters in general -- is one of the proverbial bigger fish to fry.)

@gvanrossum
Copy link

I feel it's better not to allow setting an output parameter's address to NULL. It's a rare case, and it slows down the API function (it has to check whether the pointer is NULL) and may bulk it up (with a DECREF, assuming the output value is produced anyways).

Also, I feel that upon error the output parameter should always be set to NULL, to avoid potentially uninitialized memory.

@vstinner
Copy link

I like this API: int PyIter_NextItem(PyObject *iter, PyObject **item) (I renamed next to item):

  • Set an exception, set *item to NULL, and return -1 on error.
  • Set *item to NULL, and return 0 on StopIteration.
  • Set *item to a strong reference to the iterator item, and return 1 on success.

iter and item cannot be NULL.

Also, I feel that upon error the output parameter should always be set to NULL, to avoid potentially uninitialized memory.

I agree 100% :-) We already did that in other recently added functions such as PyDict_GetItemRef().

@vstinner
Copy link

IMO it's worth it to propose a better API for PyIter_Next() which has an ambiguous return value (require to check for PyErr_Occurred()).

@erlend-aasland @gvanrossum @zooba: What do you think of the proposed API #21 (comment) ? I understand that @encukou likes it.

Changing tp_next slot API is not doable. I prefer to leave it as it is.

@zooba
Copy link

zooba commented May 31, 2024

I like it. It makes this pattern work nicely (due to both errors and valid results coercing to true):

PyObject *iter = PyObject_GetIter(o);
if (!iter) goto error;

PyObject *item = NULL;
while (PyIter_NextItem(iter, &item)) {
    if (!item) goto error;
    // work on item
    Py_DECREF(item);
}
Py_DECREF(iter);

@erlend-aasland
Copy link
Author

What do you think of the proposed API #21 (comment) ?

I like it.

Changing tp_next slot API is not doable. I prefer to leave it as it is.

Yeah; it is unfortunate, but IMO it is not a blocker for providing a better PyIter_Next API.

@vstinner
Copy link

vstinner commented Jun 10, 2024

Ok, let's do a formal vote on the API #21 (comment):

@vstinner
Copy link

See also: capi-workgroup/problems#29

@vstinner
Copy link

@serhiy-storchaka and @mdboom: You can now vote on this API. Just go ahead if you have any question.

@vstinner
Copy link

vstinner commented Jul 3, 2024

@serhiy-storchaka @mdboom: Are you able to check checkboxes to vote?

@vstinner
Copy link

@erlend-aasland: Congrats, your API was approved :-) You can go ahead and implement it! I close this issue.

@erlend-aasland
Copy link
Author

Yay! It's not my API, though. I believe it was proposed by Irit or maybe Mark. I only insisted on the return value :) I'll dig up Irit's implementation and create a PR.

@erlend-aasland
Copy link
Author

There is one issue we did not discuss: the current old API, PyIter_Next, requires the caller to check if iter is an iterator. I think we want the API to check this, set a TypeError and return -1 if iter is not an iterator.

@encukou
Copy link
Collaborator

encukou commented Jul 24, 2024

Definitely.

@picnixz
Copy link

picnixz commented Jul 25, 2024

Would it be possible to introduce a private function which bypasses the check as well (_PyIter_NextItem) (just for CPython internal code just to reduce the number of isinstance() calls). I found those quite convenient when you care about performances in general but I'm not sure if it's the trend to do it in general.

@encukou
Copy link
Collaborator

encukou commented Jul 25, 2024

It's not a full isinstance call, but a NULL check on tp->tp_iternext -- a function pointer we're just about to call. I doubt it would have a measurable impact.

(PyIter_Check also checks for _PyObject_NextNotImplemented, but PyIter_Next doesn't have to -- calling it will raise the TypeError.)

The super-optimized way to do this is to call tp->tp_iternext directly.

@picnixz
Copy link

picnixz commented Jul 25, 2024

It's not a full isinstance call, but a NULL check on tp->tp_iternext -- a function pointer we're just about to call. I doubt it would have a measurable impact.

Oh, then yes, it's probably not worth it (I thought it was something more intricate).

@serhiy-storchaka
Copy link

itertools is an example of such super-optimized code. It intentionally uses tp_iternext instead of PyIter_Next to avoid the overhead of the later.

@erlend-aasland
Copy link
Author

itertools is an example of such super-optimized code. It intentionally uses tp_iternext instead of PyIter_Next to avoid the overhead of the later.

It's good that we don't see this in the rest of the standard library; if all extension modules were written like that, we'd have a maintenance nightmare. IMO, this is a good reason to introduce an internal _PyIter_NextItem without the type checking and prune some verbose boiler-plate code in itertools.

@erlend-aasland
Copy link
Author

I created a PR:

I plan on merging it early next week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants