-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add PyBytes_Writer() API #39
Comments
So, my 2 cent after Victor explained to me how it's being used: it's good and helpful.. I have one question though. Is there a preference of having PyBytesWriter_Create(Py_ssize_t size, char **str); instead of PyBytesWriter_Create(char **str, Py_ssize_t size); ? Since I'm on a phone, it's hard to check the files but shouldn't we mimic the usual order of having the "output" variable on the left? (for instance the Prepare method also puts the str before its length). |
Ah? For me, the trend in the Python C API is more to put output parameters at the end ("right"). |
Ah? Er... actually it's probably the case for the C API now that I think about with all the Getters functions. I think I used the mem/alloc C functions as the baseline and thought as the Create as something close to the realloc() function. I don't really have a strong opinion on the matter though so let's keep your signature. |
We probably should have a public discussion on Discourse before getting to the WG decision. AFAIK, there are two main use cases for a factory like this:
For the first use case, IMO an efficient implementation should mean that For the second, why not use
It seems to me that The private API has an optimization that we lose here: it puts a small initial buffer on the stack. It seems that performance-minded users will want to use their own implementation anyway. In CPython itself, there are quite few uses of |
It's slower since you have to copy bytearray memory to create a bytes object, and destroy the bytearray, to create the final bytes object. PyBytesWriter works in the nanoseconds area.
Correct, the code is not thread safe, and expected to have a single user.
PyBytesWriter contains a temporary bytes object. The purpose of the API is to only expose the bytes object when it's fully initialized.
The small buffer is also used in the proposed public API. It's just that it's allocated on the heap memory.
The problem of using a buffer and then calling PyBytes_FromStringAndSize() is that you have to copy the memory, and then destroy the temporary buffer. Again, PyBytesWriter uses a bytes object internally, so calling PyBytesWriter_Finish() is cheap. |
A more concrete micro-benchmark shows that the public PyBytesWriter (23.4 ns) is 4x faster than bytearray (93.0 ns): python/cpython#121726 (comment). |
Thank you, I see now! Your explanations help with reading the implementation. A hard part of making this public will be documenting what |
When you call When you call str is valid until PyBytesWriter_Finish() or PyBytesWriter_Discard() is called. Would you prefer that these functions set str to |
That, and the
Probably not necessary, they don't clear |
Correct. |
Except you can call I'd expect that each |
I find the |
Oh. Can you propose a better API in this case?
|
For naming,
It will be the less confusing :) |
For some brainstorming -- I would expect that
That said, this is still a weird API; The proposed API allows user code to keep track of the “current write position” using Why not let the user handle the “current write position” on their own, as an offset from “start of buffer” that's handled by the writer, and have for example: // Create a bytes writer instance.
// Set `*str` to a writable buffer of at least the given `size`.
// `*str` is valid until the next `PyBytesWriter_*` call on the result.
PyBytesWriter* PyBytesWriter_Create(Py_ssize_t size, char **str);
// Reallocate to `size` bytes.
// Set `*str` to a writable buffer of at least the given `size`;
// `min(<old_size>, size)` bytes are copied to this buffer.
// `*str` is valid until the next `PyBytesWriter_*` call on this `writer`.
// (Detail: this will never shrink the internal buffer, but if it needs to grow,
// it will not overallocate. The user can choose an overallocation strategy
// themself.)
int PyBytesWriter_Realloc(PyBytesWriter *writer, Py_ssize_t size, char **str);
// Return the final Python bytes object with the given size,
// and destroy the writer instance.
// `size` must not be greater than in the previous `_Create`/`_Realloc` call.
PyObject* PyBytesWriter_Finish(PyBytesWriter *writer, Py_ssize_t size);
// Discard the internal bytes buffer and destroy the writer instance.
void PyBytesWriter_Discard(PyBytesWriter *writer); That would make the convenience helper slightly more complicated, though, // Write `data_size` bytes from `data` at the given `*offset`.
// Set `*str` to a writable buffer of at least `*offset + data_size` bytes;
// `*offset` bytes are copied to this buffer.
// Increment `*offset` by `data_size`.
// Equivalent to:
// - PyBytesWriter_Realloc(writer, *offset + data_size, str)
// - memcpy((*str) + offset, data, data_size)
// - *offset += data_size
// (Detail: unlike PyBytesWriter_Realloc, this *will* overallocate, so
// creating bytes using `PyBytesWriter_Write` would be amortized `O(N)`.)
void PyBytesWriter_Write(PyBytesWriter *writer, char **str, Py_ssize_t *offset, Py_ssize_t data_size, char *data); But, looking at the big picture, I see only two use cases:
|
Petr:
I agree. I worry that the complexity of this API will lead to misuse. Victor [regarding my complaints about the "prepare" API]:
I don't know yet. I can try to think up something. |
It seems like this API is too low-level and too error-prone. I prefer to abandon promoting this API as a public API for now. We can revisit this API later if needed. |
This API is basically an efficient memory manager for functions creating Python bytes objects.
It avoids the creation of incomplete/inconsistent Python bytes objects: allocate an object with uninitialized bytes, fill the object, resize it. See issues:
API:
The str parameter is what's being used to write bytes in the writer.
Example creating the string "abc":
The "hot code" which writes bytes can only use str.
The
PyBytesWriter_WriteBytes()
is just a small convenient helper function (forPyBytesWriter_Prepare()
+memcpy()
calls).You can find more documentation and examples in the Pull Request.
This proposed public API is based on existing
_PyBytesWriter
API which is quite old. I wrote an article about it in 2016: https://vstinner.github.io/pybyteswriter.htmlThe text was updated successfully, but these errors were encountered: