
PEP 757: Add import-export API for Python int objects #35

Closed
vstinner opened this issue Jul 14, 2024 · 51 comments

@vstinner

vstinner commented Jul 14, 2024

See PEP 757 – C API to import-export Python integers.

@skirpichev

To play with:

Later I'll show some simple benchmarks, but first please note that the current gmpy2 code special-cases "small" integers to make performance acceptable. I think it's a common pattern and should be supported here as well (the current code in the CPython main branch has a special case for 1-digit ints).

That's why I would prefer a simpler and less generic API, like the one proposed in #31. E.g. for reading:

if (PyLong_IsCompact(obj)) {  /* should be a quick check */
    mpz_set_si(z, PyLong_CompactValue(obj));  /* compact value fits in some integer type */
}
else {
    /* fall back to the "array of digits" view for abs(obj) */
    const Py_digit *digits = PyLong_GetDigits(obj);  /* can't fail for non-compact ints */
    mpz_import(z, ..., digits);
    if (PyLong_IsNegative(obj)) {  /* looks better than PyLong_GetSign() ;) */
        mpz_neg(z, z);
    }
}

(In this example, for simplicity, I assume that compact value fits in long.)
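The fallback path above reconstructs the value from an array of fixed-size digits. The same round trip can be sketched in plain Python (an illustration only; `BITS = 30` assumes CPython's default digit size on 64-bit builds):

```python
BITS = 30  # CPython's default bits_per_digit on 64-bit builds

def to_digits(n):
    """Split abs(n) into 30-bit digits, least significant digit first."""
    n = abs(n)
    digits = []
    while True:
        digits.append(n & ((1 << BITS) - 1))
        n >>= BITS
        if not n:
            return digits

def from_digits(digits, negative=False):
    """Inverse operation, mirroring what mpz_import + mpz_neg do above."""
    value = sum(d << (BITS * i) for i, d in enumerate(digits))
    return -value if negative else value
```

This is exactly the "absolute value plus sign flag" view the C snippet relies on: the digit array carries no sign, so the caller applies it afterwards.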

@skirpichev

Benchmarks (one, two, ~ten, ~hundred digits) for gmpy2.

For export

On the master:

$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<7' 'mpz(x)'
Mean +- std dev: 233 ns +- 3 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<38' 'mpz(x)'
Mean +- std dev: 295 ns +- 2 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<300' 'mpz(x)'
Mean +- std dev: 491 ns +- 3 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<3000' 'mpz(x)'
Mean +- std dev: 2.33 us +- 0.01 us

With new API:

$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<7' 'mpz(x)'
Mean +- std dev: 263 ns +- 37 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<38' 'mpz(x)'
Mean +- std dev: 315 ns +- 1 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<300' 'mpz(x)'
Mean +- std dev: 508 ns +- 3 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<3000' 'mpz(x)'
Mean +- std dev: 2.34 us +- 0.01 us

Without special case for 1-digit:

$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<7' 'mpz(x)'
Mean +- std dev: 291 ns +- 2 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<38' 'mpz(x)'
Mean +- std dev: 316 ns +- 1 ns

Using PyUnstable_Long_IsCompact (as in the code proposed above):

$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<7' 'mpz(x)'
Mean +- std dev: 215 ns +- 1 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<38' 'mpz(x)'
Mean +- std dev: 317 ns +- 2 ns

The new API's overhead seems to be noticeable for the 1-2 digit case (up to 10-15%).

For import

On the master:

$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=mpz(1<<7)' 'int(x)'
Mean +- std dev: 111 ns +- 1 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=mpz(1<<38)' 'int(x)'
Mean +- std dev: 155 ns +- 1 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=mpz(1<<300)' 'int(x)'
Mean +- std dev: 473 ns +- 6 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=mpz(1<<3000)' 'int(x)'
Mean +- std dev: 2.20 us +- 0.00 us

With new API:

$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=mpz(1<<7)' 'int(x)'
Mean +- std dev: 111 ns +- 1 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=mpz(1<<38)' 'int(x)'
Mean +- std dev: 155 ns +- 2 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=mpz(1<<300)' 'int(x)'
Mean +- std dev: 512 ns +- 5 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=mpz(1<<3000)' 'int(x)'
Mean +- std dev: 2.25 us +- 0.02 us

@zooba

zooba commented Jul 15, 2024

If PyLong_GetDigits returned the layout information with its call, then it could return ndigits=sizeof(ssize_t); bits_per_digit=8; digits=&compact_value for a compact value, and you shouldn't need the additional checks. The layout could be a pointer to a static struct easily enough, so it's not costly to return, though I prefer the stability of returning a pointer that requires a "free" call of some sort.

Otherwise, if we were to document this function as pairing with PyUnstable_Long_IsCompact then it probably should also be unstable. But I think it's better to leave that optimisation unspecified, if it's not needed for correctness.

PyAPI_DATA(const PyLongLayout) PyLong_LAYOUT;

Please, no more exported data. Make it either a function that returns a pointer to the layout, a function that copies the layout values into a struct, or integrate it with GetDigits as I suggested so that we can support multiple layouts.

// Array endian:
// * 1 for most significant byte first (big endian)
// * -1 for least significant first (little endian)
int8_t array_endian;

Does this mean "most significant digit first"? Or "most significant byte of each digit first"? Or both?

As we'd now use Py_digit in the ABI, is there a need to also provide its size via a function? Or can we rely on callers understanding how their C ABI works? (Or can we drop Py_digit from the ABI and just work with void * instead? In which case we must provide the digit size some other way.)

@skirpichev

Does this mean "most significant digit first"? Or "most significant byte of each digit first"? Or both?

No, it's about endianness within a digit. We have word_endian to represent the order of digits.

Probably the naming could be better, e.g. endian instead of array_endian and/or digits_order instead of word_endian. It seems CPython consistently uses the term "digit" instead of "word" (as libmpdec) or "limb" (as GMP).
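The distinction between the two flags can be made concrete in Python: one flag controls the order of digits in the array, the other the byte order within each digit. A sketch, using the renamed fields digits_order and endian suggested above (function names are made up for illustration):

```python
def split_into_digits(n, digit_bytes=4, digits_order="lsd_first"):
    """Split abs(n) into fixed-size digits; digits_order picks the array order."""
    n = abs(n)
    digit_bits = 8 * digit_bytes
    digits = []
    while True:
        digits.append(n & ((1 << digit_bits) - 1))
        n >>= digit_bits
        if not n:
            break
    if digits_order == "msd_first":
        digits.reverse()
    return digits

def serialize(digits, digit_bytes=4, endian="little"):
    """endian applies *inside* each digit; the digit order is already fixed."""
    return b"".join(d.to_bytes(digit_bytes, endian) for d in digits)
```

With lsd-first digits and little-endian bytes inside each digit, the serialized array equals the value's plain little-endian byte string; msd-first plus big-endian digits gives the big-endian byte string.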

Or can we drop Py_digit from the ABI and just work with void * instead?

That seems to be a good idea. PyLongLayout has digit_size field.

@encukou
Collaborator

encukou commented Jul 19, 2024

I agree with @zooba, and I still stand by my earlier comment. I don't want to lock CPython -- and any future C API implementors -- into using a single layout for all ints.

@vstinner
Author

Do you mean adding a layout member to PyLong_DigitArray (export) and a layout parameter to PyLongWriter_Create() (import)?

typedef struct PyLong_DigitArray {
    ...
    const PyLongLayout *layout;  // <== NEW
} PyLong_DigitArray;

PyAPI_FUNC(int) PyLong_AsDigitArray(
    PyObject *obj,
    PyLong_DigitArray *array);

PyAPI_FUNC(PyLongWriter*) PyLongWriter_Create(
    ...
    const PyLongLayout *layout);  // <== NEW

@skirpichev

I don't want to lock CPython -- and any future C API implementors -- into using a single layout for all ints.

JFR, I don't think this locks CPython in any way: this single layout is an asymptotic one (for "big" ints). The API has enough freedom to emulate this view for small integers.

@zooba's suggestion might work with an API like this (reading case):

typedef struct PyLongLayout {
    uint8_t bits_per_digit;
    uint8_t digit_sizeof;
    int8_t digits_order;
    int8_t digit_endianness;
    Py_ssize_t ndigits;
} PyLongLayout;

/* Returns a pointer to the digit array (or NULL if obj is not a PyLongObject)
   and sets *layout, which might differ wrt bits_per_digit and digit_sizeof for
   "small" and "big" ints. */
PyAPI_FUNC(const void *) PyLong_AsDigits(PyObject *obj, PyLongLayout *layout);

@vstinner
Author

The API has enough freedom to emulate this view for small integers.

Currently, exporting a small integer as an array of digits is an O(1) operation. There is no need to allocate memory or anything: all integers are implemented internally as an array of digits.

@encukou
Collaborator

encukou commented Jul 25, 2024

OK. Let's design the API for what we have. If we need additional layouts, that kind of API can be added in the future, and if/when we add it, the API added here will start copying data. And if a non-CPython implementation uses a more elaborate scheme, it'll need to copy data too.
I could live with that. Let's design it that way, then.

PyLongLayout should be named PyLong_DigitArrayLayout, since the actual internal layout might change in the future.

Same with the writer: PyLong_DigitArrayWriter_Create & PyLong_DigitArrayWriter_Finish.

We also need a PyLong_DigitArrayWriter_Discard. Finish is trivial now, but if it can do a memory copy in the future, users should call Discard in error paths.

We don't need endianness in the struct. I think something like a new medium-int representation (which would need a new API) is much more likely than switching to non-native endianness.
(Endian flags make sense if you want to serialize data, and can ask for a particular format, but the in-memory format can just always be native.)


As for avoiding Py_DATA, I filed capi-workgroup/problems#80.
I'm OK with a PyLong_GetDigitArrayLayout function rather than a PyLong_LAYOUT constant (or maybe in addition to a PyLong_DIGIT_ARRAY_LAYOUT constant?) but it's not clear to me why it's needed.


@skirpichev said:

I would prefer more simple and less generic API, like proposed in #31. E.g. for reading:

 if (PyLong_IsCompact(obj)) {  /* should be a quick check */
     mpz_set_si(z, PyLong_CompactValue(obj));  /* compact value fits in some integer type */
 }
 else {
     /* fall back to the "array of digits" view for abs(obj) */
     const Py_digit *digits = PyLong_GetDigits(obj);  /* can't fail for non-compact ints */
     mpz_import(z, ..., digits);
     if (PyLong_IsNegative(obj)) {  /* looks better than PyLong_GetSign() ;) */
         mpz_neg(z, z);
     }
 }

Would it work if you replaced PyLong_IsCompact and PyLong_CompactValue with PyLong_AsNativeBytes and a digit_size buffer? That would do extra work for long ints, but the amount of extra work might be small enough (and offset by only calling one function).
If it's not fast enough, maybe we should add a Py_ASNATIVEBYTES_COMPACT_ONLY flag, which would make PyLong_AsNativeBytes leave the buffer untouched and return 0 when you should use PyLong_AsDigitArray.


Looking at the above example: most callers will need the sign bit, so PyLong_AsDigitArray should return 1 for negative numbers.
(Or it could get a int *sign output argument like with PyLong_GetSign, but distinguishing 0 from positives doesn't sound too useful here.)
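The proposed return-value convention (1 for negative, so sign and digits arrive in one call) can be sketched with a hypothetical Python analogue of the export function (`as_digit_array` is a made-up name for illustration):

```python
def as_digit_array(n, bits=30):
    """Hypothetical analogue of PyLong_AsDigitArray returning (sign, digits).

    Returns 1 as the "sign" result for negative numbers and 0 otherwise,
    mirroring the proposed C return-value convention; the digit array itself
    holds the absolute value.
    """
    sign = 1 if n < 0 else 0
    n = abs(n)
    digits = []
    while True:
        digits.append(n & ((1 << bits) - 1))
        n >>= bits
        if not n:
            return sign, digits
```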

@skirpichev

We don't need endianness in the struct.

Did you mean word_endian too?

Would it work if you replace PyLong_IsCompact and PyLong_CompactValue with PyLong_AsNativeBytes and a digit_size buffer?

$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<7' 'mpz(x)'
Mean +- std dev: 263 ns +- 2 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<38' 'mpz(x)'  # huh
Mean +- std dev: 350 ns +- 2 ns
patch wrt gmpy2 test branch
diff --git a/src/gmpy2_convert_gmp.c b/src/gmpy2_convert_gmp.c
index 34f2546..e732080 100644
--- a/src/gmpy2_convert_gmp.c
+++ b/src/gmpy2_convert_gmp.c
@@ -44,24 +44,25 @@
 static void
 mpz_set_PyLong(mpz_t z, PyObject *obj)
 {
-    static PyLong_DigitArray long_export;
-
-    PyLong_AsDigitArray(obj, &long_export);
-    if (long_export.ndigits == 1) {
-        mpz_set_si(z, long_export.digits[0]);
+    long value;
+    Py_ssize_t bytes = PyLong_AsNativeBytes(obj, &value, sizeof(long), -1);
+    if (0 <= bytes && bytes <= (Py_ssize_t)sizeof(long)) {
+        mpz_set_si(z, value);
     }
     else {
+        static PyLong_DigitArray long_export;
+        PyLong_AsDigitArray(obj, &long_export);
         mpz_import(z, long_export.ndigits,
                    PyLong_LAYOUT.array_endian,
                    PyLong_LAYOUT.digit_size,
                    PyLong_LAYOUT.word_endian,
                    PyLong_LAYOUT.digit_size*8 - PyLong_LAYOUT.bits_per_digit,
                    long_export.digits);
+        if (long_export.negative) {
+            mpz_neg(z, z);
+        }
+        PyLong_FreeDigitArray(&long_export);
     }
-    if (long_export.negative) {
-        mpz_neg(z, z);
-    }
-    PyLong_FreeDigitArray(&long_export);
 }
 
 static MPZ_Object *

Looking at the above example: most callers will need the sign bit

Not necessarily. This view rather represents the absolute value.

@encukou
Collaborator

encukou commented Jul 25, 2024

Did you mean word_endian too?

Yes. I see no reason to single that out as a detail that might change.

Mean +- std dev: 263 ns +- 2 ns
Mean +- std dev: 350 ns +- 2 ns

Hm, I guess that's not good enough?
Does the Py_ASNATIVEBYTES_COMPACT_ONLY idea sound reasonable?

most callers will need the sign bit

I meant that instead of

        int result = PyLong_AsDigitArray(obj, &long_export);
        if (result < 0) goto error;
        mpz_import(...);
        if (long_export.negative) {
            mpz_neg(z, z);
        }
        PyLong_FreeDigitArray(&long_export);

you'd do

        int sign = PyLong_AsDigitArray(obj, &long_export);
        if (sign < 0) goto error;
        mpz_import(...);
        if (sign) {
            mpz_neg(z, z);
        }
        PyLong_FreeDigitArray(&long_export);

@skirpichev

Hm, I guess that's not good enough?

Yes, especially in the second case. You can see the other benchmarks above.

Does the Py_ASNATIVEBYTES_COMPACT_ONLY idea sound reasonable?

Yes, if PyLong_AsNativeBytes can do a quick exit - that should be an alternative to PyLong_IsCompact/CompactValue.
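The quick exit under discussion amounts to checking whether the value fits a machine word before taking the digit-array path. In Python terms the check is a bit-length comparison (a sketch only; Py_ASNATIVEBYTES_COMPACT_ONLY was just a proposal at this point, and the helper name below is made up):

```python
def fits_in_signed_word(n, word_bytes=8):
    """True if n fits the two's-complement range of a word_bytes-sized word."""
    bound = 1 << (8 * word_bytes - 1)
    return -bound <= n < bound
```

A reader like gmpy2 would take the mpz_set_si-style fast path when this returns True and fall back to the digit-array export otherwise.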

int sign = PyLong_AsDigitArray(obj, &long_export);

This looks like a minor convenience to me. I think that to (optionally) query the sign, people should use dedicated functions (like PyLong_IsNegative()).

@encukou
Collaborator

encukou commented Jul 25, 2024

Oh, I forgot we already have PyUnstable_Long_IsCompact exposed. That's the check a Py_ASNATIVEBYTES_COMPACT_ONLY would do in the current implementation.

If you're chasing nanoseconds, I don't think you want to compile for stable ABI, and that you will want to change your code if the internals change. So, PyUnstable looks OK.

(That said, consider having the optimizations in #ifndef Py_LIMITED_API, and releasing stable-ABI wheels alongside the version-specific fast ones. Testers of CPython alpha releases would thank you, and hopefully, eventually, so would PyPy users.)

@vstinner
Author

vstinner commented Aug 6, 2024

@skirpichev:

Probably the naming could be better, e.g. endian instead of array_endian and/or digits_order instead of word_endian. It seems CPython consistently uses the term "digit" instead of "word" (as libmpdec) or "limb" (as GMP).

I renamed word_endian to digits_order and array_endian to endian.

@encukou:

I agree with @zooba, and I still stand by my #31 (comment). I don't want to lock CPython -- and any future C API implementors -- into using a single layout for all ints.

Ok, I modified the API to add layout parameters to import and export APIs.

  • Replace PyLong_LAYOUT constant with PyLong_GetNativeLayout() function.
  • Add layout parameter to PyLongWriter_Create().
  • Add layout member to PyLong_DigitArray structure.

@vstinner
Author

vstinner commented Aug 6, 2024

@skirpichev:

This looks like a minor convenience to me. I think that to (optionally) query the sign, people should use dedicated functions (like PyLong_IsNegative()).

The function PyLong_IsNegative() doesn't exist. Or maybe you're thinking of a different function?

@skirpichev

The function PyLong_IsNegative() doesn't exist.

Yes, but see #29

@zooba

zooba commented Aug 6, 2024

Mean +- std dev: 263 ns +- 2 ns
Mean +- std dev: 350 ns +- 2 ns

Hm, I guess that's not good enough?

If I'm reading the earlier benchmarks correctly, then we'd have to beat 111ns and 155ns for these, which is clearly only going to be possible by leaking access to Long internals. So it's fine for when using unstable APIs, and the slower path is probably fine for stable callers (it's still faster than failing ;) ). The difference is largely unavoidable function call overhead, a type check, and the (potentially unaligned) memcpy. I daresay we could add a faster path through AsNativeBytes for an aligned destination of sizeof(Py_ssize_t), but it's still unlikely to approach the speed of the macros.

But that's okay, blazing speed is what the unstable API is for, and those who need it can at least opt into using it at their own cost.

I do think that anything returning somewhat raw digits from a LongObject should also return the sign. One call to get all the info to accurately recreate the value, and you only need additional checks if you're creating your own fast path.

@vstinner
Author

vstinner commented Sep 3, 2024

(Or can we drop Py_digit from the ABI and just work with void * instead? In which case we must provide the digit size some other way.)

Ok, done.

We also need a PyLong_DigitArrayWriter_Discard. Finish is trivial now, but if it can do a memory copy in the future, users should call Discard in error paths.

I would prefer to avoid adding such a function for now; we can add it later if needed. What do you think?

@vstinner
Author

vstinner commented Sep 3, 2024

@skirpichev @zooba @encukou: I updated the API. What do you think? Are we good yet?

@zooba

zooba commented Sep 3, 2024

Is there any impact going to come out of the discussions of setting a hard (if theoretical) limit on the length?

@vstinner
Author

vstinner commented Sep 3, 2024

Is there any impact going to come out of the discussions of setting a hard (if theoretical) limit on the length?

@serhiy-storchaka may know about that :-) This API is limited by Py_ssize_t ndigits. If the number of digits fits into a signed size_t, we are good.

The PR python/cpython#123498 defines # define MAX_LONG_DIGITS ((UINT64_MAX-1) / (uint64_t)PyLong_SHIFT), which is unsigned. If we want to be compatible with the gh-123498 maximum, the API should be modified to use an unsigned number of digits. But well, INT64_MAX/PyLong_SHIFT (signed) digits is already a very big number :-)

@encukou
Collaborator

encukou commented Sep 4, 2024

I would prefer to avoid adding such function for now, we can add it later if needed. What do you think?

So, to destroy the writer (e.g. in an error handling path), the user should call PyLongWriter_Finish and XDECREF the result?

That works, but I'd much prefer writers to be consistent and have both Finish and Discard.

@skirpichev

If I'm reading the earlier benchmarks correctly, then we'd have to beat 111ns and 155ns for these

Sorry for the late answer, @zooba, but no. Those timings are for the writing API (mpz->int). The benchmarks above are for reading (int->mpz). I'll repeat them in one place:

For export

On the master:

$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<7' 'mpz(x)'
Mean +- std dev: 233 ns +- 3 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<38' 'mpz(x)'
Mean +- std dev: 295 ns +- 2 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<300' 'mpz(x)'
Mean +- std dev: 491 ns +- 3 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<3000' 'mpz(x)'
Mean +- std dev: 2.33 us +- 0.01 us

With new API (see aleaxit/gmpy#495):

$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<7' 'mpz(x)'
Mean +- std dev: 263 ns +- 37 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<38' 'mpz(x)'
Mean +- std dev: 315 ns +- 1 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<300' 'mpz(x)'
Mean +- std dev: 508 ns +- 3 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<3000' 'mpz(x)'
Mean +- std dev: 2.34 us +- 0.01 us

Without special case for 1-digit:

$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<7' 'mpz(x)'
Mean +- std dev: 291 ns +- 2 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<38' 'mpz(x)'
Mean +- std dev: 316 ns +- 1 ns

Using PyUnstable_Long_IsCompact (as in the code proposed above):

$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<7' 'mpz(x)'
Mean +- std dev: 215 ns +- 1 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<38' 'mpz(x)'
Mean +- std dev: 317 ns +- 2 ns

"if you replace PyLong_IsCompact and PyLong_CompactValue with PyLong_AsNativeBytes and a digit_size buffer":

$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<7' 'mpz(x)'
Mean +- std dev: 263 ns +- 2 ns
$ python -m pyperf timeit -q -s 'from gmpy2 import mpz;x=1<<38' 'mpz(x)'  # huh
Mean +- std dev: 350 ns +- 2 ns

So, using a quick check (currently PyUnstable_Long_IsCompact) to dispatch small integers to special-cased code, and otherwise using PyLong_GetDigits() (only one new function for reading digits), has better performance than the current API.

Maybe you are right and we should introduce a bunch of simple PyUnstable_* functions instead of this more complicated API, which has performance regressions.

clearly only going to be possible by leaking access to Long internals

I doubt that using a single layout for "big enough" integers leaks anything important. All MP libraries share that detail; it's unlikely CPython will break this in the future.

I updated the API. What do you think? Are we good yet?

I like the changes.

@vstinner
Author

vstinner commented Sep 4, 2024

The PR python/cpython#123498 defines # define MAX_LONG_DIGITS ((UINT64_MAX-1) / (uint64_t)PyLong_SHIFT) which is unsigned. If we want to be compatible with gh-123498 maximum, the API should be modified to use unsigned digits number.

I'm wrong. MAX_LONG_DIGITS fits into a signed size_t (Py_ssize_t), since it's a number of digits, not a number of bits: the / (uint64_t)PyLong_SHIFT division makes MAX_LONG_DIGITS smaller than PY_SSIZE_T_MAX.
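The arithmetic checks out with the constants from the quoted PR (PyLong_SHIFT == 30 on typical 64-bit builds; the values below restate those assumptions):

```python
UINT64_MAX = 2**64 - 1
PY_SSIZE_T_MAX = 2**63 - 1
PYLONG_SHIFT = 30  # bits per digit on 64-bit builds

# MAX_LONG_DIGITS divides a 64-bit *bit* count by the shift, so the
# resulting *digit* count is far below the signed Py_ssize_t maximum.
MAX_LONG_DIGITS = (UINT64_MAX - 1) // PYLONG_SHIFT
```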

@vstinner
Author

vstinner commented Sep 4, 2024

@encukou:

So, to destroy the writer (e.g. in an error handling path), the user should call PyLongWriter_Finish and XDECREF the result?

You can prepare your data in advance before creating a writer, so later, your code cannot fail.

That works, but I'd much prefer writers to consistently and have both Finish and Discard.

Ok, I added a PyLongWriter_Discard() function.

@vstinner
Author

vstinner commented Sep 4, 2024

@skirpichev

I would appreciate @casevh's opinion, but so far this API has a severe performance degradation for reading small integers (int->mpz conversion), up to 25%. That's hardly acceptable. So projects like gmpy2, sage or flint-python will probably use some other code path for such integers.

I'm wrong. MAX_LONG_DIGITS fits into signed size_t (Py_ssize_t), since it's a number of digits, not a number of bits

Yet maybe we should use size_t here, like GMP? A signed integer type looks odd in this context.

@casevh

casevh commented Sep 5, 2024

My apologies for not following this thread as much as I would have liked. I'll also defer to @skirpichev for the "is it satisfactory" question.

I started optimizing gmpy around version 1.10. The single biggest win was using mpz_import/mpz_export directly on the PyLong internals. The only change to the code was support for 30-bit digits. The same code worked from Python 2.5 through Python 3.11. But this did require version-specific builds.

The second win was using the PyLong_AsLongAndOverflow (and its variants) functions for conversion of Python ints into temporary C ints for direct use in many of the GMP function calls. Optimizing the performance of (mpz * 3) + 1 was more important than optimizing mpz(3) and mpz(1) because most of my code did mixed arithmetic. Note: the performance win was primarily from avoiding the creation of an exception.

As long as access to an unstable interface remains, I accept that the stable interface may be slightly slower. If the difference is significant, we will need to release stable and unstable versions.

Thanks for all the effort.

@skirpichev

skirpichev commented Sep 5, 2024

if we change the size of the digits but we already leaked that into other compiled code (i.e. hardcoded from a macro), then we've broken it.

Ah, the PyLong_GetDigits() declaration has Py_digit*... Well, this could be void*. To properly interpret the output, you need something like PyLong_GetNativeLayout(). I think this preserves the freedom to change the digit size, just like the API proposed by Victor.

Only thing I want to change here is the const PyLongLayout *layout argument to PyLongWriter_Create.

+1

I'm also not sure we should keep the layout field in the PyLong_DigitArray struct. What else could it be if not the native layout?

BTW, all the required information about the native layout is already available via PyLong_GetInfo(). PyLongLayout additionally has endian and digits_order fields, but I doubt those will change in the future. So PyLong_GetInfo() could essentially replace PyLong_GetNativeLayout().

I accept that the stable interface may be slightly slower. If the difference is significant, we will need to release stable and unstable versions.

@casevh, here are some benchmarks. Does this look acceptable to you?

@casevh

casevh commented Sep 5, 2024

@casevh, here (#35 (comment)) are some benchmarks. Does this look acceptable to you?

Have you tried PyLong_AsLongAndOverflow instead of PyUnstable_Long_IsCompact? It might not be as fast but it is stable (I think).

@skirpichev

skirpichev commented Sep 5, 2024

Have you tried PyLong_AsLongAndOverflow instead of PyUnstable_Long_IsCompact?

Yes, I did; I just didn't include that case in the benchmarks above. It's good for 1-2 digits and for many digits, but for ~10 digits performance is negatively impacted.

benchmark code
# r.py; reading
import pyperf
from gmpy2 import mpz
runner = pyperf.Runner()
runner.bench_func('1<<7', mpz, 1 << 7)
runner.bench_func('1<<38', mpz, 1 << 38)
runner.bench_func('1<<300', mpz, 1 << 300)
runner.bench_func('1<<3000', mpz, 1 << 3000)
# w.py; writing
import pyperf
from gmpy2 import mpz
runner = pyperf.Runner()
runner.bench_func('1<<7', int, mpz(1 << 7))
runner.bench_func('1<<38', int, mpz(1 << 38))
runner.bench_func('1<<300', int, mpz(1 << 300))
runner.bench_func('1<<3000', int, mpz(1 << 3000))

Benchmark results for reading (my PR now uses PyLong_AsLongAndOverflow):

Benchmark          master     new-api
1<<7               263 ns     268 ns: 1.02x slower
1<<38              317 ns     276 ns: 1.15x faster
1<<300             514 ns     588 ns: 1.14x slower
1<<3000            2.35 us    2.42 us: 1.03x slower
Geometric mean     (ref)      1.01x slower

Hmm, maybe it's a good tradeoff.

JFR, benchmark results for writing:

Benchmark          master     new-api
1<<38              213 ns     210 ns: 1.02x faster
1<<300             535 ns     575 ns: 1.07x slower
1<<3000            2.27 us    2.32 us: 1.02x slower
Geometric mean     (ref)      1.02x slower

Benchmark hidden because not significant (1): 1<<7

@encukou
Collaborator

encukou commented Sep 5, 2024

Only thing I want to change here is the const PyLongLayout *layout argument to PyLongWriter_Create. I'm pretty sure this is a YAGNI situation

Yes, later we can add PyLongWriter_CreateWithLayout with an extra argument.

@vstinner
Author

vstinner commented Sep 5, 2024

Ok, I updated the API:

  • Remove layout parameter of PyLongWriter_Create().
  • Remove layout member of PyLong_DigitArray structure.
  • Use an unsigned type (size_t) for ndigits.

@vstinner
Author

vstinner commented Sep 5, 2024

@skirpichev: IMO it's fine to rely on PyUnstable_Long_IsCompact() for best performance. Even though it has "unstable" in its name, I expect the API not to change in the next releases, even if the implementation might change in the future. Or you can use PyLong_AsLongAndOverflow(), as you prefer.

@encukou
Collaborator

encukou commented Sep 6, 2024

Use an unsigned type (size_t) for ndigits.

AFAIK, Python uses Py_ssize_t for counting bytes in nearly all existing APIs. I think we should continue doing that, even if GMP uses size_t.

@vstinner
Author

vstinner commented Sep 6, 2024

@encukou:

AFAIK, Python uses Py_ssize_t for counting bytes in nearly all existing APIs. I think we should continue doing that, even if GMP uses size_t.

@skirpichev:

Yet maybe we should use size_t here, like GMP? Signed integer type looks odd in this context.

@skirpichev: are you ok to use signed Py_ssize_t?

@skirpichev

are you ok to use signed Py_ssize_t?

This still looks odd to me, but I agree with Petr: Py_ssize_t seems more consistent with the CPython API. size_t is used in the *alloc() functions.

@vstinner
Author

vstinner commented Sep 6, 2024

Ok, I updated the API again to move back to signed Py_ssize_t.

@vstinner
Author

vstinner commented Sep 9, 2024

@skirpichev: Are you ok with the latest API?

@skirpichev

@vstinner, yes. Some minor nitpicks were mentioned above, feel free to ignore:

  • PyLongLayout has fields that look useless (it's unlikely CPython will ever use non-native endianness for "digits", or store the most significant digit first)
  • the rest is already available via PyLong_GetInfo(), so we could skip creating a new API function

(PyLong_GetInfo() is slow, but it's not a problem, as it might be called just once at module initialization.)

@vstinner
Author

vstinner commented Sep 9, 2024

@vstinner:

@skirpichev: Are you ok with the latest API?

@skirpichev:

@vstinner, yes

Ah, good :-)

@skirpichev:

BTW, all the required information about the native layout is already available via PyLong_GetInfo(). PyLongLayout additionally has endian and digits_order fields, but I doubt those will change in the future. So PyLong_GetInfo() could essentially replace PyLong_GetNativeLayout().

The purpose of this API is to stop relying on Python internals and instead offer an API that exposes all the information needed by these features (import/export). Maybe CPython internals will not change, but maybe PyPy uses a different layout and can decide to change its layout between two PyPy versions. MicroPython, GraalPython, RustPython and others can also use a different layout. IMO it's worth exposing digits_order and endian. Also, a C structure is more convenient to use in C than a namedtuple.

Well, an alternative would be to extend sys.int_info to add digits_order and endian, but I don't know if it makes sense to expose this info in Python.

For reference, sys.int_info doc: https://docs.python.org/dev/library/sys.html#sys.int_info
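As noted, part of the native layout is already visible from Python today via sys.int_info (bits_per_digit and sizeof_digit exist; digits_order and endian do not):

```python
import sys

layout = sys.int_info
# CPython builds use either 15-bit digits stored in a 2-byte type or
# 30-bit digits stored in a 4-byte type (the default on 64-bit platforms).
print(layout.bits_per_digit, layout.sizeof_digit)
```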

@skirpichev

maybe PyPy uses a different layout

Certainly not. And I would be surprised if some bigint library (including Python implementations) used something different here. Can anyone point to examples?

Well, an alternative would be to complete sys.int_info to add digits_order and endian,

Or extend only PyLong_GetInfo() output.

@zooba

zooba commented Sep 9, 2024

I would be surprised if some bigint library (including Python implementations) used something different here. Can anyone point to examples?

You'd be surprised if there's a library out there that doesn't use 15-bit (or 30-bit) digits? I'd honestly be surprised if there was a bigint library that used those sizes, at least by default. Do they really all do it that way?

@skirpichev

You'd be surprised if there's a library out there that doesn't use 15-bit (or 30-bit) digits?

No, it's fine: GMP is an example (that's why digit_size & bits_per_digit are really useful information). But GMP has the same behaviour wrt the digits_order and endian parameters. Is there a bigint library where the least significant "digit" ("limb", whatever) is stored last?

@encukou encukou added the vote label Sep 10, 2024
@zooba

zooba commented Sep 10, 2024

Ah, you just meant the digit order.

On one hand, they may not exist and we may not imagine a reason why they would, but on the other hand, that's why we didn't have this function in the first place. Better to include all the options in there, just in case, since we've already failed to predict the future once 😉

@vstinner
Author

It became difficult for me to follow this discussion and navigate between messages since [the PR] has 119 comments and this issue has 47 comments, and there are a few more related issues. So I decided to write down PEP 757 to summarize the discussion: python/peps#3961. It might be easier to make a decision on a PEP.

@vstinner
Author

I wrote PEP 757 – C API to import-export Python integers, it's now online! I announced the PEP on discuss.python.org as well.

There is an open question (by @skirpichev):

  • Should we add digits_order and endian members to sys.int_info and remove PyLong_GetNativeLayout()? The PyLong_GetNativeLayout() function returns a C structure which is more convenient to use in C than sys.int_info which uses Python objects.

I like PyLong_GetNativeLayout() and I prefer to keep it. I don't think it's redundant, since it uses C types, which are more convenient to use in C. And I'm not sure about adding digits_order and endian members to sys.int_info: in Python, this information is not useful.

@vstinner vstinner changed the title Add import-export API for Python int objects PEP 757: Add import-export API for Python int objects Sep 23, 2024
@skirpichev

@vstinner, the description is now outdated wrt the current proposal. Maybe you can sync it?

@vstinner
Author

@vstinner, the description is now outdated wrt the current proposal. Maybe you can sync it?

PEP 757 now contains all details. So I just replaced the first message with a reference to PEP 757.

@vstinner
Author

vstinner commented Oct 7, 2024

This issue has a long history; the API changed multiple times and it became confusing to follow. So I created a new clean issue for PEP 757: #45. I'm closing this issue.

@vstinner vstinner closed this as completed Oct 7, 2024