The C-API for Python to C integer conversion is, to be frank, a mess. #102471

Open
markshannon opened this issue Mar 6, 2023 · 10 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

@markshannon
Member

markshannon commented Mar 6, 2023

The C-API has built up over 30 years, in a haphazard way. So, it is no surprise that it is a bit of a mess.
What makes it worse is that it is based around the C long type, which varies in size between architectures and operating systems in odd ways.
C longs are 32 bits on (almost?) all 32-bit machines and 64 bits on most 64-bit machines, except on Windows, where C longs are 32 bits even on 64-bit machines. In other words, long is not a useful fixed size, like int32_t, nor does it match the machine word size, like intptr_t.
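
The width differences described above can be checked on any given platform. The following is a minimal sketch; the function names are made up for illustration and are not part of any existing API.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Width in bits of each type on the platform this is compiled for. */
int bits_of_long(void)    { return (int)(sizeof(long) * CHAR_BIT); }
int bits_of_int32(void)   { return (int)(sizeof(int32_t) * CHAR_BIT); }
int bits_of_intptr(void)  { return (int)(sizeof(intptr_t) * CHAR_BIT); }
int bits_of_pointer(void) { return (int)(sizeof(void *) * CHAR_BIT); }

/* Print the widths side by side so they can be compared across platforms. */
void print_widths(void)
{
    /* int32_t is exactly 32 bits wherever it exists; long has no such guarantee. */
    assert(bits_of_int32() == 32);
    printf("long: %d, int32_t: %d, intptr_t: %d, void *: %d\n",
           bits_of_long(), bits_of_int32(),
           bits_of_intptr(), bits_of_pointer());
}
```

Calling print_widths() on a 64-bit Windows build and on a 64-bit Linux build should show different values for long but identical values for int32_t.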

We need a more consistent API for converting from Python integers to C integers and back again.
We should support both 32 bit and word size C integers. 32 bit, because we often want to store 32 bit values to save space on 64 bit machines, or for portability. We also want to support word size integers for performance and ease of coding.

This means we want 4 functions (2 sizes, 2 directions) to convert between C and Python integers.

Currently we have:

| Width | Py -> C | C -> Py |
|---|---|---|
| 32 bit | Missing* | Missing |
| Machine word | Missing* | PyLong_FromSsize_t |

The C API has a function to convert Python ints to intptr_t, but it is missing efficient overflow handling.
It also has a function with efficient overflow handling, PyLong_AsLongAndOverflow, but that returns a long.
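
To illustrate what "efficient overflow handling" means here, the sketch below narrows a 64-bit value to 32 bits and reports overflow through a flag instead of an error object. It follows the convention of PyLong_AsLongAndOverflow (flag set to -1, 0 or +1, return value -1 on overflow), but narrow_to_int32 is a hypothetical helper operating on plain C integers, not a CPython function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert value to int32_t.
 * On success, *overflow is 0 and the converted value is returned.
 * If value is too large, *overflow is +1; if too small, -1.
 * In both overflow cases the return value is -1, so the caller
 * must inspect the flag rather than the result. */
int32_t narrow_to_int32(int64_t value, int *overflow)
{
    assert(overflow != NULL);
    if (value > INT32_MAX) {
        *overflow = 1;
        return -1;
    }
    if (value < INT32_MIN) {
        *overflow = -1;
        return -1;
    }
    *overflow = 0;
    return (int32_t)value;
}
```

The caller can branch on the flag and fall back to a slower path without any exception being created and then cleared.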

Here's what we want:

| Width | Py -> C | C -> Py |
|---|---|---|
| 32 bit | PyInt_AsInt32 | PyInt_FromInt32 |
| Machine word | PyInt_AsSsize_t | PyInt_FromSsize_t |

I'm using the PyInt prefix, now that Python 2 is history. It makes it clearer which functions belong to the new API.

Note that I'm not handling unsigned values. I think the extra bit of precision is not worth the complexity of a larger API.
And if we decide that it is, we can always add unsigned variants later.

@markshannon markshannon added the type-feature A feature request or enhancement label Mar 6, 2023
@markshannon
Member Author

markshannon commented Jul 4, 2023

We also need a few functions for querying and extracting the value of a Python int.

We want to query its sign:

int PyInt_IsNegative();
int PyInt_IsPositive();
int PyInt_IsZero();
int PyInt_Sign();
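
The prototypes above do not say how the four queries relate to each other. One plausible reading, sketched here on a plain int64_t as a stand-in for a Python int, is that the sign query returns -1, 0 or +1 and the three predicates follow from it. All names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sign of v as -1, 0 or +1. */
int sketch_sign(int64_t v)        { return (v > 0) - (v < 0); }

/* The three predicates expressed in terms of the sign. */
int sketch_is_negative(int64_t v) { return sketch_sign(v) < 0; }
int sketch_is_positive(int64_t v) { return sketch_sign(v) > 0; }
int sketch_is_zero(int64_t v)     { return sketch_sign(v) == 0; }

/* Exactly one of the three predicates holds for any value. */
void check_trichotomy(int64_t v)
{
    assert(sketch_is_negative(v) + sketch_is_positive(v)
           + sketch_is_zero(v) == 1);
}
```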

We want to import and export the digits of an integer, and to know how many digits there are.
GNU's MP library has mpz_import and mpz_export, which have quite a complex API, but might be a good model to use.
In addition we should provide a constant describing the "native" number of bits per digit, so that C extensions can extract the data efficiently.

@markshannon
Member Author

markshannon commented Jul 4, 2023

mpz_import and mpz_export take 6 parameters each, and four of those are small numbers describing the layout. Having many int parameters is hard to read and error-prone. We should combine the layout parameters into a single struct (of 32 bits or less).

E.g.

typedef struct _PyIntExportLayout {
     uint8_t bits_per_digit;
     int8_t word_endian;
     int8_t array_endian;
     uint8_t digit_size;
} PyIntExportLayout;

PyLongObject *PyInt_Import(PyIntExportLayout layout, size_t count, const void *data);
int PyInt_Import(PyLongObject *op, PyIntExportLayout layout, size_t count, void *data);
size_t PyInt_DigitCount(PyLongObject *op, uint8_t bits_per_digit);
const PyIntExportLayout PY_INT_NATIVE_LAYOUT; /* Use this when possible, for speed */
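
As a rough illustration of how such a layout struct could drive an export, the sketch below splits a uint64_t into digits of a configurable bit width. It is not the proposed API: DigitLayout, digit_count_u64 and export_u64 are hypothetical names, the value is a plain 64-bit integer rather than a PyLongObject, only 4-byte native-endian digits are handled, and the output buffer is assumed to be owned by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout descriptor, modelled on the struct above. */
typedef struct {
    uint8_t bits_per_digit; /* value bits stored in each digit (1..32) */
    int8_t  word_endian;    /* ignored here: digits are native uint32_t */
    int8_t  array_endian;   /* -1: least significant digit first, 1: most */
    uint8_t digit_size;     /* bytes per digit; this sketch needs 4 */
} DigitLayout;

/* Number of digits needed for value; zero still takes one digit. */
size_t digit_count_u64(uint64_t value, uint8_t bits_per_digit)
{
    assert(bits_per_digit >= 1 && bits_per_digit <= 32);
    size_t n = 0;
    do {
        n++;
        value >>= bits_per_digit;
    } while (value != 0);
    return n;
}

/* Write the digits of value into the caller-owned buffer out.
 * Returns the number of digits written, or 0 if capacity is too small. */
size_t export_u64(uint64_t value, DigitLayout layout,
                  uint32_t *out, size_t capacity)
{
    assert(layout.digit_size == sizeof(uint32_t));
    size_t n = digit_count_u64(value, layout.bits_per_digit);
    if (n > capacity) {
        return 0;
    }
    uint64_t mask = ((uint64_t)1 << layout.bits_per_digit) - 1;
    for (size_t i = 0; i < n; i++) {
        /* i counts digits from least significant upwards. */
        size_t pos = (layout.array_endian < 0) ? i : n - 1 - i;
        out[pos] = (uint32_t)(value & mask);
        value >>= layout.bits_per_digit;
    }
    return n;
}
```

Passing the four layout values as one small struct keeps the call site readable: a caller writes export_u64(v, layout, buf, cap) instead of threading four easily confused integers through every call.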

@casevh

casevh commented Oct 28, 2023

Hi. I'm the primary maintainer of gmpy2. I'd like to provide some comments based on my experience using the C-API.

I use PyLong_AsLongAndOverflow when I want either a long value or, on overflow, to proceed immediately with the full conversion of the PyLong to mpz. Avoiding the exception is a significant performance improvement. PyLong_AsUnsignedLongAndOverflow is used occasionally when GMP expects an unsigned long.

PyLong_As[Unsigned]LongLongAndOverflow were used with MPIR to get 64-bit values on Windows. (MPIR extended GMP to support 64-bit native integer sizes.) gmpy2 doesn't currently use them but it would be nice if they could be kept.

I like your PyIntExportLayout idea for specifying the layout. I have a question about the usage of PyInt_Import: which side owns the conversion?

Is PyInt_Import intended to access external data (i.e. the mpz data) and create a PyLong? Does PyIntExportLayout then specify the format of the mpz data?

Would there be a corresponding PyInt_Export that exports the value of a PyLong into an external buffer with the format of the external buffer controlled by PyIntExportLayout? If so, who owns (CPython versus gmpy2) the memory allocated to the external buffer? (Note: GMP, MPFR, and MPC can use a different memory manager than CPython....)

This is reversed from the current conversion direction. For mpz to PyLong, gmpy2 asks CPython to create a new PyLong with sufficient space to store the output of mpz_export. And for PyLong to mpz, gmpy2 creates a new mpz with sufficient space to store the output of mpz_import.
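
In both directions described above, the receiving side has to size its storage before any digits are copied. A minimal sketch of that sizing step follows, assuming the bit length of the value is already known (for example from mpz_sizeinbase(z, 2) on the GMP side). digits_needed is a hypothetical helper, not a CPython or GMP function.

```c
#include <assert.h>
#include <stddef.h>

/* Digits required to hold bit_length bits when each digit stores
 * bits_per_digit bits. This is a ceiling division; a bit length of
 * zero still needs one digit to represent the value 0. */
size_t digits_needed(size_t bit_length, unsigned bits_per_digit)
{
    assert(bits_per_digit > 0);
    if (bit_length == 0) {
        return 1;
    }
    return (bit_length + bits_per_digit - 1) / bits_per_digit;
}
```

With 30-bit digits, for instance, a 64-bit value needs 3 digits and a 30-bit value needs exactly 1.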

I'll add another comment to the thread about the compact format.

Thanks for all the effort in improving CPython.

casevh

@vstinner
Member

vstinner commented Jul 4, 2024

32 bit PyInt_AsInt32 PyInt_FromInt32

I created #120390 for that.

vstinner added a commit to vstinner/cpython that referenced this issue Jul 4, 2024
Add PyLong_Export() and PyLong_Import() functions and PyLong_LAYOUT
structure.
@serhiy-storchaka
Member

I had plans to add PyLong_Import() and PyLong_Export() with GMP/libtommath-inspired signatures. This is too general an interface, which allows supporting many different representations.

@skirpichev
Member

This is too general an interface, which allows supporting many different representations.

This is a relatively complex task, which is better suited to dedicated libraries. I would be rather surprised if any arbitrary-precision math library lacked mpz_import/export-like functions.

If, on the CPython side, we have a "view" of integers as an array of digits, any math library could do the rest of the work.

@serhiy-storchaka
Member

Then please use different names than PyLong_Import()/PyLong_Export().

vstinner added a commit to vstinner/cpython that referenced this issue Jul 8, 2024
Add PyLong_Export() and PyLong_Import() functions and PyLong_LAYOUT
structure.
vstinner added a commit to vstinner/cpython that referenced this issue Aug 6, 2024
Co-authored-by: Sergey B Kirpichev <skirpichev@gmail.com>
@vstinner
Member

We need a more consistent API for converting from Python integers to C integers and back again.
We should support both 32 bit and word size C integers. 32 bit, because we often want to store 32 bit values to save space on 64 bit machines, or for portability. We also want to support word size integers for performance and ease of coding.

I added APIs for that with 4c6dca8:

  • Signed:
    • PyLong_FromInt32(), PyLong_FromInt64()
    • PyLong_AsInt32(), PyLong_AsInt64()
  • Unsigned:
    • PyLong_FromUInt32(), PyLong_FromUInt64()
    • PyLong_AsUInt32(), PyLong_AsUInt64()

@vstinner
Member

We want to query its sign:
int PyInt_Sign();

PyLong_GetSign() was added to Python 3.14: https://docs.python.org/dev/c-api/long.html#c.PyLong_GetSign

int PyInt_IsNegative();
int PyInt_IsPositive();
int PyInt_IsZero();

There is an open discussion for these functions: capi-workgroup/decisions#29

@skirpichev
Member

PyLong_IsPositive(), PyLong_IsNegative() and PyLong_IsZero() were added to Python 3.14: https://docs.python.org/dev/c-api/long.html#c.PyLong_IsPositive

vstinner added a commit that referenced this issue Dec 13, 2024
Co-authored-by: Sergey B Kirpichev <skirpichev@gmail.com>
Co-authored-by: Steve Dower <steve.dower@microsoft.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
skirpichev added a commit to skirpichev/cpython that referenced this issue Dec 13, 2024