
Double Double: Toil & Trouble: Support for platforms without extended precision #1170

Open · vallis opened this issue Jan 21, 2022 · 22 comments

Revisiting this issue now that more people are getting M1 laptops...

It seems that it should be feasible to enable software-based extended precision using a numpy plugin such as https://github.com/tuwien-cms/xprec — maybe as simple as replacing np.float96/np.float128/np.longdouble with the xprec dtype.

(Doing something similar in tempo2 would require much code rewriting, so it would be good to have at least PINT running natively. By contrast, tempo may be fine already with libquadmath?)

@vallis (Contributor, Author) commented Mar 31, 2022

I explored this a bit more... On the PINT side, using xprec would not be too traumatic. It might be as simple as monkeypatching numpy in PINT's __init__.py:

    import numpy as np
    import xprec

    class doubledouble:
        dtype = xprec.ddouble

        def __new__(cls, val):
            # strings need a custom parser (fromstr, shown further down);
            # everything else goes through the ddouble scalar constructor
            if isinstance(val, str):
                return fromstr(val)
            else:
                return xprec.ddouble.type(val)

    np.longdouble = doubledouble

The wrapper class is needed because xprec.ddouble is a dtype that cannot be used as a constructor. np.longdouble is actually a class, but numpy understands it as a dtype.

However, I ran into problems somewhere in scipy, which will probably require patching xprec. Specifically, xprec does not define ldexp for ddouble.

@dlakaplan (Contributor) commented:

Interesting. I ran into some issues in testing this myself (https://github.com/dlakaplan/PINT/tree/tryxprec) but suspected that a solution along the lines of the one you identify would work. Does that also handle conversions to/from strings? Will it also work with the finfo calls? And most importantly, do you have any idea of the performance hit?

@vallis (Contributor, Author) commented Mar 31, 2022

I'm glad you're trying that too!

There are things that are missing in xprec. One is conversion from strings, which I implemented quickly on the basis of the QD code (see below). I haven't done conversion to strings, but it should have similar complexity.

finfo does not work, because numpy doesn't have the right hooks for external dtypes, so xprec provides its own finfo. However, one could monkey-patch it similarly to np.longdouble (a sketch follows the code below).

We'd really have to experiment to see the performance hit. But the use of longdouble is limited in PINT, and the datetimes are already handled with the (int + float) astropy type. Furthermore, PINT is fast under Rosetta, which must do some kind of float80 emulation internally. So I'm confident the hit will be acceptable.

    import xprec

    def fromstr(s):
        # parse a decimal string into a ddouble, loosely based on the QD library's parser
        r = xprec.ddouble.type(0.0)

        nd, point, sign, exp = 0, None, None, 0
        for i, c in enumerate(s):
            if '0' <= c <= '9':
                # accumulate the digits of the mantissa: r = 10*r + digit
                r *= 10
                r += float(c)
                nd += 1
            elif c == '.':
                if point is not None:
                    raise ValueError(f"Malformed floating point number {s}")
                point = nd  # number of digits seen before the decimal point
            elif c == '+' or c == '-':
                if sign is not None:
                    raise ValueError(f"Malformed floating point number {s}")
                sign = -1 if c == '-' else 1
            elif c == 'e' or c == 'E':
                exp = int(s[i+1:])
                break

        if point is not None:
            exp -= nd - point  # shift the exponent by the number of fractional digits

        if exp != 0:
            # note that 10.0**exp is rounded to double precision here, which can
            # contaminate the last bits of the result (e.g. for "1.0" this computes
            # 10 * ddouble(double(0.1))); see the discussion further down the thread
            r *= xprec.ddouble.type(10.0 ** exp)

        if sign is not None:
            r *= sign

        return r
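
For reference, a minimal sketch of the finfo monkey-patch mentioned above (assuming xprec's replacement is reachable as xprec.finfo; the exact name is an assumption, not confirmed in this thread):

    import numpy as np
    import xprec

    _np_finfo = np.finfo  # keep the original around

    def _finfo(dtype):
        # route ddouble queries to xprec's own finfo, everything else to numpy's
        if dtype == xprec.ddouble:
            return xprec.finfo(dtype)
        return _np_finfo(dtype)

    np.finfo = _finfo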

@AaronDJohnson commented Apr 2, 2022

https://github.com/AaronDJohnson/xprec/tree/for_pint fixes the ldexp issue: the functions already existed in xprec but were not exposed to Python and linked to the ldexp ufunc. Now I'm onto some typecasting issues.

@AaronDJohnson commented:

It looks like astropy units cannot easily be cast to ddouble type, but there are no issues the other way around. So if we just reverse the order of multiplication whenever we have data * u.units -> u.units * data, then it works fine. There are a few other checks that need to be modified which I'm working on now.

@scottransom (Member) commented:

@vallis Why do you say that longdouble is limited in PINT? It is used basically every time that you compute pulse phase.

Note that basically all spin-frequency-related quantities, orbital periods, and DMs (for some reason...) end up as longdouble. And we convert almost all astropy times to longdoubles as well, in order to save computation time (that's the "tdbld" column in the TOA table, which we use quite a bit internally). For this to be acceptable, I think we would really need to avoid a performance regression on x86 processors...

@AaronDJohnson commented Apr 2, 2022

xprec also does not implement tan, atan, or modf.

@vallis (Contributor, Author) commented Apr 4, 2022

@scottransom - limited... I suppose I meant code-wise, in terms of lines of code that mention it (and therefore potential code paths for us to follow). Also, there would be no regression on x86, since this patch would not be activated there, and np.longdouble would continue to map to the x86 80-bit type.

@AaronDJohnson, it's great that you managed to plug in ldexp. I tried, but I must have been using the macros incorrectly. tan and atan are in the original QD library and could be dropped into xprec... but I don't see modf. It does have conversions to integer though.

@AaronDJohnson commented:

> I tried, but I must have been using the macros incorrectly.

The binary macro should work, but I had to make a new function to register ldexp, since the types didn't match the unary or binary functions used to register the other ufuncs.

This function can be found here: https://github.com/AaronDJohnson/xprec/blob/9acf0386d98ef24465fc13717d7908a46f650f10/csrc/_dd_ufunc.c#L877

@dlakaplan dlakaplan changed the title Support for platforms without extended precision Double Double: Toil & Trouble: Support for platforms without extended precision Apr 5, 2022
@dlakaplan (Contributor) commented:

Quick performance test on my Intel Mac. I ran one of:

    import numpy as np

    xn = np.array(np.random.rand(1000, 2000), dtype=np.longdouble)
    for _ in range(100):
        yn = np.exp(xn)

or

    import numpy as np
    import xprec

    xx = np.array(np.random.rand(1000, 2000), dtype=xprec.ddouble)
    for _ in range(100):
        yx = np.exp(xx)

Still digesting the detailed profiler results, but the longdouble version ran in 8.5 s, and the xprec version (admittedly at higher precision) ran in 57 s. Do others see similar issues?

@vallis (Contributor, Author) commented Apr 6, 2022

On my M1 Max, the xprec version takes 35 s, while the Rosetta-emulated extended precision takes 160 s. (Regular double precision is 0.5 s, so the additional precision costs 70x.)

If we want to keep using Macs, though, I don't see an alternative.

Does tempo use quadruple precision by way of libquadmath? (As opposed to Intel's 80-bit extended precision?)

@aarchiba (Contributor) commented May 9, 2022

May I suggest, strongly, not monkeypatching numpy? I agree that supporting extended precision on non-Intel machines would be really valuable, but I think PINT should do its own management of the extended-precision type. This would have the advantage that we could offer a setting/option that made it possible to test the xprec version of everything on Intel machines (specifically, as part of our CI). Ideally one would even be able to do a comparison between residuals computed with xprec and native 80-bit to ensure that accuracy was not lost anywhere.

It seems that linear algebra will need to be handled by calling into xprec explicitly (or by implementing our own linear algebra operators). But I believe the only major linear algebra operation we use is the SVD, which xprec happily provides.

On the plus side, one should expect software double-double arithmetic to be faster than software emulation of 80-bit floats, and it actually has more precision (roughly 106 bits of mantissa versus 64 for the 80-bit format), so making it switchable without too much suffering might be interesting when one encounters linear algebra headaches with highly covariant fits.

In terms of implementing the switching, it seems to me that PINT could provide its own wrapper functions to replace direct use of np.longdouble and svd and other functions that produce/operate on high-precision values. One could have (sigh) a global setting that determined whether these used the 80-bit or the xprec versions. One could use monkeypatching to ensure that any calls directly to numpy functions raised exceptions during the conversion process.
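
A minimal sketch of what such switchable wrappers could look like (the flag name is hypothetical, and the assumption that xprec exposes its SVD as xprec.linalg.svd rests only on the statement above that xprec provides one):

    import numpy as np

    USE_XPREC = False  # hypothetical global setting

    def longdouble(val):
        # return val as the active extended-precision type
        if USE_XPREC:
            import xprec
            return np.asarray(val, dtype=xprec.ddouble)
        return np.longdouble(val)

    def svd(a):
        # route high-precision linear algebra to xprec when enabled
        if USE_XPREC:
            import xprec.linalg
            return xprec.linalg.svd(a)
        return np.linalg.svd(a)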

@AaronDJohnson commented May 23, 2022

A PR to fix several missing ufuncs has been submitted to xprec. This should allow us to use the package with PINT.

@AaronDJohnson commented Aug 30, 2022

I haven't gotten around to this in a while, but I would like to throw in a quick update. With the changes mentioned above in place, another hurdle, which I am currently working on, is that xprec does not play nicely with astropy.units. The following snippet shows the problem:

    from astropy import units as u
    from xprec import ddouble

    a = u.meter * 42.0                # works!
    a = u.meter * ddouble.type(42.0)  # works!
    a = ddouble.type(42.0) * u.meter  # does not work

I think that this can be solved by creating a class with an __init__ method and operator overloading, but I haven't had time to play around with it yet.
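
For what it's worth, a hypothetical sketch of such a wrapper class (the name and the delegation strategy are illustrative, not tested):

    import numpy as np
    from astropy import units as u
    from xprec import ddouble

    class DDouble:
        def __init__(self, val):
            self.value = ddouble.type(val)

        def __mul__(self, other):
            if isinstance(other, u.UnitBase):
                # wrap as a 0-d ndarray so numpy/astropy take over, which is
                # the workaround reported to work further down the thread
                return np.array(self.value) * other
            return DDouble(self.value * other)

        __rmul__ = __mul__  # multiplication is commutative here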

@vallis (Contributor, Author) commented Aug 31, 2022

Hello Aaron, good that you're looking at this again. Now that you're "here", we can tackle it together.

Regarding this issue, would you say the problem lies with astropy.units (and may warrant a bug report to astropy) or with xprec?

@AaronDJohnson commented Aug 31, 2022

Hi Michele, that sounds great!

It seems to be an issue specific to xprec. Actually, the following works:

    import numpy as np
    from astropy import units as u
    from xprec import ddouble

    a = np.array(ddouble.type(42.0)) * u.meter  # works
    a = u.meter * np.array(ddouble.type(42.0))  # also works

So it seems that just casting the ddouble types to ndarray is one way to patch the issue (or maybe subclassing ndarray?).

@vallis (Contributor, Author) commented Sep 1, 2022

The problem is that when xprec tries to multiply a ddouble by "something", it fails if it cannot cast "something" to ddouble, unless "something" is a numpy array, in which case numpy takes over and does it correctly.

The workarounds np.array(xprec.ddouble.type(42.0)) * u.meter and u.Quantity(xprec.ddouble.type(42.0), u.meter) are both fine.

I believe this can be fixed in xprec by modifying the binary operation wrapper to fall back on the operation provided by "something":

    #define PYWRAP_BINARY(name, inner, tp_inner_op)                          \
        static PyObject* name(PyObject* _x, PyObject* _y)                    \
        {                                                                    \
            ddouble r, x, y;                                                 \
            /* if the other operand is an ndarray, let numpy handle it */    \
            if (PyArray_Check(_y))                                           \
                return PyArray_Type.tp_as_number->tp_inner_op(_x, _y);       \
                                                                             \
            if (PyDDouble_Cast(_x, &x) && PyDDouble_Cast(_y, &y)) {          \
                r = inner(x, y);                                             \
                return PyDDouble_Wrap(r);                                    \
            } else if (_y->ob_type->tp_as_number &&                          \
                       _y->ob_type->tp_as_number->tp_inner_op) {             \
                /* fall back on the operation defined by _y's type; clear */ \
                /* the "Cannot cast" error set by PyDDouble_Cast first */    \
                PyErr_Clear();                                               \
                return _y->ob_type->tp_as_number->tp_inner_op(_x, _y);       \
            }                                                                \
            return NULL;                                                     \
        }

The part I have added is the "else if". The PyErr_Clear() is needed because PyDDouble_Cast sets the "Cannot cast" error...

@AaronDJohnson commented Sep 2, 2022

It seems like the changes you made fix the issue, @vallis. Should we be worried about having to use PyErr_Clear()? Anyway, I've made the changes you posted above.

I created a class called xdouble that chooses between the doubledouble class and np.longdouble, depending on whether a flag is set (based on the finfo(np.longdouble).eps output; a sketch of the check follows).
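
A minimal sketch of that eps-based check (the threshold is my choice: a true 80-bit longdouble has eps ≈ 1.1e-19, while a longdouble that is secretly float64 has eps ≈ 2.2e-16):

    import numpy as np

    # True when np.longdouble is float64 in disguise (e.g. on Apple Silicon),
    # in which case the xprec-backed class should be used
    NEEDS_DDOUBLE = np.finfo(np.longdouble).eps > 1e-18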

I've replaced many of the np.longdouble instances that appear throughout PINT with this new class, and now I'm working my way through tests to see what is broken.

My working branch of PINT: https://github.com/AaronDJohnson/PINT/tree/m1

@AaronDJohnson commented:

One thing that I've run into now:
xdouble(a) + xdouble(b) != xdouble(str(a + b)).
For example,

    >>> xdouble(1.0) + xdouble(0.1)
    ddouble(1.1+-8.326672684688674e-17)
    >>> xdouble(str(1.1))
    ddouble(1.1+-2.775557561562891e-17)

@vallis (Contributor, Author) commented Sep 2, 2022

Thanks Aaron. I think the PyErr_Clear() may be ok since on that code path the error would only come from PyDDouble_Cast.

Regarding the PINT interface, I wonder what's the cleanest way to do it. Anne suggested that there should be a way to use xprec on x86. Also there may be another numpy extension down the line that we may want to experiment with.

It may be cleanest to use the naked xprec.ddouble, and set (say) pint.longdouble to either xprec.ddouble or np.longdouble. Then:

  • The np.longdouble() calls could be replaced with pint.longdouble.type(), which would require some discipline by PINT developers. Or perhaps __call__ could be implemented for xprec.ddouble in the C layer.
  • We would need to implement the fromstr constructor in the C layer. That seems doable.
  • np.finfo remains a problem.

Or perhaps what PINT defines is not a class but a factory, which returns either implementation based on a global setting (see the sketch below). That would help performance a bit.
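
A minimal sketch of that factory idea (the names and the setting mechanism are hypothetical):

    import numpy as np

    _use_xprec = False  # hypothetical global setting

    def longdouble():
        # return the active high-precision scalar constructor
        if _use_xprec:
            import xprec
            return xprec.ddouble.type  # ddouble is a dtype; .type is its scalar type
        return np.longdouble

    # usage: pi = longdouble()(3.141592653589793)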

@vallis (Contributor, Author) commented Sep 2, 2022

As for 1.0 + 0.1, indeed xdouble(1.0) + xdouble(0.1) == xdouble(1.1). Some roundoff errors are to be expected coming from double precision...

But is my fromstr implementation blameless? I see xdouble("1.0") + xdouble("0.1") == xdouble("1.1"), but xdouble("1.0") == ddouble(1+5.551115123125783e-17), which doesn't look right, given that xdouble(1) == ddouble(1+0). (Presumably the 10.0**exp factor is to blame: for "1.0" the parser computes 10 * ddouble(10.0**-1), and 10.0**-1 is rounded to double precision first.)

@dlakaplan (Contributor) commented:

I was trying to diagnose some issues with clock corrections a few months ago. I was seeing weird differences in the printed times (ddoubles) within IPython. I spent forever trying to track those down and convinced myself that they were due to how the numbers were either printed and/or read in (I can't quite remember), although I never actually got around to coming up with a nice example. So I would not be surprised if there were some string conversion issues, as you suggest.
