# Double Double: Toil & Trouble: Support for platforms without extended precision #1170
I explored this a bit more... On the PINT side, using xprec would not be too traumatic. It might be as simple as monkeypatching numpy in PINT's `__init__.py`:
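Something along these lines, as an untested sketch (the scalar-extraction trick and the set of attributes to patch would need experimenting):

```python
# hypothetical snippet for pint/__init__.py (sketch only)
import numpy as np
import xprec  # importing registers the ddouble dtype with numpy

class _ddouble_scalar:
    """Wrapper that lets the ddouble dtype be called like np.longdouble."""
    def __new__(cls, value=0.0):
        # build a 0-d array of dtype ddouble, then pull out the scalar
        return np.asarray(value, dtype=xprec.ddouble)[()]

np.longdouble = _ddouble_scalar
```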
The wrapper class is needed because xprec.ddouble is a dtype that cannot be used as a constructor; np.longdouble is actually a class, but numpy understands it as a dtype. However, I ran into problems somewhere in scipy, which probably require patching xprec: specifically, xprec does not define `ldexp` for ddouble.
Interesting. I ran into some issues in testing this myself (https://github.com/dlakaplan/PINT/tree/tryxprec) but suspected that a solution along the lines of the one you identify would work. Does that also handle conversions to/from strings? Will that also work with the …
I'm glad you're trying that too! There are things that are missing in xprec. One is conversion from strings, which I implemented quickly on the basis of the QD code (see below). I haven't done conversion to strings, but it should have similar complexity.
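A rough Python sketch of the QD-style digit accumulation (simplified: the helper names are invented here, there is no careful rounding of the last digit, and the power of ten is applied one factor at a time):

```python
def two_sum(a, b):
    """Error-free sum: returns (s, e) with s + e == a + b exactly."""
    s = a + b
    bb = s - a
    return s, (a - (s - bb)) + (b - bb)

_SPLIT = 134217729.0  # 2**27 + 1, Dekker's splitting constant

def two_prod(a, b):
    """Error-free product: returns (p, e) with p + e == a * b exactly."""
    p = a * b
    aa = _SPLIT * a
    ahi = aa - (aa - a); alo = a - ahi
    bb = _SPLIT * b
    bhi = bb - (bb - b); blo = b - bhi
    return p, ((ahi * bhi - p) + ahi * blo + alo * bhi) + alo * blo

def dd_add_d(hi, lo, d):
    """(hi, lo) + d, with a quick (sloppy) renormalization."""
    s, e = two_sum(hi, d)
    return two_sum(s, e + lo)

def dd_mul_d(hi, lo, d):
    """(hi, lo) * d."""
    p, e = two_prod(hi, d)
    return two_sum(p, e + lo * d)

def dd_div_d(hi, lo, d):
    """(hi, lo) / d, one Newton-style correction of the double quotient."""
    q1 = hi / d
    p, e = two_prod(q1, d)
    s, e2 = two_sum(hi, -p)
    q2 = (s + (e2 + lo - e)) / d
    return two_sum(q1, q2)

def dd_from_str(text):
    """Parse a decimal string into a double-double (hi, lo) pair."""
    mant, _, exp = text.strip().lower().partition("e")
    exp10 = int(exp) if exp else 0
    sign = -1.0 if mant.startswith("-") else 1.0
    intpart, _, frac = mant.lstrip("+-").partition(".")
    hi = lo = 0.0
    for ch in intpart + frac:            # accumulate digits left to right
        hi, lo = dd_mul_d(hi, lo, 10.0)
        hi, lo = dd_add_d(hi, lo, float(ch))
    exp10 -= len(frac)                   # account for the decimal point
    for _ in range(exp10):
        hi, lo = dd_mul_d(hi, lo, 10.0)
    for _ in range(-exp10):
        hi, lo = dd_div_d(hi, lo, 10.0)
    return sign * hi, sign * lo
```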
We'd really have to experiment to see the performance hit. But the use of longdouble is limited in PINT, and the datetimes are already handled with the (int + float) astropy type. Furthermore, PINT is fast with Rosetta, which must do some kind of float80 emulation internally. So I'm confident.
https://github.com/AaronDJohnson/xprec/tree/for_pint fixes the missing `ldexp`. The functions already existed in the C code; they just needed to be registered as numpy ufuncs.
@vallis Why do you say that longdouble is limited in PINT? It is used basically every time we compute pulse phase. Note that basically all spin-frequency-related quantities, orbital periods, and DMs (for some reason...) end up as longdouble. And we convert almost all astropy times to longdoubles as well in order to save computation time (that's the "tdbld" column in the TOA table that we use quite a bit internally). For this to be acceptable, I think we really need to avoid a performance regression on x86 processors...
@scottransom - limited... I suppose I meant code-wise, in terms of lines of code that mention it (and therefore potential code paths for us to follow). Also, there would be no regression on x86, since this patch would not be activated there. And @AaronDJohnson, it's great that you managed to plug in …
The binary macro should work, but I had to make a new function to register `ldexp` as a ufunc. This function can be found here: https://github.com/AaronDJohnson/xprec/blob/9acf0386d98ef24465fc13717d7908a46f650f10/csrc/_dd_ufunc.c#L877
Quick performance test on my Intel Mac. I ran one of:

…

or

…

Still digesting the detailed profiler results, but the …
On my M1 Max, the xprec version takes 35 s, while the Rosetta-emulated extended precision takes 160 s. (Regular double precision is 0.5 s, so the additional precision costs 70x.) If we want to keep using Macs, though, I don't see an alternative. Does tempo use quadruple precision by way of libquadmath (as opposed to Intel's 80-bit extended precision)?
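For context, a micro-benchmark along these lines (array size and operation chosen arbitrarily, not the script used above) isolates the dtype cost from the rest of PINT:

```python
import timeit
import numpy as np
import xprec  # provides the ddouble dtype

base = np.random.rand(100_000)
arrays = {
    "float64": base,
    "longdouble": base.astype(np.longdouble),
    "ddouble": base.astype(xprec.ddouble),
}
for label, arr in arrays.items():
    # a simple ufunc-heavy operation, repeated to get stable timings
    t = timeit.timeit(lambda: np.sum(arr * arr), number=100)
    print(f"{label:10s} {t:.3f} s")
```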
May I suggest, strongly, not monkeypatching numpy? I agree that supporting extended precision on non-Intel machines would be really valuable, but I think PINT should do its own management of the extended-precision type. This would have the advantage that we could offer a setting/option that made it possible to test the xprec code path even on x86 machines.

It seems that linear algebra will need to be operated by calling into xprec explicitly (or by implementing our own linear algebra operators). But I believe the only major linear algebra operation we use is the SVD, which xprec happily provides. On the plus side, one should expect software double-double arithmetic to be faster than software emulation of 80-bit floats, and I think it might have more precision, so making it switchable without too much suffering might be interesting when one encounters linear algebra headaches with highly covariant fits.

In terms of implementing the switching, it seems to me that PINT could provide its own wrapper functions to replace direct use of np.longdouble, as sketched below.
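A sketch of what such wrappers might look like (the module name, function names, and detection logic are all hypothetical):

```python
# pint/ldouble.py -- hypothetical module for managed extended precision
import numpy as np

try:
    import xprec
except ImportError:
    xprec = None

# True when np.longdouble is just float64 in disguise (e.g. Apple M1)
_NO_HW_EXTENDED = np.finfo(np.longdouble).eps == np.finfo(np.float64).eps

# a user-visible setting could force xprec even on x86, for testing
use_xprec = xprec is not None and _NO_HW_EXTENDED

def longdouble_dtype():
    """Return the extended-precision dtype selected for this session."""
    return xprec.ddouble if use_xprec else np.dtype(np.longdouble)

def asld(values):
    """Convert values to the selected extended-precision type."""
    return np.asarray(values, dtype=longdouble_dtype())
```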
A PR to fix several missing …
I haven't gotten around to this in a while, but I would like to throw in a quick update. With the changes mentioned above, another hurdle which I am currently working on is a problem with astropy units:

```python
from astropy import units as u
from xprec import ddouble

a = u.meter * 42.0                # works!
a = u.meter * ddouble.type(42.0)  # works!
a = ddouble.type(42.0) * u.meter  # does not work
```

I think that this can be solved by creating a class with an …
Hello Aaron, good that you're looking at this again. Now that you're "here", we can tackle it together. Regarding this issue, would you say the problem lies with astropy.units (and may warrant a bug report to astropy) or with xprec? |
Hi Michele, that sounds great! It seems to be an issue specific to scalar ddouble values:

```python
import numpy as np
from astropy import units as u
from xprec import ddouble

a = np.array(ddouble.type(42.0)) * u.meter  # works
a = u.meter * np.array(ddouble.type(42.0))  # also works
```

So it seems that just casting the scalars to numpy arrays avoids the problem.
The problem is that when xprec tries to multiply a ddouble with "something", it will fail if it cannot cast "something" to ddouble, unless "something" is a numpy array, in which case numpy takes over and does it correctly. The workarounds above rely on exactly that. I do believe this is fixed in xprec by modifying the binary operation wrapper to attempt using the operation provided by "something":

```c
#define PYWRAP_BINARY(name, inner, tp_inner_op)                        \
    static PyObject* name(PyObject* _x, PyObject* _y)                  \
    {                                                                  \
        ddouble r, x, y;                                               \
        if (PyArray_Check(_y))                                         \
            return PyArray_Type.tp_as_number->tp_inner_op(_x, _y);     \
                                                                       \
        if (PyDDouble_Cast(_x, &x) && PyDDouble_Cast(_y, &y)) {        \
            r = inner(x, y);                                           \
            return PyDDouble_Wrap(r);                                  \
        } else if (_y->ob_type->tp_as_number &&                        \
                   _y->ob_type->tp_as_number->tp_inner_op) {           \
            PyErr_Clear();                                             \
            return _y->ob_type->tp_as_number->tp_inner_op(_x, _y);     \
        }                                                              \
        return NULL;                                                   \
    }
```

The part I have added is the "else if". The `PyErr_Clear()` there discards the error left behind by the failed cast before handing the operation to the other operand.
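The delegation in that "else if" is the C-level counterpart of Python's NotImplemented protocol; as a pure-Python illustration (a toy, not xprec code):

```python
class DD:
    """Toy scalar showing the delegation pattern of the C fix."""
    def __init__(self, hi, lo=0.0):
        self.hi, self.lo = hi, lo

    def __mul__(self, other):
        if isinstance(other, (int, float)):
            # toy arithmetic only; real code uses error-free transforms
            return DD(self.hi * other, self.lo * other)
        # like the added "else if": defer to the other operand, so that
        # e.g. astropy's Quantity.__rmul__ gets a chance to run
        return NotImplemented

    __rmul__ = __mul__
```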
It seems like the changes you made fix the issue, @vallis. Should we be worried about having to use `PyErr_Clear()`?

I created a class called `xdouble`. I've replaced many of the `np.longdouble` uses in PINT with it. My working branch of PINT is …
One thing that I've run into now...

```python
>>> xdouble(1.0) + xdouble(0.1)
ddouble(1.1+-8.326672684688674e-17)
>>> xdouble(str(1.1))
ddouble(1.1+-2.775557561562891e-17)
```
Thanks Aaron. I think the PyErr_Clear() may be ok since on that code path the error would only come from PyDDouble_Cast. Regarding the PINT interface, I wonder what's the cleanest way to do it. Anne suggested that there should be a way to use xprec on x86. Also there may be another numpy extension down the line that we may want to experiment with. It may be cleanest to use the naked xprec.ddouble, and set (say) pint.longdouble to either xprec.ddouble or np.longdouble. Then:
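A sketch of that, with the selection logic guessed:

```python
# in pint/__init__.py -- sketch of a one-time, import-time selection
import numpy as np

try:
    import xprec
except ImportError:
    xprec = None

# fall back to xprec only where longdouble degenerates to float64
if xprec is not None and np.finfo(np.longdouble).eps == np.finfo(np.float64).eps:
    longdouble = xprec.ddouble   # no hardware extended precision here
else:
    longdouble = np.longdouble

# client code is then agnostic, e.g.:
# phase = np.asarray(dt, dtype=pint.longdouble) * f0
```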
Or perhaps what pint defines is not a class, but a factory, which returns either implementation based on a global setting. That would help performance a bit. |
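For instance (again only a sketch, with invented names):

```python
# sketch of the factory variant: consult a global setting at call time
import numpy as np
import xprec

USE_XPREC = False  # could be toggled by a PINT configuration option

def longdouble(value=0.0):
    """Return a scalar of whichever extended type is currently selected."""
    if USE_XPREC:
        return np.asarray(value, dtype=xprec.ddouble)[()]
    return np.longdouble(value)
```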
As for 1.0 + 0.1, indeed xdouble(1.0) + xdouble(0.1) == xdouble(1.1). Some roundoff errors are to be expected coming from double precision... But is my fromstr implementation blameless? I see xdouble("1.0") + xdouble("0.1") == xdouble("1.1") but xdouble("1.0") == ddouble(1+5.551115123125783e-17), which doesn't look right, given that xdouble(1) == ddouble(1+0). |
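The double-precision part of that expectation is easy to verify with exact rational arithmetic (a quick diagnostic, not part of xprec):

```python
from fractions import Fraction

# the binary double nearest to 0.1 is not one tenth:
print(float(Fraction(0.1) - Fraction(1, 10)))                 # ~5.55e-18

# so even an exact double-double sum of 1.0 and float(0.1) must miss
# the true 11/10 at that level, consistent with the low words above
print(float(Fraction(1) + Fraction(0.1) - Fraction(11, 10)))  # ~5.55e-18
```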
I was trying to diagnose some issues with clock corrections a few months ago. I was seeing weird differences in the printed times (ddoubles) within ipython. I spent forever trying to track those down and convinced myself that they were due to how the numbers were printed and/or read in (I can't quite remember), although I never actually got around to coming up with a nice example. So I would not be surprised if there were some string conversion issues, as you suggest.
Revisiting this issue now that more people are getting M1 laptops...

It seems that it should be feasible to enable software-based extended precision using a numpy plugin such as https://github.com/tuwien-cms/xprec, maybe as simple as replacing np.float96/np.float128/np.longdouble with the xprec dtype.

(Doing something similar in tempo2 would require much code rewriting, so it would be good to have at least PINT running natively. By contrast, tempo may be fine already with libquadmath?)