-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Idx variants #160
Idx variants #160
Conversation
need to load fewer int values to cast to longs, to then load as doubles
this is the right combo of loading ints and promoting to longlongs
libdistopia/src/distopia.cpp
Outdated
@@ -685,6 +685,169 @@ namespace distopia { | |||
} | |||
} | |||
|
|||
template <typename V> | |||
HWY_INLINE void LoadInterleavedIdx(const unsigned int *idx, const float *src, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hmacdope this is the interesting function that does the IDX gather here.
In general, the width of the integer part of the GatherIndex
needs to match the width of the floating point type, so for the current signature of int
indices, the float case is simple and the double case requires an upcase (see below).
If we wanted to have int64 indices in the signature for Idx
functions, we'd probably need two more specialisations to this function, which isn't too much of a drama. The int64 and float case is a little odd as arguably the ints could overflow, so maybe you couldn't vectorise this as hard... (using current highway bindings at least).
auto Vone = hn::Set(d_int, 1); | ||
// convert indices to 3D coordinate indices, i.e. index 10 at pointer 30 etc | ||
vidx = hn::Mul(vidx, Vthree); | ||
x_dst = hn::GatherIndex(d, src, vidx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't actually that good, I imagine for performance you instead would want to do loads of xyz vectors (with junk) then a transpose, but you'd have to write this for all vector widths, at which point maybe you'd be better off doing a contribution upstream (GatherIndexInterleaved3D
...) rather than writing so much code in the bullet farm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah this is still clever, prepopulating the indices for a gather.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, I will add Dihedrals and Benchmarks.
auto Vone = hn::Set(d_int, 1); | ||
// convert indices to 3D coordinate indices, i.e. index 10 at pointer 30 etc | ||
vidx = hn::Mul(vidx, Vthree); | ||
x_dst = hn::GatherIndex(d, src, vidx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah this is still clever, prepopulating the indices for a gather.
previously only populated 1/3 of the coordinate array with values...
loading ints takes int anyway, so don't confuse signed-ness of these
nidx * nidx shouldn't be too large...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I will fix the dihedral issues and expose at python layer.
No description provided.