Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Idx variants #160

Merged
merged 19 commits into from
Aug 26, 2024
Merged

Idx variants #160

merged 19 commits into from
Aug 26, 2024

Conversation

richardjgowers
Copy link
Member

No description provided.

@richardjgowers richardjgowers changed the title wip: Idx variants Idx variants Aug 5, 2024
@@ -685,6 +685,169 @@ namespace distopia {
}
}

template <typename V>
HWY_INLINE void LoadInterleavedIdx(const unsigned int *idx, const float *src,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hmacdope this is the interesting function that does the IDX gather here.

In general, the width of the integer part of the GatherIndexneeds to match the width of the floating point type, so for the current signature of int indices, the float case is simple and the double case requires an upcase (see below).

If we wanted to have int64 indices in the signature for Idx functions, we'd probably need two more specialisations to this function, which isn't too much of a drama. The int64 and float case is a little odd as arguably the ints could overflow, so maybe you couldn't vectorise this as hard... (using current highway bindings at least).

auto Vone = hn::Set(d_int, 1);
// convert indices to 3D coordinate indices, i.e. index 10 at pointer 30 etc
vidx = hn::Mul(vidx, Vthree);
x_dst = hn::GatherIndex(d, src, vidx);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't actually that good, I imagine for performance you instead would want to do loads of xyz vectors (with junk) then a transpose, but you'd have to write this for all vector widths, at which point maybe you'd be better off doing a contribution upstream (GatherIndexInterleaved3D...) rather than writing so much code in the bullet farm

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah this is still clever, prepopulating the indices for a gather.

@richardjgowers richardjgowers requested a review from hmacdope August 5, 2024 18:03
Copy link
Member

@hmacdope hmacdope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, I will add Dihedrals and Benchmarks.

auto Vone = hn::Set(d_int, 1);
// convert indices to 3D coordinate indices, i.e. index 10 at pointer 30 etc
vidx = hn::Mul(vidx, Vthree);
x_dst = hn::GatherIndex(d, src, vidx);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah this is still clever, prepopulating the indices for a gather.

@hmacdope hmacdope mentioned this pull request Aug 6, 2024
Copy link
Member

@hmacdope hmacdope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I will fix the dihedral issues and expose at python layer.

@hmacdope hmacdope merged commit d435868 into main Aug 26, 2024
2 of 13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants