Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] restructured cell linked list and other optimizations #623

Merged
merged 9 commits into from
Jul 13, 2024

Conversation

nR3D
Copy link
Collaborator

@nR3D nR3D commented Jul 13, 2024

Sorry for the long PR, I have noticed some commits I didn't upstream before.
The main changes are:

  • a new internal data structure for the cell linked list, which prioritizes contiguous memory access
  • moving from size_t (i.e. uint64_t) to uint32_t, since GPU architectures provide int32 units

Other changes are:

  • set the default sycl target to spir64 instead of a generic nvidia if no target is provided to CMake
  • default workgroup size has been increased to 128, but it remains an empirical value, ideally it should be close to the mean neighborhood size
  • added support for oneDPL when sorting. Currently a custom radix sort is provided which removes a kernel launch by initializing an array on-the-fly during the first sorting iteration, this gives a slightly better performance than oneDPL. Nevertheless oneDPL could trivially replace it if the support for a custom method is considered too demanding. I have used a new macro SPHINXSYS_USE_ONEDPL_SORTING, which should be added to the CMake if this addition is deemed valuable, otherwise I could revert the commit and/or move it to a different PR for discussion

@nR3D nR3D requested a review from Xiangyu-Hu July 13, 2024 18:59
@nR3D
Copy link
Collaborator Author

nR3D commented Jul 13, 2024

To better comment on the new data structure for the cell linked list:
previously it was implemented as a non-continuous linked list, where each particle ID would point to the next ID in the list. It led to scattered memory access to the same cells, since the particles IDs, even if recently sorted, would not be strictly in succession. So the only way to retrieve their value was to make a while loop that would get all particles in a cell until one was found. With the new structure, particles IDs are positions in a strict order delimited by a cell size, hence a limited for-loop can be used to retrieve a cell's particles.
To do so, a particle_id_list_ array is containing all the particles IDs distributed based on their cell ID, a second array offset_cell_size_ identifies the displacement of each cell inside particle_id_list_ so that the begin (and end) of each cell can be determined directly. Additionally an auxiliary array curr_cell_size_ is used by CellLinkedListKernel::UpdateCellLists to atomically keep track of the number of particles already inserted inside a cell, to avoid overwriting particles when executing concurrently.
The update procedure results to be more demanding than the previous implementation, but the easier access with a known cell size and contiguous memory access leads to an overall performance improvement.

cell-linked-list

@Xiangyu-Hu
Copy link
Owner

Xiangyu-Hu commented Jul 13, 2024

Great! Thanks. I am current working on target of gradually merge your work to the main branch. For this, I am doing preparation changes so that I can implement SYCL for the entire library. If you have interest, you can check the branch sycl_redo.

@Xiangyu-Hu
Copy link
Owner

To better comment on the new data structure for the cell linked list: previously it was implemented as a non-continuous linked list, where each particle ID would point to the next ID in the list. It led to scattered memory access to the same cells, since the particles IDs, even if recently sorted, would not be strictly in succession. So the only way to retrieve their value was to make a while loop that would get all particles in a cell until one was found. With the new structure, particles IDs are positions in a strict order delimited by a cell size, hence a limited for-loop can be used to retrieve a cell's particles. To do so, a particle_id_list_ array is containing all the particles IDs distributed based on their cell ID, a second array offset_cell_size_ identifies the displacement of each cell inside particle_id_list_ so that the begin (and end) of each cell can be determined directly. Additionally an auxiliary array curr_cell_size_ is used by CellLinkedListKernel::UpdateCellLists to atomically keep track of the number of particles already inserted inside a cell, to avoid overwriting particles when executing concurrently. The update procedure results to be more demanding than the previous implementation, but the easier access with a known cell size and contiguous memory access leads to an overall performance improvement.

cell-linked-list

Is this change is aim a similar approach as Nvidia's implementation for particle method?

@Xiangyu-Hu Xiangyu-Hu merged commit e55c877 into Xiangyu-Hu:sycl Jul 13, 2024
@nR3D
Copy link
Collaborator Author

nR3D commented Jul 13, 2024

Is this change is aim a similar approach as Nvidia's implementation for particle method?

Yes the approach is similar, but the main difference is that particles are directly positioned based on the cell size instead of sorting them based on their cell id

you can check the branch #618

Thank you I will keep an eye out for it, you can ping me if there are questions on what I've implemented

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants