Abstractions of memory, allocator, vector, tuple, shared_ptr, unique_ptr, bitset, variant and string working on both CPU and GPU

lattice-land/cuda-battery

Standard Library for CUDA Programming

This library provides data structures to ease programming in CUDA (version 12 or higher). For a tutorial and further information, please read this manual.

Example

A quick example of how to transfer a std::vector on the CPU to a battery::vector on the GPU (note that no manual memory allocation or deallocation is needed):

#include <vector>
#include "battery/vector.hpp"
#include "battery/unique_ptr.hpp"
#include "battery/allocator.hpp"

using mvector = battery::vector<int, battery::managed_allocator>;

__global__ void kernel(mvector* v_ptr) {
  mvector& v = *v_ptr;
  // ... Compute on `v` in parallel.
}

int main(int argc, char** argv) {
  std::vector<int> v(10000, 42);
  // Transfer from CPU vector to GPU vector.
  auto gpu_v = battery::make_unique<mvector, battery::managed_allocator>(v);
  kernel<<<256, 256>>>(gpu_v.get());
  CUDAEX(cudaDeviceSynchronize());
  // Transferring the new data back to the initial vector.
  for(size_t i = 0; i < v.size(); ++i) {
    v[i] = (*gpu_v)[i];
  }
  return 0;
}
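
The kernel body above is left open ("Compute on `v` in parallel"). One common way to let a fixed launch configuration such as `<<<256, 256>>>` cover a vector of any size is a grid-stride loop. The sketch below is not taken from this library: it simulates that indexing pattern on the host in plain C++ so it can be checked without a GPU. The function name `simulate_grid_stride` is hypothetical; in a real kernel, `block` and `thread` would be `blockIdx.x` and `threadIdx.x`, and `grid_dim` and `block_dim` would be `gridDim.x` and `blockDim.x`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side simulation of a grid-stride loop (illustrative helper, not part of battery).
// Returns, for each index in [0, n), how many simulated threads visited it.
std::vector<int> simulate_grid_stride(std::size_t n, std::size_t grid_dim, std::size_t block_dim) {
  // A zero-sized launch would make the stride 0 and the inner loop endless.
  assert(grid_dim > 0 && block_dim > 0);
  std::vector<int> visits(n, 0);
  const std::size_t stride = grid_dim * block_dim;  // total number of threads in the grid
  for (std::size_t block = 0; block < grid_dim; ++block) {
    for (std::size_t thread = 0; thread < block_dim; ++thread) {
      // Each thread starts at its global index and jumps by the grid size.
      for (std::size_t i = block * block_dim + thread; i < n; i += stride) {
        ++visits[i];  // in a kernel, this is where v[i] would be processed
      }
    }
  }
  return visits;
}
```

With this pattern every index is handled by exactly one thread, whether the grid has more threads than elements (as with 256 x 256 threads for 10000 elements) or fewer.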

Common Questions

Quick Reference

  • Namespace: battery::*.
  • The documentation is not exhaustive (which is why we provide a link to the standard C++ STL documentation), but we document most of the main differences and the features without a standard counterpart.
  • The table below is a quick reference to the most useful features, but it is not exhaustive.
  • The structures provided here are not thread-safe; synchronizing concurrent accesses is the responsibility of the user of this library.
Category     Main features
Allocator    standard_allocator, global_allocator, managed_allocator, pool_allocator
Pointers     shared_ptr (std), make_shared (std), allocate_shared (std),
             unique_ptr (std), make_unique (std), make_unique_block, make_unique_grid
Containers   vector (std), string (std), dynamic_bitset,
             tuple, variant (std), bitset (std)
Utility      CUDA, INLINE, CUDAE, CUDAEX,
             limits, ru_cast, rd_cast,
             popcount (std), countl_zero (std), countl_one (std), countr_zero (std),
             countr_one (std), signum, ipow,
             add_up, add_down, sub_up, sub_down,
             mul_up, mul_down, div_up, div_down
Memory       local_memory, read_only_memory, atomic_memory,
             atomic_scoped_memory, atomic_memory_block, atomic_memory_grid
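
To illustrate the note above that thread safety is left to the user, here is a minimal CPU-only analogy written with standard C++ types only. It does not use this library's API: `std::vector` stands in for a shared container, and the function name `count_equal_parallel` is hypothetical. Several threads read the shared data concurrently, which is safe because nothing writes to it, and the only shared write goes through a `std::atomic` counter.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Counts the elements equal to `target` using `num_threads` worker threads.
// Illustrative helper, not part of battery.
std::size_t count_equal_parallel(const std::vector<int>& data, int target, unsigned num_threads) {
  assert(num_threads > 0);
  std::atomic<std::size_t> count{0};  // the only state written by several threads
  std::vector<std::thread> workers;
  workers.reserve(num_threads);
  for (unsigned t = 0; t < num_threads; ++t) {
    workers.emplace_back([&data, &count, target, t, num_threads] {
      std::size_t local = 0;  // private to this thread, so no synchronization is needed
      for (std::size_t i = t; i < data.size(); i += num_threads) {
        if (data[i] == target) ++local;
      }
      // Publish the partial result with a single atomic update.
      count.fetch_add(local, std::memory_order_relaxed);
    });
  }
  for (std::thread& w : workers) w.join();  // join makes all updates visible to the caller
  return count.load();
}
```

Accumulating into a thread-local variable and publishing once keeps contention on the shared counter low; a plain `std::size_t` counter incremented from every thread would be a data race.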
