finally got all gpu operation working, and implemented gpu constructor and destructor
jolatechno committed Dec 30, 2020
1 parent 86596c6 commit 07c524a
Showing 5 changed files with 15 additions and 14 deletions.
8 changes: 4 additions & 4 deletions README.md
@@ -2,11 +2,11 @@

## Requirements

-This library is design for Linux only _for now_, it require `g++-10` for compilation.
+This library is design for Linux only _for now_, it require `g++` (or `g++-10` for __AMD__ GPU offloading) for compilation.

For the best result, you should install __Openmp__ (`libomp5-xx`), and compile the library using it.

-For GPU offloading you will also need the correct GPU drivers, and either `gcc-10-offload-nvptx` for __NVIDIA__ cards, or `gcc-10-offload-amdgcn` for __AMD__ GPUs.
+For GPU offloading you will also need the correct GPU drivers, and either `gcc-offload-nvptx` (or `gcc-10-offload-nvptx` if you want to use `g++-10`) for __NVIDIA__ cards, or `gcc-10-offload-amdgcn` for __AMD__ GPUs.

To take advantage of MPI, you need to install `mpic++`.

@@ -28,7 +28,7 @@ The function that are defined when using mpi are `.send()` and `.receive()` for

To compile it with __Openmp__, you need to use the `"openmp"` directive before all other targets, which will modify the `LDLIBS` variable in the [Makefile](./Makefile).

-If compiled with __Openmp__, loops of more than 500 iterations will automatically use a `pragma omp parralel`. You can change this threshold by adding `-DCPU_LIMIT=xxx` to `CCFLAGS` by using `CCFLAGS="CPU_LIMIT=xxx"` with `make`.
+If compiled with __Openmp__, loops of more than 500 iterations will automatically use a `pragma omp parralel`. You can change this threshold by adding `-DCPU_LIMIT=xxx` to `CCFLAGS` by using `CCFLAGS="-DCPU_LIMIT=xxx"` with `make`.

### Offloading to GPUs

@@ -40,7 +40,7 @@ If you encounter some errors you might want to also pass the flag `"CCFLAGS=-fc

All arithmetic operations, self-operators, and comparisons excluding `==, !=` (for performance reasons) are now supported on GPUs.

-Loops of more than 10000 iterations will automatically be offloaded to GPUs. You can change this threshold by adding `-DCPU_LIMIT=xxx` to `CCFLAGS` by using `CCFLAGS="GPU_LIMIT=yyy"` with `make`. If you want to modify both the GPU offloading threshold and the __Openmp__ threshold, you need to use `CCFLAGS="-DCPU_LIMIT=yyy -DGPU_LIMIT=xxx"` with `make`.
+Loops of more than 10000 iterations will automatically be offloaded to GPUs. You can change this threshold by adding `-DCPU_LIMIT=xxx` to `CCFLAGS` by using `CCFLAGS="-DGPU_LIMIT=yyy"` with `make`. If you want to modify both the GPU offloading threshold and the __Openmp__ threshold, you need to use `CCFLAGS="-DCPU_LIMIT=yyy -DGPU_LIMIT=xxx"` with `make`.
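Reviewer note: the two thresholds described above can be pictured with the following minimal sketch. It assumes the library guards each macro with an `#ifndef` default (500 and 10000 are the values quoted in the README). `Backend` and `choose_backend` are hypothetical names used only for illustration and do not appear in the repository.

```cpp
#include <cassert>

// Defaults as quoted in the README; a build can override them with
// CCFLAGS="-DCPU_LIMIT=... -DGPU_LIMIT=..." (the #ifndef guard is an assumption).
#ifndef CPU_LIMIT
#define CPU_LIMIT 500
#endif
#ifndef GPU_LIMIT
#define GPU_LIMIT 10000
#endif

// Hypothetical helper: which execution path a loop of `iterations` would take.
enum class Backend { Serial, OpenMP, Gpu };

inline Backend choose_backend(long iterations, bool openmp, bool target) {
    // GPU offloading needs both OpenMP and an offload target.
    if (openmp && target && iterations > GPU_LIMIT) return Backend::Gpu;
    // Otherwise fall back to a host-side parallel loop when it is large enough.
    if (openmp && iterations > CPU_LIMIT) return Backend::OpenMP;
    return Backend::Serial;
}
```

Both comparisons are strict, matching the "more than" wording in the README and the `_height*_width > GPU_LIMIT` test in `src/arithmetic.inl`.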

Atomic operations for type `uint8_t` are not supported by __Openmp__ on GPU (see [issue #1](https://github.com/jolatechno/binary_algebra/issues/1)). I finally found a work around for every operation, either by converting types, or by grouping operations together to only apply atomic operations on `long unsigned int`.
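Reviewer note: the "grouping operations together" workaround can be pictured with the sketch below. It is not the library's code. It assumes that eight `uint8_t` lanes are packed into one 64-bit word, so that the only atomic update is on a 64-bit integer. `xor_lanes` and `lane` are hypothetical names, and the loop runs on host threads rather than on an offload target.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// XOR-accumulate bytes into eight lanes packed in one 64-bit word.
// Byte i goes to lane (i % 8); the shared update is a 64-bit atomic,
// so no atomic operation on uint8_t is ever needed.
inline std::uint64_t xor_lanes(const std::vector<std::uint8_t>& bytes) {
    std::uint64_t packed = 0;
    const long n = static_cast<long>(bytes.size());
    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        const std::uint64_t shifted =
            static_cast<std::uint64_t>(bytes[i]) << (8 * (i % 8));
        #pragma omp atomic update
        packed ^= shifted;
    }
    return packed;
}

// Extract lane k (0..7) from the packed word.
inline std::uint8_t lane(std::uint64_t packed, int k) {
    return static_cast<std::uint8_t>((packed >> (8 * k)) & 0xFF);
}
```

Without `-fopenmp` the pragmas are ignored and the function gives the same result serially.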

2 changes: 0 additions & 2 deletions performance_testing/functions.hpp
@@ -1,8 +1,6 @@
#include "../src/binary_arithmetic.hpp"

-#include <stdio.h> //for testing
void multiplication_mat_vect(Matrix mat, Vector vect) {
-printf("!! height : %d\n", mat.height); //for testing
mat * vect;
}

14 changes: 10 additions & 4 deletions performance_testing/test.cpp
@@ -17,10 +17,16 @@ int main(int argc, char** argv){
#endif

const int n_iter = 100;
-#if defined(_OPENMP)
-const int sizes[] = {
-10, 100, 500,
-};
+#ifdef _OPENMP
+#ifdef TARGET
+const int sizes[] = {
+100, 500, 1000,
+};
+#else
+const int sizes[] = {
+10, 100, 500,
+};
+#endif
#else
const int sizes[] = {
10, 50, 100,
3 changes: 0 additions & 3 deletions src/arithmetic.inl
@@ -334,7 +334,6 @@ Matrix Matrix::operator*(Matrix const& other) const {
return res;
}

-#include <stdio.h> //for testing
Vector Matrix::operator*(Vector const& other) const {
assert(width == other.height); //check if dimensions are compatible

@@ -347,8 +346,6 @@ Vector Matrix::operator*(Vector const& other) const {
uint8_t *other_blocks = other.blocks;
uint64_t *this_blocks = blocks;

-printf("!! adress : %p\n", this_blocks); //for testing
-
int16_t i, k;
#if defined(_OPENMP) && defined(TARGET)
if(_height*_width > GPU_LIMIT) {
2 changes: 1 addition & 1 deletion src/openmp.hpp
@@ -1,6 +1,6 @@
#pragma once

-#if defined(_OPENMP)
+#ifdef _OPENMP
#define _OPENMP_PRAGMA(all) _Pragma(all)

#include <omp.h>
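Reviewer note: the `_Pragma` wrapper in `src/openmp.hpp` lets OpenMP directives be emitted from macros and vanish in non-OpenMP builds. The sketch below illustrates the pattern under two assumptions: the `#else` branch is not shown in this hunk and is a guess, and the macro is renamed `OMP_PRAGMA` because identifiers starting with an underscore and a capital letter are reserved. `sum_to` is a hypothetical example function.

```cpp
#include <cassert>

#ifdef _OPENMP
    // With OpenMP enabled, forward the string to the _Pragma operator.
    #define OMP_PRAGMA(all) _Pragma(all)
#else
    // Assumed fallback: expand to nothing so the loop compiles serially.
    #define OMP_PRAGMA(all)
#endif

// Sum of 0..n-1; parallelised only when the build enables OpenMP.
inline long sum_to(int n) {
    long total = 0;
    OMP_PRAGMA("omp parallel for reduction(+:total)")
    for (int i = 0; i < n; i++) {
        total += i;
    }
    return total;
}
```

The same source then builds with or without `-fopenmp`, with no `#ifdef` around each loop.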
