LLVM-accelerated Generic Linear Algebra Subprograms (GLAS)
GLAS is a C library written in Dlang. It requires no C++ or D runtime; only libc, which is available everywhere.
The library provides:
- BLAS (Basic Linear Algebra Subprograms) API.
- GLAS (Generic Linear Algebra Subprograms) API.

A CBLAS API can be provided by linking with Netlib's CBLAS library.
GLAS can be used with both DMD and LDC, but LDC (LLVM D Compiler) >= 1.1.0-beta6 must be installed in a common path in either case.
Note performance issue #18.
GLAS can be included in a project automatically using dub (the D package manager). DUB will build GLAS and mir-cpuid with LDC.
```json
{
    ...
    "dependencies": {
        "mir-glas": "~><current_mir-glas_version>",
        "mir-cpuid": "~><current_mir-cpuid_version>"
    },
    "lflags": ["-L$MIR_GLAS_PACKAGE_DIR", "-L$MIR_CPUID_PACKAGE_DIR"]
}
```
`$MIR_GLAS_PACKAGE_DIR` and `$MIR_CPUID_PACKAGE_DIR` will be replaced automatically by DUB with the appropriate directories.
mir-glas can be used like a common C library. It should be linked with mir-cpuid. A compiler, for example GCC, may require mir-cpuid to be passed after mir-glas: `-lmir-glas -lmir-cpuid`.
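For example, a GCC link step might look like the following sketch; the source file name and library path are placeholders for your own project layout:

```shell
# Illustrative only: adjust -L to wherever libmir-glas.a and libmir-cpuid.a
# were copied. Note that mir-cpuid is passed after mir-glas.
gcc example.c -L/path/to/libs -lmir-glas -lmir-cpuid -o example
```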
The GLAS API is based on the new ndslice from mir-algorithm.
Other languages can use a simple structure definition.
Examples are available for C and for Dlang.
C/C++ headers are located in `include/`. D headers are located in `source/`.
There are two files:
- `glas/fortran.h` / `glas/fortran.d` - for Netlib's BLAS API
- `glas/ndslice.h` / `glas/ndslice.d` - for the GLAS API
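An ndslice-style API describes a matrix by its lengths, strides, and a data pointer, so any stride layout (including transposed views) can be passed without copying. The struct below is a hypothetical sketch of such a descriptor; the actual type and field names are defined in `include/glas/ndslice.h` and may differ:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ndslice-style matrix descriptor. The real definitions live
   in include/glas/ndslice.h; the names here are illustrative only. */
typedef struct {
    size_t    lengths[2]; /* rows, columns */
    ptrdiff_t strides[2]; /* element strides per dimension */
    double   *ptr;        /* pointer to the first element */
} matrix_view;

/* Element access through the descriptor; works for any stride layout. */
static double mat_at(const matrix_view *m, size_t i, size_t j)
{
    return m->ptr[(ptrdiff_t)i * m->strides[0] + (ptrdiff_t)j * m->strides[1]];
}
```

Swapping the two strides yields a transposed view of the same memory, which is why such an API does not require additional data copying.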
LDC (LLVM D Compiler) >= 1.1.0-beta6 is required to build the project.
You may want to build LDC from source or use the 1.1.0-beta6 binaries.
Beta 2 generates a lot of warnings, which can be ignored. Beta 3 is not supported.
LDC binaries contain two compilers, ldc2 and ldmd2. It is recommended to use ldmd2 with mir-glas.
Recent LDC packages come with the dub package manager. dub is used to build the project.
mir-cpuid provides CPU identification routines.
Download mir-cpuid:
```
dub fetch mir-cpuid --cache=local
```
Change the directory:
```
cd mir-cpuid-<current-mir-cpuid-version>/mir-cpuid
```
Build mir-cpuid:
```
dub build --build=release-nobounds --compiler=ldmd2 --build-mode=singleFile --parallel --force
```
You may need to add `--arch=x86_64` if you use Windows.
Copy `libmir-cpuid.a` to your project, or add its directory to the library path.
Download mir-glas:
```
dub fetch mir-glas --cache=local
```
Change the directory:
```
cd mir-glas-<current-mir-glas-version>/mir-glas
```
Build mir-glas:
```
dub build --config=static --build=target-native --compiler=ldmd2 --build-mode=singleFile --parallel --force
```
You may need to add `--arch=x86_64` if you use Windows.
Copy `libmir-glas.a` to your project, or add its directory to the library path.
Contributions are welcome! The hardest part (GEMM) is already implemented.
- CI testing with Netlib's BLAS test suite.
- CI testing with Netlib's CBLAS test suite.
- CI testing with Netlib's LAPACK test suite.
- CI testing with Netlib's LAPACKE test suite.
- Multi-threading
- GPU back-end
- Shared library support - requires only DUB configuration fixes.
- Level 3 - matrix-matrix operations
  - GEMM - matrix-matrix multiply
  - SYMM - symmetric matrix-matrix multiply
  - HEMM - Hermitian matrix-matrix multiply
  - SYRK - symmetric rank-k update to a matrix
  - HERK - Hermitian rank-k update to a matrix
  - SYR2K - symmetric rank-2k update to a matrix
  - HER2K - Hermitian rank-2k update to a matrix
  - TRMM - triangular matrix-matrix multiply
  - TRSM - solving triangular matrix with multiple right-hand sides
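As a point of reference, GEMM computes `C := alpha*A*B + beta*C`. The minimal C sketch below illustrates that operation for row-major matrices; it is deliberately naive and is not GLAS's blocked, vectorized implementation:

```c
#include <stddef.h>

/* Naive reference GEMM: C := alpha*A*B + beta*C for row-major matrices.
   A is m x k, B is k x n, C is m x n. Illustration of the operation only. */
static void gemm_ref(size_t m, size_t n, size_t k,
                     double alpha, const double *a, const double *b,
                     double beta, double *c)
{
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j) {
            double acc = 0.0;
            for (size_t l = 0; l < k; ++l)
                acc += a[i * k + l] * b[l * n + j];
            c[i * n + j] = alpha * acc + beta * c[i * n + j];
        }
}
```

A production implementation differs mainly in memory access: it packs panels of A and B into contiguous buffers and updates small register blocks of C, which is what the register-blocking configurations below are for.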
- Level 2 - matrix-vector operations
  - GEMV - matrix-vector multiply
  - GBMV - banded matrix-vector multiply
  - HEMV - Hermitian matrix-vector multiply
  - HBMV - Hermitian banded matrix-vector multiply
  - HPMV - Hermitian packed matrix-vector multiply
  - TRMV - triangular matrix-vector multiply
  - TBMV - triangular banded matrix-vector multiply
  - TPMV - triangular packed matrix-vector multiply
  - TRSV - solving triangular matrix problems
  - TBSV - solving triangular banded matrix problems
  - TPSV - solving triangular packed matrix problems
  - GERU - performs the rank-1 operation `A := alpha*x*y' + A`
  - GERC - performs the rank-1 operation `A := alpha*x*conjg(y') + A`
  - HER - Hermitian rank-1 operation `A := alpha*x*conjg(x') + A`
  - HPR - Hermitian packed rank-1 operation `A := alpha*x*conjg(x') + A`
  - HER2 - Hermitian rank-2 operation
  - HPR2 - Hermitian packed rank-2 operation
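The rank-1 formulas above all share the same shape: an outer product of two vectors added into a matrix. A minimal C sketch of the real-valued case (GER, `A := alpha*x*y' + A`, row-major A) makes the structure concrete; this is an illustration, not GLAS code:

```c
#include <stddef.h>

/* Reference rank-1 update: A := alpha*x*y^T + A, with A an m x n
   row-major matrix. Illustration of the operation only. */
static void ger_ref(size_t m, size_t n, double alpha,
                    const double *x, const double *y, double *a)
{
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j)
            a[i * n + j] += alpha * x[i] * y[j];
}
```

The conjugated variants (GERC, HER, HPR) follow the same loop structure with `y[j]` or `x[j]` conjugated, which is why complex specializations matter for performance.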
- Level 1 - vector-vector and scalar operations. Note: Mir already provides a generic implementation.
  - ROTG - setup Givens rotation
  - ROTMG - setup modified Givens rotation
  - ROT - apply Givens rotation
  - ROTM - apply modified Givens rotation
  - SWAP - swap x and y
  - SCAL - `x = a*x`. Note: requires additional optimization for complex numbers.
  - COPY - copy x into y
  - AXPY - `y = a*x + y`. Note: requires additional optimization for complex numbers.
  - DOT - dot product
  - DOTU - dot product. Note: requires additional optimization for complex numbers.
  - DOTC - dot product, conjugating the first vector. Note: requires additional optimization for complex numbers.
  - DSDOT - dot product with extended precision accumulation and result
  - SDSDOT - dot product with extended precision accumulation
  - NRM2 - Euclidean norm
  - ASUM - sum of absolute values
  - IAMAX - index of max abs value
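Level 1 routines reduce to simple loops over the vectors. For instance, AXPY (`y = a*x + y`) can be sketched in C as follows; this illustrates the operation only, while GLAS and Mir provide generic, optimized implementations:

```c
#include <stddef.h>

/* Reference AXPY: y := a*x + y for vectors of length n.
   Illustration of the operation only. */
static void axpy_ref(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```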
Five steps:
1. Implement a `cpuid_init` function for mir-cpuid. This function should be implemented per platform or OS. Already implemented targets:
   - x86, any OS
   - x86_64, any OS
2. Verify that `source/glas/internal/memory.d` contains an implementation for the OS. Already implemented targets:
   - Posix (Linux, macOS, and others)
   - Windows
3. Add a new register-blocking configuration to `source/glas/internal/config.d`. Configurations are already available for:
   - x87
   - SSE2
   - AVX / AVX2
   - AVX512 (requires LLVM bug fixes)
4. Create a Pull Request.
5. Coordinate with the LDC team in case of compiler bugs.
- GLAS has a generic internal implementation that can be ported to any other architecture with minimal effort (about 5 minutes).
- The GLAS API provides more functionality compared with BLAS.
- It is written in Dlang using generic programming.
- GLAS is faster.
- The GLAS API is more user-friendly and does not require additional data copying.
- Unlike Eigen, GLAS does not require a C++ runtime.
- GLAS does not require platform-specific optimizations such as Eigen's intrinsics micro-kernels or OpenBLAS's assembler macro-kernels.
- GLAS has a simple implementation that can be easily ported and extended.
GLAS is a lower-level library than Eigen. For example, GLAS could serve as an Eigen BLAS back-end in the future.
Lazy evaluation and aliasing can be easily implemented in D. Explicit composition of operations can be done using `mir.ndslice.algorithm` and the multidimensional `map` from `mir.ndslice.topology`, which is a generic way to perform any lazy operations you want.