To build and install FlexFlow, follow the instructions below.
Clone the FlexFlow source code and its third-party dependencies from GitHub.
git clone --recursive https://github.com/flexflow/FlexFlow.git
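If the repository was cloned without --recursive, the third-party dependencies will be missing. A minimal sketch for fetching them afterwards, using the standard git submodule command; the helper name fetch_submodules is hypothetical and not part of FlexFlow:

```shell
# Hypothetical helper: populate git submodules in an existing checkout.
fetch_submodules() {
    repo_dir="$1"
    # Refuse to run outside a git checkout.
    if [ ! -e "$repo_dir/.git" ]; then
        echo "not a git checkout: $repo_dir" >&2
        return 1
    fi
    git -C "$repo_dir" submodule update --init --recursive
}

# Usage (assumes the clone lives in ./FlexFlow):
# fetch_submodules FlexFlow
```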
FlexFlow has system dependencies on CUDA and/or ROCm, depending on which GPU backend you target. The GPU backend is configured by the CMake variable FF_GPU_BACKEND. By default, FlexFlow targets CUDA. docker/base/Dockerfile installs the system dependencies on a standard Ubuntu system.
If you are targeting CUDA, FlexFlow requires CUDA and cuDNN to be installed. You can follow the standard NVIDIA installation instructions for CUDA and cuDNN.
Disclaimer: CUDA architectures < 60 (Maxwell and older) are no longer supported.
If you are targeting ROCm, FlexFlow requires a ROCm and HIP installation with a few additional packages. Note that this can be done on a system with or without an AMD GPU. You can follow the standard installation instructions for ROCm and HIP. When running amdgpu-install, install the use cases hip and rocm. You can avoid installing the kernel drivers (not necessary on systems without an AMD graphics card) with --no-dkms, i.e. amdgpu-install --usecase=hip,rocm --no-dkms. Additionally, install the packages hip-dev, hipblas, miopen-hip, and rocm-hip-sdk.
See ./docker/base/Dockerfile for an example ROCm install.
This is not currently supported.
If you are planning to build the Python interface, you will need to install several additional Python libraries. If you only want to use the C++ interface, you can skip to the next section.
We recommend that you create your own conda environment and then install the Python dependencies, to avoid any version mismatch with your system's pre-installed libraries.
The conda environment can be created and activated as:
conda env create -f conda/flexflow.yml
conda activate flexflow
You can configure a FlexFlow build by running the config/config.linux file in the build folder. If you do not want to build with the default options, you can set your configuration by passing (or exporting) the relevant environment variables. We recommend that you spend some time familiarizing yourself with the available options by scanning the config/config.linux file. In particular, the main parameters are:
- CUDA_DIR is used to specify the directory of CUDA. It is only required when CMake cannot automatically detect the CUDA installation directory.
- CUDNN_DIR is used to specify the directory of cuDNN. It is only required when cuDNN is not installed in the CUDA directory.
- FF_CUDA_ARCH is used to set the architecture of the targeted GPUs; for example, the value can be 60 if the GPU architecture is Pascal. To build for more than one architecture, pass a comma-separated list of values (e.g. FF_CUDA_ARCH=70,75). To compile FlexFlow for all GPU architectures that are detected on the machine, pass FF_CUDA_ARCH=autodetect (this is the default value, so you can also leave FF_CUDA_ARCH unset). If you want to build for all GPU architectures compatible with FlexFlow, pass FF_CUDA_ARCH=all. If your machine does not have any GPU, you have to set FF_CUDA_ARCH to at least one valid architecture code (or all), since the compiler won't be able to detect the architecture(s) automatically.
- FF_USE_PYTHON controls whether to build the FlexFlow Python interface.
- FF_USE_NCCL controls whether to build FlexFlow with NCCL support. By default, it is set to ON.
- FF_LEGION_NETWORKS is used to enable distributed runs of FlexFlow. If you want to run FlexFlow on multiple nodes, follow the instructions in the Multinode tutorial and set the corresponding parameters as follows:
  - To build FlexFlow with GASNet, set FF_LEGION_NETWORKS=gasnet and set FF_GASNET_CONDUIT to a specific conduit (e.g. ibv, mpi, udp, ucx) in config/config.linux when configuring the FlexFlow build. Set FF_UCX_URL when you want to customize the URL to download UCX.
  - To build FlexFlow with native UCX, set FF_LEGION_NETWORKS=ucx in config/config.linux when configuring the FlexFlow build. Set FF_UCX_URL when you want to customize the URL to download UCX.
- FF_BUILD_EXAMPLES controls whether to build all C++ example programs.
- FF_MAX_DIM is used to set the maximum dimension of tensors; by default, it is set to 4.
- FF_USE_{NCCL,LEGION,ALL}_PRECOMPILED_LIBRARY controls whether to build FlexFlow using a pre-compiled version of Legion, NCCL (if FF_USE_NCCL is ON), or both libraries. By default, FF_USE_NCCL_PRECOMPILED_LIBRARY and FF_USE_LEGION_PRECOMPILED_LIBRARY are both set to ON, allowing you to build FlexFlow faster. If you want to build Legion and NCCL from source, set them to OFF.
More options are available in CMake. Run ccmake and search for options starting with FF.
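As an illustration of the environment-variable approach, the following sketch exports a few of the main parameters and then invokes the configuration script from the build folder. The values are examples only, not a recommended configuration, and the ON/OFF values for FF_USE_PYTHON and FF_BUILD_EXAMPLES are assumed to follow the same convention as FF_USE_NCCL.

```shell
# Example values only; adjust them to your system.
export FF_CUDA_ARCH=70,75        # build for two GPU architectures
export FF_USE_PYTHON=ON          # build the Python interface (ON/OFF assumed)
export FF_USE_NCCL=ON            # NCCL support, ON by default
export FF_BUILD_EXAMPLES=OFF     # skip the C++ examples (ON/OFF assumed)
export FF_MAX_DIM=4              # default maximum tensor dimension

# Run the configuration script only if it is reachable from the current
# directory, i.e. when this is executed inside the build folder.
if [ -x ../config/config.linux ]; then
    ../config/config.linux
else
    echo "config/config.linux not found; run this from the build folder" >&2
fi
```

Exporting the variables keeps them set for the rest of the shell session; to limit them to a single invocation, they can instead be passed as a prefix on the same command line as the script.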
You can build FlexFlow in three ways: with CMake, with Make, and with pip. We recommend that you use the CMake build system, as it will automatically build all C++ dependencies, including NCCL and Legion.
To build FlexFlow with CMake, go to the FlexFlow home directory, and run
mkdir build
cd build
../config/config.linux
make -j N
where N is the desired number of threads to use for the build.
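One common way to choose N is to match the number of available CPU cores. The sketch below assumes a Linux system where nproc (GNU coreutils) is available and falls back to a single job otherwise; the variable name JOBS is illustrative.

```shell
# Use one build job per available CPU core; fall back to 1 if nproc is missing.
JOBS=$(nproc 2>/dev/null || echo 1)
echo "building with $JOBS parallel jobs"

# Only invoke make when a Makefile has been generated in the current folder.
if [ -f Makefile ]; then
    make -j "$JOBS"
fi
```

On machines with limited memory, a smaller value than the core count may be the safer choice, since each parallel compile job consumes memory.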
To build FlexFlow with pip, run pip install . from the FlexFlow home directory. This command will build FlexFlow and also install the Python interface as a Python module.
The Makefile we provide is mainly for development purposes, and may not be fully up to date. To use it, run:
cd python
make -j N
After building FlexFlow, you can test it to ensure that the build completed without issue, and that your system is ready to run FlexFlow.
Set the FF_HOME environment variable before running FlexFlow. To make it permanent, you can add the following line to ~/.bashrc:
export FF_HOME=/path/to/FlexFlow
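To avoid adding the line twice, the export can be appended only when it is not already present. The helper below is a hypothetical sketch (persist_ff_home is not part of FlexFlow); it takes the FlexFlow path and, optionally, the startup file to modify, defaulting to ~/.bashrc.

```shell
# Hypothetical helper: append the FF_HOME export to a shell startup file
# unless an FF_HOME export is already present there.
persist_ff_home() {
    ff_home="$1"
    rc_file="${2:-$HOME/.bashrc}"
    if grep -q '^export FF_HOME=' "$rc_file" 2>/dev/null; then
        echo "FF_HOME already set in $rc_file"
    else
        # Quote the path so that directories containing spaces still work.
        echo "export FF_HOME=\"$ff_home\"" >> "$rc_file"
    fi
}

# Usage: persist_ff_home /path/to/FlexFlow
```

The change takes effect in new shells; run source ~/.bashrc to apply it to the current one.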
The Python examples are in the examples/python directory. The native, Keras integration, and PyTorch integration examples are located in the native, keras, and pytorch subdirectories, respectively.
To run the Python examples, you have two options: you can use the flexflow_python interpreter, available in the build folder, or you can use the native Python interpreter. If you choose to use the native Python interpreter, you should either install FlexFlow, or, if you prefer to build without installing, export the required environment flags by running the following command (edit the path if your build folder is not named build):
source ./build/set_python_envs.sh
We recommend that you run the mnist_mlp test under native using the following command to check that FlexFlow has been installed correctly:
cd "$FF_HOME"
./python/flexflow_python examples/python/native/mnist_mlp.py -ll:py 1 -ll:gpu 1 -ll:fsize <size of gpu buffer> -ll:zsize <size of zero buffer>
A script to run all the Python examples is available at tests/training_tests.sh.
The C++ examples are in the examples/cpp directory. For example, AlexNet can be run as:
./alexnet -ll:gpu 1 -ll:fsize <size of gpu buffer> -ll:zsize <size of zero buffer>
Buffer sizes are given in MB; for example, for an 8 GB GPU, use -ll:fsize 8000.
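The buffer flags can be derived from the GPU memory size. The sketch below follows the convention of the 8 GB example (1 GB is passed as 1000 MB). The helper name ff_buffer_flags is hypothetical, and the zero-copy buffer size is simply taken as a second argument, since no particular value is prescribed here.

```shell
# Hypothetical helper: print the buffer flags for a GPU with the given
# memory size in GB and a zero-copy buffer of the given size in MB.
ff_buffer_flags() {
    gpu_gb="$1"
    zsize_mb="$2"
    echo "-ll:fsize $((gpu_gb * 1000)) -ll:zsize $zsize_mb"
}

# Example: 8 GB GPU with an illustrative 2048 MB zero-copy buffer.
ff_buffer_flags 8 2048
# prints: -ll:fsize 8000 -ll:zsize 2048

# Usage with the AlexNet example (unquoted so the flags are split into words):
# ./alexnet -ll:gpu 1 $(ff_buffer_flags 8 2048)
```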
If you built/installed FlexFlow using pip, this step is not required. If you built using Make or CMake, install FlexFlow with:
cd build
make install