Inference [WIP] #475

Closed · wants to merge 389 commits

Changes from all commits
c9b2c5d
add decoder for gpt tokenizer
goliaro May 15, 2023
555aa33
Spec infer demo (#724)
jiazhihao May 16, 2023
16a5d02
Update README.md
jiazhihao May 16, 2023
b9fd233
Uses data and pipeline parallel by default. (#729)
jiazhihao May 16, 2023
b8e5586
Update README.md
jiazhihao May 16, 2023
07cb9f0
Merge branch 'master' into inference
goliaro May 18, 2023
0aabf34
fix make build, edit cmake
goliaro May 18, 2023
427d602
update std version in makefile
goliaro May 18, 2023
d87197d
file path adapt (#730)
xinhaoc May 18, 2023
b9fddec
Update README.md
goliaro May 19, 2023
dc6dcf8
Update README.md
goliaro May 19, 2023
1193b51
Update README.md
goliaro May 19, 2023
155989a
[Inference][CI] - Fix GPU-CI and `hip_rocm`build tests (#731)
goliaro May 20, 2023
f0604b3
[Inference] - Cleanup/refactor (#732)
goliaro May 21, 2023
28b31cd
Merging duplicate functions in IncMHA, SpecIncMHA, and TreeIncNHA (#736)
jiazhihao May 25, 2023
b0a5b9c
[Inference] - Alignment fixes (#740)
goliaro May 27, 2023
1ab3d80
Update README.md (#741)
goliaro May 27, 2023
9f5bf94
Supporting mixed-precision (Spec/Tree/Normal) Incremental MultiHead A…
jiazhihao May 30, 2023
d7dd6bb
[Inference] - Alignment tests (#742)
goliaro May 31, 2023
6c13936
Update README.md (#744)
xinhaoc May 31, 2023
d8072ab
fix
xinhaoc Jun 1, 2023
9f2688d
[Inference] - Add half precision & HuggingFace alignment tests + Spee…
goliaro Jun 4, 2023
2de6255
[SpecInfer] Running multiple SSMs with single RM (#734)
zwang86 Jun 8, 2023
ad75ac9
Merge branch 'inference' into fix_spec
zwang86 Jun 8, 2023
e131908
Fix inference test (#767)
goliaro Jun 15, 2023
eabad2d
Merge branch 'inference' into fix_spec
xinhaoc Jun 16, 2023
7e84575
Merge master into inference (#777)
goliaro Jun 17, 2023
3969a67
support falcon model (#762)
xinhaoc Jun 17, 2023
cd0d15f
Merge branch 'master' into inference
goliaro Jun 17, 2023
2fd3d69
[Inference] - Fix build issues (#779)
goliaro Jun 19, 2023
c44a64b
Support CPU Offload in SpecInfer (#765)
jiazhihao Jun 22, 2023
52c3656
Merge branch 'inference' into fix_spec
zwang86 Jun 23, 2023
0f3be1f
[Inference] Tensor model parallelism (#778)
goliaro Jun 25, 2023
3efc962
Merge branch 'inference' into fix_spec
zwang86 Jun 27, 2023
f74377a
Formatting.
zwang86 Jun 27, 2023
95e09eb
Docker-build and Publish Modification (#776)
DerrickYLJ Jun 27, 2023
71782e9
Merge branch 'inference' into fix_spec
zwang86 Jun 27, 2023
c40c3f1
add check for cargo (#812)
goliaro Jun 28, 2023
c4337f2
Merge branch 'inference' into fix_spec
zwang86 Jun 28, 2023
3a87e02
[Inference] - Fix Multiple-GPUs CI test (#804)
goliaro Jun 29, 2023
f02c9a0
Update README.md (#814)
DerrickYLJ Jun 29, 2023
d39d408
Merge branch 'inference' into fix_spec
goliaro Jun 29, 2023
08bda77
[Inference] - Better device placement in tensor model parallelism (#805)
goliaro Jun 29, 2023
7ff1d86
Merge branch 'inference' into fix_spec
jiazhihao Jun 29, 2023
e55b27e
Merge pull request #750 from xinhaoc/fix_spec
zwang86 Jun 30, 2023
e47a179
Revert "[Inference] fix bug when init_length + beam_depth > max_num_t…
goliaro Jun 30, 2023
d038e94
Merge `master` branch into `inference` (#835)
goliaro Jul 5, 2023
869d166
Fixation. (#840)
zwang86 Jul 8, 2023
93e3896
[Inference] - Save output of inference test as an artifact (#845)
goliaro Jul 8, 2023
53c5617
Using AllReduce instead of Reduce + Replicate when tensor model paral…
jiazhihao Jul 10, 2023
ae67898
change batch_size to num_active_tokens (#861)
xinhaoc Jul 16, 2023
58b745d
Add opt-13B config (#841)
lambda7xx Jul 16, 2023
88d2476
Merge branch 'master' into merge_master_into_inference
goliaro Jul 17, 2023
b359ce9
temp fix to bug
goliaro Jul 17, 2023
28fd257
linting
goliaro Jul 17, 2023
96e4138
replaced cudamemcpy with cudamemcpyasync
goliaro Jul 17, 2023
f6e4c5d
linting
goliaro Jul 17, 2023
319c69d
fix merge issue
goliaro Jul 17, 2023
3d494a1
fix bugs
goliaro Jul 17, 2023
b483b66
undo accidental change
goliaro Jul 17, 2023
1ee5c8f
Merge pull request #858 from flexflow/merge_master_into_inference
goliaro Jul 18, 2023
d3cd370
Inference: Sampling result (#854)
xinhaoc Jul 19, 2023
8a78103
Merge branch 'master' into inference
goliaro Jul 19, 2023
9317f25
Docker updates from `master`
goliaro Jul 19, 2023
3e23dd8
fix
goliaro Jul 19, 2023
866a9a5
Merge branch 'docker_fix' into merge_master_into_inference
goliaro Jul 19, 2023
f76a88d
Merge pull request #875 from flexflow/merge_master_into_inference
goliaro Jul 19, 2023
02d4b20
update new models weights (#837)
DerrickYLJ Jul 21, 2023
d62f193
Merge branch 'master' into inference
goliaro Jul 21, 2023
8caa803
Model weight flag explanation (#880)
DerrickYLJ Jul 21, 2023
2ba481b
Inference: fix batch_size issue. (#863)
xinhaoc Jul 21, 2023
d047aa6
Python interface for inference (part 1) (#878)
goliaro Jul 22, 2023
aef158a
Fix fusion bug (#889)
goliaro Jul 27, 2023
6b7e6f0
Inference: add argmax operator (#888)
xinhaoc Jul 27, 2023
821b32f
[Docker] - Make it easier to attach inference weights to docker (#891)
goliaro Jul 27, 2023
bf0f30e
Make BatchConfig and InferenceResult Legion futures (#860)
jiazhihao Jul 28, 2023
67977f4
Merge branch 'master' into inference
goliaro Jul 28, 2023
664667e
change argmax to DeviceSegmentedReduce::ArgMax && replace cudamalloc …
xinhaoc Jul 28, 2023
0f8b486
enable tracing (#901)
jiazhihao Jul 30, 2023
f07de46
Fixed edge case. (#903)
zwang86 Jul 31, 2023
7a14a01
Merge branch 'master' into inference
goliaro Jul 31, 2023
ba91733
Python interface for inference (part 2) (#893)
goliaro Aug 2, 2023
d1ef0ed
Support Group Attention (Llama 2) (#883)
xinhaoc Aug 3, 2023
c19882e
api update
goliaro Aug 4, 2023
fafbbc2
Cleanup (#914)
goliaro Aug 4, 2023
654095e
fix (#916)
xinhaoc Aug 7, 2023
3e9e37c
Merge branch 'master' into inference
goliaro Aug 8, 2023
0bc2b01
[Inference] - Cleanup, C++/Python API update (#915)
goliaro Aug 8, 2023
30542b7
Merge branch 'master' into inference
goliaro Aug 13, 2023
bcf14a7
merge fix
goliaro Aug 13, 2023
a78947c
update tokenizers-cpp repo
goliaro Aug 15, 2023
77e4841
starcoder model. (#962)
xinhaoc Aug 16, 2023
1f04328
New README.md for FlexFlow Serve (#960)
jiazhihao Aug 16, 2023
4fd369a
Fix CUDA Error in the sampling operator (#966)
jiazhihao Aug 16, 2023
1179a8e
Fix `requirements.txt` (#969)
goliaro Aug 17, 2023
534adaf
check starcoder not run with tp (#971)
xinhaoc Aug 17, 2023
d5a1dcc
Docs update (#970)
goliaro Aug 17, 2023
88f70e3
Fix conda in CI (#974)
goliaro Aug 17, 2023
97c62b1
change ff.init interface to accept parameters (#973)
xinhaoc Aug 17, 2023
d2a0629
Update README.md (#975)
xinhaoc Aug 17, 2023
66570c5
Update README.md
jiazhihao Aug 18, 2023
18946ba
adding f for fstring (#990)
brianyu-nexusflowai Aug 19, 2023
68a5a54
link to stdc++fs (#985)
jiazhihao Aug 19, 2023
0ec4189
add GenerationResult to the Python interface (#1000)
jiazhihao Aug 21, 2023
2f6f864
update pr template
goliaro Aug 21, 2023
a5ffc62
support loading local model (#1004)
goliaro Aug 21, 2023
cf13ee7
Add multinode tutorial to readthedocs (#1019)
goliaro Aug 23, 2023
9d0bc56
Allow FlexFlow Serve to stop when EOS token is generated (#1026)
jiazhihao Aug 25, 2023
686b1e6
Merge branch 'master' into inference
goliaro Aug 27, 2023
dfbe554
Build docker images in more cuda versions (#1030)
goliaro Aug 27, 2023
00be68d
Automatic bos/eos token determination, plus docker fix (#1031)
goliaro Aug 28, 2023
5bf476c
clean: duplicate in requirements.txt (#1034)
raphaelauv Aug 28, 2023
e6763fa
Build Docker images for AMD gpus (#1041)
goliaro Aug 31, 2023
85acb41
Remove zlib (#1086)
goliaro Sep 2, 2023
3af422d
support AMD in inference branch (#996)
xinhaoc Sep 2, 2023
7aa1862
Fix compile error in debug mode (#1088)
vincent-163 Sep 2, 2023
7adf106
Update docs (#1091)
goliaro Sep 2, 2023
b2ec6cb
fix group attention issue (#1062)
xinhaoc Sep 3, 2023
1f5fe02
Add method to initialize FlexFlow runtime (#1089)
goliaro Sep 6, 2023
1d5b4c6
support MPT model (#1093)
xinhaoc Sep 6, 2023
4adad7d
Update docker-build.yml
goliaro Sep 6, 2023
8f04bea
bug fix
goliaro Sep 11, 2023
c7cc6b4
Fix Falcon model, inference test in CI (#1138)
goliaro Sep 17, 2023
b1b4461
fix ci
goliaro Sep 19, 2023
2ef52f8
Do not run empty kernels (`num_tokens=0`) (#1141)
goliaro Sep 20, 2023
a4f2588
Fuse inference kernels to reduce kernel launch overhead (part 1) (#1128)
goliaro Sep 21, 2023
322afa9
Fuse inference kernels (part 2) (#1143)
goliaro Sep 23, 2023
f2f9711
Build ROCm Docker images on Oracle instance (#1144)
DerrickYLJ Sep 24, 2023
48cca2b
fix (#1147)
goliaro Sep 24, 2023
02326e0
Docker workflow cleanup (#1148)
DerrickYLJ Sep 24, 2023
191df5d
fix oracle instance script
goliaro Sep 24, 2023
dfbd0fb
fix
goliaro Sep 24, 2023
5958971
fix
goliaro Sep 24, 2023
0a56d01
[SpecInfer] Update RequestManager (#1096)
zwang86 Sep 25, 2023
1d5e0c5
Fuse inference kernels (part 3) (#1146)
goliaro Sep 26, 2023
ee6090e
[SpecInfer] Reduce single request per batch overhead (#1155)
zwang86 Sep 29, 2023
426aa7d
Support new Falcon model (#1158)
goliaro Sep 29, 2023
0e68bb7
Fix `pip install` issues affecting some platforms (#1159)
goliaro Sep 30, 2023
65cb570
[Python] - Automatically install Rust with `pip install` if not avail…
goliaro Oct 1, 2023
5919fff
Fix model configs (Falcon in C++, LLAMA in Python) (#1162)
goliaro Oct 1, 2023
d9a95ef
Make MAX_BATCH_SIZE, MAX_NUM_TOKENS, MAX_SEQ_LENGTH user-provided inp…
jiazhihao Oct 1, 2023
edc6c49
[Cleanup] - Remove obsolete stuff (#1160)
goliaro Oct 2, 2023
de6933b
Compare flle path reliably (#1173)
suranap Oct 5, 2023
50ff264
[Tool] - Add mechanism to save operators' tensors to file (#1174)
goliaro Oct 8, 2023
5e34846
fix backward gelu, layernorm (#1187)
xinhaoc Oct 10, 2023
7b57463
Optimize attention kernel v2 1.0, use Gemm replace GemmStridedBatch (…
xinhaoc Oct 14, 2023
f243b40
Allow token arrangement align with request index in batch (#1176)
zwang86 Oct 16, 2023
4c06a09
variable renaming (#1194)
jiazhihao Oct 17, 2023
fb0b21c
Add `first_token_offset_in_batch` to indicate the offset of the reque…
jiazhihao Oct 18, 2023
caf5d61
Update the data layout of m->attn_heads (#1204)
jiazhihao Oct 22, 2023
dd9f62d
Pre-build Legion library (#1042)
DerrickYLJ Oct 23, 2023
3009890
Fix CUDA cmake (#1205)
goliaro Oct 23, 2023
452fa9c
Fix Legion prebuild workflow (#1207)
goliaro Oct 23, 2023
d1da022
Fix Legion prebuild workflow (2) (#1208)
goliaro Oct 24, 2023
1105f4e
Fix Legion prebuild workflow (3) (#1210)
goliaro Oct 24, 2023
bd305f7
[CI/Docs/Examples] - Replace llama with llama2 model (#1219)
goliaro Nov 5, 2023
b0fe522
Fix inference tests in CI (#1225)
goliaro Nov 6, 2023
c6ad6e2
Update the default cublas behavior when CUDA_VERSION is not specified…
jiazhihao Nov 9, 2023
3bcf3d4
Reorder tokens in batch using based on token type (#1214)
zwang86 Nov 10, 2023
b15d060
Optimize attention kernel (#1228)
xinhaoc Nov 15, 2023
672cdad
fix ucx against inference branch (#1230)
eddy16112 Nov 17, 2023
457b5f2
post ucx fixes
goliaro Nov 28, 2023
5501cf8
Fix tensor shapes for elementwise binary operations with broadcasting…
soumyac1999 Dec 1, 2023
477afcb
Fix attention (#1238)
xinhaoc Dec 3, 2023
08f60b1
Fix HIP build for AMD (#1243)
goliaro Dec 12, 2023
3cf49a6
[Documentation] - Annotate attention kernel with shapes of tensors (#…
goliaro Dec 12, 2023
7e7f955
Fix link issue (#1247)
xinhaoc Dec 24, 2023
ed5a2e0
init
xinhaoc Dec 25, 2023
d3a57cb
fix speculative
xinhaoc Dec 26, 2023
617a29f
fix speculative
xinhaoc Dec 26, 2023
b5f9d5d
bitmap+tree verify
xinhaoc Dec 28, 2023
945268f
fix.
xinhaoc Dec 28, 2023
ce95127
fix
xinhaoc Dec 29, 2023
3ed25d6
multi batch
xinhaoc Dec 29, 2023
5c3ad35
copy metadata once
xinhaoc Dec 29, 2023
fae148d
fix some corner cases
xinhaoc Dec 30, 2023
6c44259
Replicate load_token tasks so that it can be fused with other compute…
jiazhihao Dec 30, 2023
ac11203
more fix.
xinhaoc Dec 30, 2023
7eaffbc
clean up
xinhaoc Dec 30, 2023
b621f2a
.
xinhaoc Dec 30, 2023
7b662f4
Merge remote-tracking branch 'origin/replicate_load_tokens' into xinh…
xinhaoc Dec 30, 2023
8a0b007
load batchconfig
xinhaoc Dec 30, 2023
17a718f
clean
xinhaoc Dec 31, 2023
c8a107b
hip
xinhaoc Dec 31, 2023
42e1b5d
hip
xinhaoc Dec 31, 2023
4957b7c
Specinfer - new kernel (#1252)
xinhaoc Dec 31, 2023
3047c82
Reducing memory requirements by reusing logical regions (#1254)
jiazhihao Jan 1, 2024
1901f65
embedding return when no token
xinhaoc Jan 1, 2024
130ad92
use arg topk instead of beam topk
xinhaoc Jan 1, 2024
8cdd215
Merge branch 'inference' into xinhao_specinfer
xinhaoc Jan 1, 2024
4259d2d
embedding
xinhaoc Jan 1, 2024
3478077
Merge branch 'xinhao_specinfer' of https://github.com/flexflow/FlexFl…
xinhaoc Jan 1, 2024
fae7fba
fmt
xinhaoc Jan 1, 2024
8d1d584
hip
xinhaoc Jan 1, 2024
25097e0
SpecInfer: optimize performance (#1255)
xinhaoc Jan 1, 2024
d7e8d72
fix corner case
xinhaoc Jan 2, 2024
b6f1c41
Merge branch 'inference' into xinhao_specinfer
xinhaoc Jan 2, 2024
a45826e
SpecInfer fix corner case (#1258)
xinhaoc Jan 2, 2024
8490e50
fix
xinhaoc Jan 2, 2024
c12f0c6
fix request id issue
xinhaoc Jan 3, 2024
0226941
Merge branch 'inference' into xinhao_specinfer
xinhaoc Jan 3, 2024
284ad77
Fix Request Id order issue (#1260)
xinhaoc Jan 3, 2024
e17fb8d
change MAX_SPECULATIVE_TREE_BRANCHES
xinhaoc Jan 4, 2024
4875f21
Merge branch 'xinhao_specinfer' of https://github.com/flexflow/FlexFl…
xinhaoc Jan 4, 2024
429ddb5
xinhaoc Jan 4, 2024
0ce3158
Merge branch 'inference' into xinhao_specinfer
xinhaoc Jan 4, 2024
7b00e81
Merge pull request #1261 from flexflow/xinhao_specinfer
xinhaoc Jan 4, 2024
4f61b9f
fix
goliaro Jan 8, 2024
29735f2
fixes to run chatgpt.json prompt dataset in python
goliaro Jan 8, 2024
ba4af39
fix
goliaro Jan 9, 2024
9c85a4f
Fuse bias and relu in OPT (#1265)
goliaro Jan 10, 2024
197e308
fix spec decoding
jiazhihao Jan 12, 2024
ed4dbd8
Revert "fix spec decoding"
jiazhihao Jan 12, 2024
12fdbac
Add a background server for RequestManager (#1223)
jiazhihao Jan 14, 2024
18cd485
Update README.md
jiazhihao Jan 14, 2024
75edadc
Better debugging/logging tools for alignment checks (#1275)
goliaro Jan 20, 2024
57d1883
Fix incorrect innode being checked (#1273)
FelixBrakel Jan 20, 2024
317cffd
Bug fixes and update Legion version (#1259)
jiazhihao Jan 26, 2024
d73bba1
Revert "Bug fixes and update Legion version" (#1286)
goliaro Jan 26, 2024
abf9fb8
Chatbot with Gradio, FastApi Endpoint, Langchain Integration (#1246)
april-yyt Jan 26, 2024
d21ed66
Bug fixes and update Legion version (#1287)
goliaro Jan 27, 2024
be28d71
Docs Modification for Python Usecases (#1291)
april-yyt Feb 5, 2024
e24eb03
Add support for docker machines with cuda 12.1 and cuda 12.2 (#1308)
goliaro Feb 22, 2024
0d75c10
Fix NCCL tear down issue, update docker pre-build cuda version list (…
goliaro Mar 3, 2024
ea31426
add expansion config param in specinfer
goliaro Mar 9, 2024
e03dec0
parametrize max_spec_tree_token_num
goliaro Mar 11, 2024
c856680
fix
goliaro Mar 13, 2024
8d82c91
fix
goliaro Mar 14, 2024
0479a64
fix
goliaro Mar 14, 2024
5bd7123
run CI per commit only on inference branch
goliaro Mar 30, 2024
e0a6e4f
fix
goliaro Mar 30, 2024
1210256
fix: 'model_configs' AttributeError (#1358)
chenzhuofu Apr 6, 2024
b4a639c
Changes to support Perlmutter environment (#1360)
goliaro Apr 8, 2024
7da197e
update workflow to build rocm docker images
goliaro Apr 24, 2024
002fdf0
downgrade to python 3.11 for now
goliaro Apr 24, 2024
d54e4b6
doc: fix c++ serving example (#1372)
chenzhuofu Apr 30, 2024
b90771a
Update README.md
jiazhihao May 30, 2024
385c118
Add examples for every layer in the python layer API (#1297)
FelixBrakel May 30, 2024
a83effe
add code to keep runners registered
goliaro Jun 20, 2024
4f82aae
fix docker
goliaro Jul 11, 2024
25fb407
[Tokenizer] update tokenizers-cpp repo
jiazhihao Jul 11, 2024
9e68c8c
Merge branch 'inference' of https://github.com/flexflow/FlexFlow into…
jiazhihao Jul 11, 2024
6a1a188
minor bug fix (#1456)
jiazhihao Aug 3, 2024
9784b5c
update legion version (#1307)
jiazhihao Aug 12, 2024
f747438
Managed mem support (#1466)
chenzhuofu Aug 13, 2024
6d710ac
pip flexflow_python typo (#1461)
stelleg Aug 20, 2024
3b59f05
update legion version
goliaro Aug 28, 2024
28aff70
Fix nccl-induced segfault (#1481)
goliaro Aug 31, 2024
49523d6
Fix python install issue caused by new Legion version (#1482)
goliaro Sep 2, 2024
a0f1ed7
PEFT support (inference/finetuning) (#1153)
jiazhihao Sep 4, 2024
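
Many of the commits above assemble the Python serving path: ff.init gained user-facing runtime parameters (#973, #1089), flexflow.serve gained local model loading (#1004) and a GenerationResult type (#1000), and the RequestManager learned to run as a background server (#1223). The sketch below shows how those pieces are meant to fit together; it is modeled on the FlexFlow Serve README rather than taken verbatim from this PR, and the model name, memory sizes, sampling values, and the output_text attribute are illustrative assumptions.

# Minimal FlexFlow Serve sketch (assumptions noted inline)
import flexflow.serve as ff

# Configure the runtime; parameter names follow the README-era ff.init API
ff.init(
    num_gpus=1,
    memory_per_gpu=14000,             # MB of GPU memory per device (assumed value)
    zero_copy_memory_per_node=30000,  # MB of pinned host memory (assumed value)
    tensor_parallelism_degree=1,
    pipeline_parallelism_degree=1,
)

# HuggingFace model name or a local path (local loading added in #1004)
llm = ff.LLM("meta-llama/Llama-2-7b-hf")
config = ff.GenerationConfig(do_sample=False, temperature=0.9, topp=0.8, topk=1)
llm.compile(config)

llm.start_server()                    # background RequestManager server (#1223)
result = llm.generate("Three travel tips for Tokyo:\n")
print(result.output_text)             # GenerationResult field; attribute name is an assumption
llm.stop_server()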
8 changes: 8 additions & 0 deletions .dockerignore
@@ -11,3 +11,11 @@ python/flexflow/core/legion_cffi_header.py
 *.pb.h
 *.o
 *.a
+
+# Ignore inference assets
+/inference/weights/*
+/inference/tokenizer/*
+/inference/prompt/*
+/inference/output/*
+
+/tests/inference/python_test_configs/*.json
3 changes: 0 additions & 3 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -10,6 +10,3 @@ Linked Issues:
 Issues closed by this PR:
 - Closes #

-**Before merging:**
-
-- [ ] Did you update the [flexflow-third-party](https://github.com/flexflow/flexflow-third-party) repo, if modifying any of the Cmake files, the build configs, or the submodules?
255 changes: 255 additions & 0 deletions .github/README.md

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions .github/workflows/build-skip.yml
@@ -3,6 +3,7 @@ on:
   pull_request:
     paths-ignore:
       - "include/**"
+      - "inference/**"
       - "cmake/**"
       - "config/**"
       - "deps/**"
71 changes: 47 additions & 24 deletions .github/workflows/build.yml
@@ -3,6 +3,7 @@ on:
   pull_request:
     paths:
       - "include/**"
+      - "inference/**"
       - "cmake/**"
       - "config/**"
       - "deps/**"
@@ -15,6 +16,7 @@ on:
       - "master"
     paths:
       - "include/**"
+      - "inference/**"
       - "cmake/**"
       - "config/**"
       - "deps/**"
@@ -38,6 +40,8 @@ jobs:
       matrix:
         gpu_backend: ["cuda", "hip_rocm"]
       fail-fast: false
+    env:
+      FF_GPU_BACKEND: ${{ matrix.gpu_backend }}
     steps:
       - name: Checkout Git Repository
         uses: actions/checkout@v3
@@ -48,39 +52,49 @@ jobs:
         run: .github/workflows/helpers/free_space_on_runner.sh

       - name: Install CUDA
-        uses: Jimver/cuda-toolkit@v0.2.11
+        uses: Jimver/cuda-toolkit@v0.2.16
         if: ${{ matrix.gpu_backend == 'cuda' }}
         id: cuda-toolkit
         with:
-          cuda: "11.8.0"
+          cuda: "12.1.1"
           # Disable caching of the CUDA binaries, since it does not give us any significant performance improvement
           use-github-cache: "false"
+          log-file-suffix: 'cmake_${{matrix.gpu_backend}}.txt'

       - name: Install system dependencies
-        run: FF_GPU_BACKEND=${{ matrix.gpu_backend }} .github/workflows/helpers/install_dependencies.sh
+        run: .github/workflows/helpers/install_dependencies.sh

       - name: Install conda and FlexFlow dependencies
         uses: conda-incubator/setup-miniconda@v2
         with:
           activate-environment: flexflow
-          environment-file: conda/environment.yml
+          environment-file: conda/flexflow.yml
           auto-activate-base: false

       - name: Build FlexFlow
         run: |
           export CUDNN_DIR="$CUDA_PATH"
           export CUDA_DIR="$CUDA_PATH"
           export FF_HOME=$(pwd)
-          export FF_GPU_BACKEND=${{ matrix.gpu_backend }}
           export FF_CUDA_ARCH=70
+          export FF_HIP_ARCH=gfx1100,gfx1036
+          export hip_version=5.6
+          export FF_BUILD_ALL_INFERENCE_EXAMPLES=ON
+
+          if [[ "${FF_GPU_BACKEND}" == "cuda" ]]; then
+            export FF_BUILD_ALL_EXAMPLES=ON
+            export FF_BUILD_UNIT_TESTS=ON
+          else
+            export FF_BUILD_ALL_EXAMPLES=OFF
+            export FF_BUILD_UNIT_TESTS=OFF
+          fi
+
           cores_available=$(nproc --all)
           n_build_cores=$(( cores_available -1 ))
           if (( $n_build_cores < 1 )) ; then n_build_cores=1 ; fi
           mkdir build
           cd build
-          if [[ "${FF_GPU_BACKEND}" == "cuda" ]]; then
-            export FF_BUILD_ALL_EXAMPLES=ON
-            export FF_BUILD_UNIT_TESTS=ON
-          fi
+
           ../config/config.linux
           make -j $n_build_cores
@@ -89,35 +103,44 @@ jobs:
           export CUDNN_DIR="$CUDA_PATH"
           export CUDA_DIR="$CUDA_PATH"
           export FF_HOME=$(pwd)
-          export FF_GPU_BACKEND=${{ matrix.gpu_backend }}
           export FF_CUDA_ARCH=70
-          cd build
+          export FF_HIP_ARCH=gfx1100,gfx1036
+          export hip_version=5.6
+          export FF_BUILD_ALL_INFERENCE_EXAMPLES=ON
+
           if [[ "${FF_GPU_BACKEND}" == "cuda" ]]; then
             export FF_BUILD_ALL_EXAMPLES=ON
             export FF_BUILD_UNIT_TESTS=ON
+          else
+            export FF_BUILD_ALL_EXAMPLES=OFF
+            export FF_BUILD_UNIT_TESTS=OFF
           fi
+
+          cd build
           ../config/config.linux
           sudo make install
           sudo ldconfig

-      - name: Check availability of Python flexflow.core module
-        if: ${{ matrix.gpu_backend == 'cuda' }}
-        run: |
-          export LD_LIBRARY_PATH="$CUDA_PATH/lib64/stubs:$LD_LIBRARY_PATH"
-          sudo ln -s "$CUDA_PATH/lib64/stubs/libcuda.so" "$CUDA_PATH/lib64/stubs/libcuda.so.1"
-          export CPU_ONLY_TEST=1
-          python -c "import flexflow.core; exit()"
-
       - name: Run C++ unit tests
         if: ${{ matrix.gpu_backend == 'cuda' }}
         run: |
           export CUDNN_DIR="$CUDA_PATH"
           export CUDA_DIR="$CUDA_PATH"
+          export LD_LIBRARY_PATH="$CUDA_PATH/lib64/stubs:$LD_LIBRARY_PATH"
           export FF_HOME=$(pwd)
+          sudo ln -s "$CUDA_PATH/lib64/stubs/libcuda.so" "$CUDA_PATH/lib64/stubs/libcuda.so.1"
           cd build
           ./tests/unit/unit-test

+      - name: Check availability of flexflow modules in Python
+        run: |
+          if [[ "${FF_GPU_BACKEND}" == "cuda" ]]; then
+            export LD_LIBRARY_PATH="$CUDA_PATH/lib64/stubs:$LD_LIBRARY_PATH"
+          fi
+          # Remove build folder to check that the installed version can run independently of the build files
+          rm -rf build
+          python -c "import flexflow.core; import flexflow.serve as ff; exit()"

   makefile-build:
     name: Build FlexFlow with the Makefile
     runs-on: ubuntu-20.04
@@ -134,11 +157,12 @@ jobs:
         run: .github/workflows/helpers/free_space_on_runner.sh

       - name: Install CUDA
-        uses: Jimver/cuda-toolkit@v0.2.11
+        uses: Jimver/cuda-toolkit@v0.2.16
         id: cuda-toolkit
         with:
-          cuda: "11.8.0"
+          cuda: "12.1.1"
           use-github-cache: "false"
+          log-file-suffix: 'makefile_${{matrix.gpu_backend}}.txt'

       - name: Install system dependencies
         run: .github/workflows/helpers/install_dependencies.sh
@@ -147,7 +171,7 @@ jobs:
         uses: conda-incubator/setup-miniconda@v2
         with:
           activate-environment: flexflow
-          environment-file: conda/environment.yml
+          environment-file: conda/flexflow.yml
           auto-activate-base: false

       - name: Build FlexFlow
@@ -163,5 +187,4 @@ jobs:

           cd python
           make -j $n_build_cores
-          export CPU_ONLY_TEST=1
           python -c 'import flexflow.core'
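
For reference, the Build FlexFlow steps in the workflow above map onto a short local sequence. A condensed sketch, assuming a CUDA machine and the same variables the workflow exports (the arch value is an assumption; match it to your GPU's compute capability):

# Hypothetical local replay of the CI build above
export FF_HOME=$(pwd)
export FF_GPU_BACKEND=cuda              # or hip_rocm
export FF_CUDA_ARCH=70                  # assumption: set to your GPU's compute capability
export FF_BUILD_ALL_INFERENCE_EXAMPLES=ON
export FF_BUILD_ALL_EXAMPLES=ON
export FF_BUILD_UNIT_TESTS=ON

# Leave one core free for the rest of the system, but never go below one
cores_available=$(nproc --all)
n_build_cores=$(( cores_available - 1 ))
if (( n_build_cores < 1 )); then n_build_cores=1; fi

mkdir -p build && cd build
../config/config.linux                  # generates the CMake configuration
make -j "$n_build_cores"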
2 changes: 1 addition & 1 deletion .github/workflows/clang-format-check.yml
@@ -10,7 +10,7 @@ jobs:
       - check: "src"
         exclude: '\.proto$'
       - check: "include"
-      - check: "nmt"
+      - check: "inference"
       - check: "python"
       - check: "scripts"
       - check: "tests"
33 changes: 14 additions & 19 deletions .github/workflows/docker-build-skip.yml
@@ -13,27 +13,22 @@ concurrency:
   cancel-in-progress: true

 jobs:
-  docker-build:
-    name: Build and Install FlexFlow in a Docker Container
-    runs-on: ubuntu-20.04
+  docker-build-rocm:
+    name: Build and Install FlexFlow in a Docker Container (ROCm backend)
+    runs-on: ubuntu-latest
     strategy:
       matrix:
-        gpu_backend: ["cuda", "hip_rocm"]
-        cuda_version: ["11.1", "11.2", "11.3", "11.5", "11.6", "11.7", "11.8"]
-        # The CUDA version doesn't matter when building for hip_rocm, so we just pick one arbitrarily (11.8) to avoid building for hip_rocm once per number of CUDA version supported
-        exclude:
-          - gpu_backend: "hip_rocm"
-            cuda_version: "11.1"
-          - gpu_backend: "hip_rocm"
-            cuda_version: "11.2"
-          - gpu_backend: "hip_rocm"
-            cuda_version: "11.3"
-          - gpu_backend: "hip_rocm"
-            cuda_version: "11.5"
-          - gpu_backend: "hip_rocm"
-            cuda_version: "11.6"
-          - gpu_backend: "hip_rocm"
-            cuda_version: "11.7"
+        hip_version: ["5.3", "5.4", "5.5", "5.6"]
       fail-fast: false
     steps:
       - run: 'echo "No docker-build required"'
+
+  docker-build-cuda:
+    name: Build and Install FlexFlow in a Docker Container (CUDA backend)
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        cuda_version: ["11.1", "11.6", "11.7", "11.8", "12.0", "12.1", "12.2"]
+      fail-fast: false
+    steps:
+      - run: 'echo "No docker-build required"'