r1.15.5-deeprec2208
liutongxuan released this 23 Sep 04:07
Major Features and Improvements
Embedding
- Multi-tier EmbeddingVariable supports HBM; add an async compactor in SSDHashKV.
- Support tf.feature_column.shared_embedding_columns, SequenceCategoricalColumn and WeightedCategoricalColumn APIs for EmbeddingVariable.
- Support save and restore checkpoint of GPU EmbeddingVariable.
- Support EmbeddingVariable OpKernel with REAL_NUMBER_TYPES.
- Support user defined default_value for feature filter.
- Support feature column API for MultiHash.
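The feature filter item above can be pictured with a toy model. This is a minimal sketch in plain Python, not DeepRec's API: the class `CounterFilteredEmbedding` and its parameters (`filter_freq`, `default_value`, `init_value`) are hypothetical names. It assumes the filter admits a key only after it has been looked up a threshold number of times, and that lookups of keys not yet admitted return the user-defined default value.

```python
class CounterFilteredEmbedding:
    """Toy embedding table with a frequency-based admission filter.

    Hypothetical illustration only; it does not use or mirror DeepRec code.
    """

    def __init__(self, dim, filter_freq, default_value=0.0, init_value=1.0):
        self.dim = dim
        self.filter_freq = filter_freq      # lookups needed before admission
        self.default_value = default_value  # returned for unadmitted keys
        self.init_value = init_value        # initial value of admitted rows
        self.counts = {}                    # key -> number of lookups so far
        self.rows = {}                      # key -> embedding row (admitted keys)

    def lookup(self, key):
        # Count every lookup, admitted or not.
        self.counts[key] = self.counts.get(key, 0) + 1
        if self.counts[key] < self.filter_freq:
            # Key is still filtered out: return the user-defined default.
            return [self.default_value] * self.dim
        # Key has been seen often enough: create its row on first admission.
        if key not in self.rows:
            self.rows[key] = [self.init_value] * self.dim
        return self.rows[key]


emb = CounterFilteredEmbedding(dim=2, filter_freq=3, default_value=-1.0)
first = emb.lookup(42)    # filtered: default row
second = emb.lookup(42)   # filtered: default row
third = emb.lookup(42)    # admitted on the third lookup
```

In this sketch a rare key never allocates a row, which is the memory-saving point of a filter; the default value only decides what the model sees in the meantime.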
Graph & Grappler Optimization
- Add FP32 fused l2 normalize op and grad op and tf.nn.fused_layer_normalize API.
- Add Concat+Cast fusion ops.
- Optimize SmartStage performance on GPU.
- Add a macro to control the mkl_layout_pass optimization.
- Support asynchronous embedding lookup.
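For readers unfamiliar with the L2 normalize op mentioned above, the following is a plain-Python reference of what such an op and its gradient compute. It is a sketch of the standard math (the same form as `tf.nn.l2_normalize`, with an `epsilon` clamp), not the fused kernel itself; the function names are illustrative.

```python
import math


def l2_normalize(row, epsilon=1e-12):
    """Return row / sqrt(max(sum(row^2), epsilon))."""
    square_sum = sum(x * x for x in row)
    inv_norm = 1.0 / math.sqrt(max(square_sum, epsilon))
    return [x * inv_norm for x in row]


def l2_normalize_grad(row, upstream, epsilon=1e-12):
    """Gradient of l2_normalize with respect to `row`.

    `upstream` is the gradient flowing in from the op's output.
    """
    square_sum = sum(x * x for x in row)
    inv_norm = 1.0 / math.sqrt(max(square_sum, epsilon))
    if square_sum <= epsilon:
        # The norm is clamped to a constant, so the op is a plain scaling.
        return [g * inv_norm for g in upstream]
    y = [x * inv_norm for x in row]
    dot = sum(g * yi for g, yi in zip(upstream, y))
    # Remove the component of the upstream gradient along y, then rescale.
    return [(g - yi * dot) * inv_norm for g, yi in zip(upstream, y)]


unit = l2_normalize([3.0, 4.0])                    # approximately [0.6, 0.8]
grad = l2_normalize_grad([3.0, 4.0], [1.0, 0.0])   # approximately [0.128, -0.096]
```

A fused op computes the same result in one kernel instead of separate square, sum, rsqrt and multiply ops, which is where the saving comes from.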
Runtime Optimization
- CPUAllocator: avoid multiple threads running cleanup at the same time.
- Support an independent intra-op thread pool for each session, and support pinning the intra-op thread pool to a cpuset.
- Support multi-stream with virtual device.
Ops & Hardware Acceleration
- Implement ApplyFtrl, ResourceApplyFtrl, ApplyFtrlV2 and ResourceApplyFtrlV2 GPU kernels.
- Optimize BatchMatmul GPU kernel.
- Integrate cuBLASlt into backend and use BlasLtMatmul in batch_matmul_op.
- Support GPU fusion of matmul+bias+(activation).
- Merge NV-TF r1.15.5+22.06.
Optimizer
- Support AdamW optimizer for EmbeddingVariable.
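As background for the AdamW item, here is a minimal sketch of one AdamW step applied to a sparse embedding table, in plain Python. It shows the generic algorithm (Adam with decoupled weight decay), not DeepRec's kernel. The function name, the dict-based table and the use of a single global `step` for bias correction are assumptions made for illustration.

```python
import math


def adamw_sparse_step(table, state, grads, step, lr=0.001, beta1=0.9,
                      beta2=0.999, eps=1e-8, weight_decay=0.01):
    """Update only the embedding rows whose keys appear in `grads`.

    table: dict mapping key -> list of floats (embedding rows)
    state: dict mapping key -> (m, v) moment lists, created lazily
    grads: dict mapping key -> gradient list for rows seen in this batch
    step:  1-based step counter used for bias correction (assumed global)
    """
    for key, grad in grads.items():
        row = table[key]
        m, v = state.setdefault(key, ([0.0] * len(row), [0.0] * len(row)))
        for i, g in enumerate(grad):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g
            v[i] = beta2 * v[i] + (1.0 - beta2) * g * g
            m_hat = m[i] / (1.0 - beta1 ** step)
            v_hat = v[i] / (1.0 - beta2 ** step)
            # Decoupled weight decay: applied to the weight directly
            # instead of being folded into the gradient.
            row[i] -= lr * (m_hat / (math.sqrt(v_hat) + eps)
                            + weight_decay * row[i])


table = {7: [1.0, -1.0], 9: [0.5, 0.5]}
state = {}
adamw_sparse_step(table, state, {7: [0.1, -0.1]}, step=1)
# Row 7 moves; row 9 is untouched and has no optimizer state allocated.
```

Keeping the moment slots per key is what makes the optimizer fit a dynamically sized embedding: state exists only for keys that have received gradients.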
Model Save/Restore
- Support asynchronous restore of EmbeddingVariable from checkpoint.
- Support EmbeddingVariable in init_from_checkpoint.
Serving
- Add Go/Java/Python client SDKs and demos.
- Support GPU multi-streams in SessionGroup.
- Support independent inter thread pool for each session in SessionGroup.
- Support multi-tiered Embedding.
- Support immutable EmbeddingVariable.
Quantization
- Add a low-precision optimization tool that supports BF16, FP16 and INT8 for SavedModel and checkpoint.
- Add embedding variable quantization.
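To make the BF16 option concrete: BF16 keeps the sign, the 8 exponent bits and the top 7 mantissa bits of an FP32 value. The sketch below converts a float to its BF16 bit pattern with round-to-nearest-even and back, using only the standard library. It illustrates the number format, not the optimization tool; the function names are illustrative and NaN inputs are not handled.

```python
import struct


def float32_to_bf16_bits(value):
    """Return the 16-bit BF16 pattern for `value` (round to nearest even)."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    # Bias of 0x7FFF, plus 1 when the kept part is odd, makes ties go to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float32(bf16_bits):
    """Expand a BF16 bit pattern back to a float by zero-filling the low bits."""
    return struct.unpack("<f", struct.pack("<I", bf16_bits << 16))[0]


one = float32_to_bf16_bits(1.0)                      # 0x3F80
approx_pi = bf16_bits_to_float32(float32_to_bf16_bits(3.14159))
# approx_pi is 3.140625: only about 3 significant decimal digits survive.
```

The round trip shows the trade-off behind low-precision storage: the range of FP32 is preserved, but the precision drops to roughly three decimal digits.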
ModelZoo
- Optimize DIN's BF16 performance.
- Add DCN & DCNv2 models and MLPerf recommendation benchmark.
Profiler
- Add detailed information for RecvTensor in timeline.
Dockerfile
- Add Ubuntu 22.04 Dockerfile and images with gcc 11.2 and Python 3.8.6.
- Add CUDA 11.2, 11.4, 11.6 and 11.7 Docker images, and use CUDA 11.6 for the default GPU image.
Environment & Build
- Update default TF_CUDA_COMPUTE_CAPABILITIES to 6.0,6.1,7.0,7.5,8.0.
- Upgrade bazel version to 0.26.1.
- Support for building DeepRec on ROCm2.10.0.
BugFix
- Fix build failures with gcc11 & gcc12.
- StarServer: remove user packet splitting to avoid out-of-order delivery of multiple user packets.
- Fix the 'NodeIsInGpu is not declare' issue.
- Fix the worker device placement bug during distributed training in ModelZoo.
- Fix an out-of-range issue in the BiasAddGrad op when AVX512 is enabled.
- Avoid loading an invalid model during model updates in serving.
More details on features: https://deeprec.readthedocs.io/zh/latest/
Release Images
CPU Image
alideeprec/deeprec-release:deeprec2208-cpu-py36-ubuntu18.04
GPU Image
alideeprec/deeprec-release:deeprec2208-gpu-py36-cu116-ubuntu18.04