r1.15.5-deeprec2208

liutongxuan released this 23 Sep 04:07

Major Features and Improvements

Embedding

Multi-tier EmbeddingVariable storage now supports HBM; add an async compactor to SSDHashKV.
Support the tf.feature_column.shard_embedding_columns, SequenceCategoricalColumn and WeightedCategoricalColumn APIs for EmbeddingVariable.
Support saving and restoring checkpoints of GPU EmbeddingVariable.
Support EmbeddingVariable OpKernels with REAL_NUMBER_TYPES.
Support a user-defined default_value for feature filters.
Support feature column API for MultiHash.
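
Multi-tier EmbeddingVariable storage keeps frequently used rows in a fast tier (such as HBM) and moves colder rows to slower tiers. The minimal sketch below illustrates that idea with a hypothetical two-tier LRU store. The class name, the LRU eviction policy and the `init_value` parameter are assumptions for illustration. This is not DeepRec's implementation or API.

```python
from collections import OrderedDict


class TwoTierEmbeddingStore:
    """Hypothetical two-tier store: a bounded fast tier (standing in for HBM)
    in front of an unbounded slow tier (standing in for DRAM or SSD)."""

    def __init__(self, fast_capacity, dim, init_value=0.0):
        self.fast = OrderedDict()  # key -> embedding, ordered oldest-first (LRU)
        self.slow = {}             # key -> embedding evicted from the fast tier
        self.fast_capacity = fast_capacity
        self.dim = dim
        self.init_value = init_value

    def lookup(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)  # mark as most recently used
            return self.fast[key]
        # Miss in the fast tier: promote from the slow tier or create a new row.
        emb = self.slow.pop(key, None)
        if emb is None:
            emb = [self.init_value] * self.dim
        self.fast[key] = emb
        if len(self.fast) > self.fast_capacity:
            # Demote the least recently used row to the slow tier.
            old_key, old_emb = self.fast.popitem(last=False)
            self.slow[old_key] = old_emb
        return emb


store = TwoTierEmbeddingStore(fast_capacity=2, dim=4)
for feature_id in (101, 102, 103):
    store.lookup(feature_id)
print(sorted(store.fast), sorted(store.slow))  # [102, 103] [101]
```

A real multi-tier store would also move data asynchronously and persist the slow tier. This sketch only shows the promotion and demotion logic.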

Graph & Grappler Optimization

Add an FP32 fused L2-normalize op and its gradient op, plus the tf.nn.fused_layer_normalize API.
Add Concat+Cast fusion ops.
Optimize SmartStage performance on GPU.
Add a macro to control mkl_layout_pass optimization.
Support asynchronous embedding lookup.
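
A fused L2-normalize op combines the forward computation, and separately its gradient, into single kernels. Presumably these produce the same values as the unfused graph. The sketch below gives pure-Python reference semantics for one vector. It follows the formula documented for tf.math.l2_normalize, `x / sqrt(max(sum(x**2), epsilon))`. The function names are hypothetical, and this is not the fused kernel itself.

```python
import math


def l2_normalize(x, epsilon=1e-12):
    """Reference forward: y = x / sqrt(max(sum(x**2), epsilon))."""
    norm = math.sqrt(max(sum(v * v for v in x), epsilon))
    return [v / norm for v in x]


def l2_normalize_grad(x, grad_y, epsilon=1e-12):
    """Reference backward for l2_normalize over a single vector."""
    sq_sum = sum(v * v for v in x)
    norm = math.sqrt(max(sq_sum, epsilon))
    if sq_sum <= epsilon:
        # The denominator is the constant sqrt(epsilon), so y is linear in x.
        return [g / norm for g in grad_y]
    y = [v / norm for v in x]
    dot = sum(g * yi for g, yi in zip(grad_y, y))
    # Chain rule for x / ||x||: (grad_y - y * <grad_y, y>) / ||x||
    return [(g - yi * dot) / norm for g, yi in zip(grad_y, y)]


print(l2_normalize([3.0, 4.0]))  # [0.6, 0.8]
```

The gradient reuses `y` and one dot product. This is one reason fusing the backward pass avoids materializing several intermediate tensors.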

Runtime Optimization

CPUAllocator: avoid multiple threads running cleanup at the same time.
Support an independent intra-op thread pool for each session, and allow intra-op thread pools to be pinned to a cpuset.
Support multi-stream with virtual device.
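
Giving each session its own intra-op thread pool isolates sessions that run in the same process: one session's parallel work cannot occupy another session's threads. The sketch below is a Python analogy, not DeepRec's C++ runtime. It uses `concurrent.futures.ThreadPoolExecutor`. The `SessionWithOwnPool` class and the optional `cpuset` pinning through `os.sched_setaffinity` are assumptions for illustration. Pinning only takes effect on platforms that provide that call.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


def _pin_current_thread(cpus):
    # On Linux, pid 0 makes the underlying sched_setaffinity(2) call
    # apply to the calling thread, i.e. this pool worker.
    if cpus and hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, cpus)


class SessionWithOwnPool:
    """Hypothetical session object that owns a private intra-op thread pool."""

    def __init__(self, name, num_threads, cpuset=None):
        self.pool = ThreadPoolExecutor(
            max_workers=num_threads,
            thread_name_prefix=f"{name}-intra",
            initializer=_pin_current_thread,
            initargs=(cpuset,),
        )

    def run_parallel(self, fn, items):
        # Work submitted by this session only runs on this session's threads.
        return list(self.pool.map(fn, items))

    def close(self):
        self.pool.shutdown(wait=True)


session = SessionWithOwnPool("session_a", num_threads=2)
names = session.run_parallel(lambda _: threading.current_thread().name, range(4))
session.close()
```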

Ops & Hardware Acceleration

Implement ApplyFtrl, ResourceApplyFtrl, ApplyFtrlV2 and ResourceApplyFtrlV2 GPU kernels.
Optimize BatchMatmul GPU kernel.
Integrate cuBLASLt into the backend and use BlasLtMatmul in batch_matmul_op.
Support GPU fusion of MatMul+Bias+(activation).
Merge NV-TF r1.15.5+22.06.
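
The new GPU kernels for ApplyFtrl and ResourceApplyFtrl should compute the same per-coordinate update as the existing CPU kernels. The sketch below writes that update for one scalar weight. It follows the formula documented for `tf.raw_ops.ApplyFtrl`. The function name is hypothetical, and the V2 variants (which add an `l2_shrinkage` term to the gradient) are not covered.

```python
import math


def apply_ftrl(var, accum, linear, grad, lr, l1, l2, lr_power=-0.5):
    """Per-coordinate FTRL update following the formula documented for
    tf.raw_ops.ApplyFtrl; returns the new (var, accum, linear)."""
    accum_new = accum + grad * grad
    linear += grad - (accum_new ** -lr_power - accum ** -lr_power) / lr * var
    quadratic = 1.0 / (accum_new ** lr_power * lr) + 2.0 * l2
    if abs(linear) > l1:
        # sign(linear) * l1 - linear, divided by the quadratic term
        var = (math.copysign(l1, linear) - linear) / quadratic
    else:
        var = 0.0  # L1 drives small-magnitude weights exactly to zero
    return var, accum_new, linear


var, accum, linear = apply_ftrl(0.0, 1.0, 0.0, grad=1.0, lr=1.0, l1=0.0, l2=0.0)
```

Each coordinate is updated independently. This is why the update maps naturally onto one GPU thread per element.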

Optimizer

Support AdamW optimizer for EmbeddingVariable.
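
AdamW differs from Adam with L2 regularization because its weight decay is applied directly to the weight, not folded into the gradient. The sketch below shows one AdamW step for a single weight, using the common decoupled formulation in which the decay is scaled by the learning rate. The function name and default hyperparameters are assumptions. DeepRec's EmbeddingVariable implementation may differ in details such as how decay is scaled or when rows that are not looked up get updated.

```python
import math


def adamw_step(w, m, v, g, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW step for a single weight (decoupled weight decay).
    Returns the updated (w, m, v); t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * g
    v = beta2 * v + (1.0 - beta2) * g * g
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    # The decay term is added to the update instead of to the gradient,
    # so it is not rescaled by the adaptive denominator sqrt(v_hat).
    w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v


w, m, v = adamw_step(1.0, 0.0, 0.0, g=1.0, t=1, lr=0.1, weight_decay=0.01)
```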

Model Save/Restore

Support asynchronously restoring EmbeddingVariable from checkpoints.
Support EmbeddingVariable in init_from_checkpoint.

Serving

Add Go/Java/Python client SDKs and demos.
Support GPU multi-streams in SessionGroup.
Support an independent inter-op thread pool for each session in SessionGroup.
Support multi-tiered Embedding.
Support immutable EmbeddingVariable.

Quantization

Add a low-precision optimization tool supporting BF16, FP16 and INT8 for SavedModel and checkpoints.
Add EmbeddingVariable quantization.
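
These notes do not say which INT8 scheme the tool or the EmbeddingVariable quantization uses. The sketch below assumes symmetric per-tensor quantization, one common choice, to show how a float row maps to INT8 codes and back. The function names are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization; returns (codes, scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the symmetric range [-127, 127].
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Map INT8 codes back to approximate float values."""
    return [c * scale for c in codes]


row = [1.0, -0.5, 0.25]
codes, scale = quantize_int8(row)
restored = dequantize_int8(codes, scale)
```

The largest-magnitude value maps to 127, and every restored value is within half a quantization step (`scale / 2`) of the original. Storing one byte per element, plus a scale, is what reduces embedding memory roughly fourfold relative to FP32.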

ModelZoo

Optimize DIN's BF16 performance.
Add DCN & DCNv2 models and MLPerf recommendation benchmark.
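
The main difference between the DCN and DCNv2 models is the cross layer. DCN projects the current layer onto a weight vector, giving a scalar. DCNv2 uses a full weight matrix, giving an element-wise gate. The sketch below shows one cross layer of each in pure Python, using the standard published formulations. The function names are hypothetical and are not the ModelZoo code.

```python
def cross_layer_v1(x0, xl, w, b):
    """DCN cross layer: x_{l+1} = x0 * (xl . w) + b + xl, with w a vector."""
    s = sum(xi * wi for xi, wi in zip(xl, w))  # scalar projection of xl
    return [x0i * s + bi + xli for x0i, bi, xli in zip(x0, b, xl)]


def cross_layer_v2(x0, xl, W, b):
    """DCNv2 cross layer: x_{l+1} = x0 (elementwise) (W xl + b) + xl."""
    proj = [sum(wij * xj for wij, xj in zip(row, xl)) + bi
            for row, bi in zip(W, b)]
    return [x0i * pi + xli for x0i, pi, xli in zip(x0, proj, xl)]


x0 = [1.0, 2.0]
print(cross_layer_v1(x0, x0, w=[1.0, 0.0], b=[0.0, 0.0]))  # [2.0, 4.0]
print(cross_layer_v2(x0, x0, W=[[1.0, 0.0], [0.0, 1.0]], b=[0.0, 0.0]))  # [2.0, 6.0]
```

Both layers keep a residual `+ xl` term. The matrix in DCNv2 lets each output feature weight the interactions differently, at the cost of d×d parameters per layer instead of d.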

Profiler

Add detailed information for RecvTensor in the timeline.

Dockerfile

Add an Ubuntu 22.04 Dockerfile and images with gcc 11.2 and Python 3.8.6.
Add CUDA 11.2, 11.4, 11.6 and 11.7 Docker images, and use CUDA 11.6 for the default GPU image.

Environment & Build

Update default TF_CUDA_COMPUTE_CAPABILITIES to 6.0,6.1,7.0,7.5,8.0.
Upgrade bazel version to 0.26.1.
Support building DeepRec on ROCm 2.10.0.

BugFix

Fix build failures with gcc11 & gcc12.
StarServer: remove user packet splitting to avoid out-of-order issues when multiple user packets are sent.
Fix the 'NodeIsInGpu is not declared' issue.
Fix the placement bug of worker devices during distributed training in ModelZoo.
Fix an out-of-range issue in the BiasAddGrad op when AVX512 is enabled.
Avoid loading an invalid model during model updates in serving.

More details on these features: https://deeprec.readthedocs.io/zh/latest/

Release Images

CPU Image

alideeprec/deeprec-release:deeprec2208-cpu-py36-ubuntu18.04

GPU Image

alideeprec/deeprec-release:deeprec2208-gpu-py36-cu116-ubuntu18.04