r1.15.5-deeprec2208

@liutongxuan liutongxuan released this 23 Sep 04:07
· 544 commits to main since this release
0fe2668

Major Features and Improvements

Embedding

  • Multi-tier EmbeddingVariable now supports HBM; add an async compactor in SSDHashKV.
  • Support the tf.feature_column.shared_embedding_columns, SequenceCategoricalColumn and WeightedCategoricalColumn APIs for EmbeddingVariable.
  • Support save and restore checkpoint of GPU EmbeddingVariable.
  • Support EmbeddingVariable OpKernel with REAL_NUMBER_TYPES.
  • Support a user-defined default_value for the feature filter.
  • Support feature column API for MultiHash.
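
To illustrate the user-defined default_value for the feature filter, here is a minimal pure-Python sketch of a frequency-based admission filter. It is a conceptual toy, not DeepRec's implementation: the class name `CounterFilteredEmbedding`, the `filter_freq` parameter name, the admission rule, and the constant initial vector are all assumptions made for illustration.

```python
from collections import defaultdict


class CounterFilteredEmbedding:
    """Toy embedding table with a frequency-based admission filter (hypothetical)."""

    def __init__(self, dim, filter_freq, default_value=0.0):
        self.dim = dim
        self.filter_freq = filter_freq      # lookups required before a key is admitted
        self.default_value = default_value  # value returned for keys not yet admitted
        self.counts = defaultdict(int)      # key -> number of lookups seen so far
        self.table = {}                     # key -> admitted embedding vector

    def lookup(self, key):
        self.counts[key] += 1
        if key in self.table:
            return self.table[key]
        if self.counts[key] >= self.filter_freq:
            # Admit the key. A real system would use a trainable initializer;
            # the constant 0.1 here is only a placeholder.
            self.table[key] = [0.1] * self.dim
            return self.table[key]
        # Not admitted yet: return the user-defined default value.
        return [self.default_value] * self.dim
```

In this sketch, a key looked up fewer than `filter_freq` times yields a vector filled with `default_value`, so rare features contribute a fixed, known value instead of an untrained embedding.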

Graph & Grappler Optimization

  • Add FP32 fused L2 normalize op and its grad op, and the tf.nn.fused_layer_normalize API.
  • Add Concat+Cast fusion ops.
  • Optimize SmartStage performance on GPU.
  • Add a macro to control optimization in mkl_layout_pass.
  • Support asynchronous embedding lookup.
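
As a reference for what an L2 normalize op computes, the sketch below normalizes a single row in pure Python. It assumes the fused op follows the same definition as `tf.nn.l2_normalize` (divide by the square root of `max(sum(x**2), epsilon)`); the release notes do not state the fused op's exact semantics, and the function name `l2_normalize` is only illustrative.

```python
import math


def l2_normalize(row, epsilon=1e-12):
    """Scale a vector to unit L2 norm, guarding against division by zero."""
    squared_sum = sum(v * v for v in row)
    # Clamp the squared norm from below so an all-zero row stays all-zero.
    inv_norm = 1.0 / math.sqrt(max(squared_sum, epsilon))
    return [v * inv_norm for v in row]
```

A fused kernel would typically perform the reduction and the scaling in one pass over the data instead of launching separate ops for square, sum, rsqrt and multiply.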

Runtime Optimization

  • CPUAllocator: avoid multiple threads performing cleanup at the same time.
  • Support an independent intra-op thread pool for each session, and support pinning the intra-op thread pool to a cpuset.
  • Support multi-stream with virtual device.

Ops & Hardware Acceleration

  • Implement ApplyFtrl, ResourceApplyFtrl, ApplyFtrlV2 and ResourceApplyFtrlV2 GPU kernels.
  • Optimize BatchMatmul GPU kernel.
  • Integrate cuBLASlt into backend and use BlasLtMatmul in batch_matmul_op.
  • Support GPU fusion of MatMul+bias+(activation).
  • Merge NV-TF r1.15.5+22.06.

Optimizer

  • Support AdamW optimizer for EmbeddingVariable.
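
For readers unfamiliar with AdamW, the following is a minimal sketch of one update step with decoupled weight decay, written in pure Python for a single dense vector. It follows the commonly published AdamW formulation; DeepRec's kernel for EmbeddingVariable may differ in details (for example, it would apply updates only to the rows that were looked up), and the function name and default hyperparameters here are assumptions.

```python
import math


def adamw_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """Apply one AdamW step to weights w given gradients g at step t (t >= 1)."""
    new_w, new_m, new_v = [], [], []
    for wi, gi, mi, vi in zip(w, g, m, v):
        mi = beta1 * mi + (1 - beta1) * gi        # first-moment estimate
        vi = beta2 * vi + (1 - beta2) * gi * gi   # second-moment estimate
        m_hat = mi / (1 - beta1 ** t)             # bias correction
        v_hat = vi / (1 - beta2 ** t)
        # Weight decay is applied directly to the weight, not folded into g.
        wi = wi - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * wi)
        new_w.append(wi)
        new_m.append(mi)
        new_v.append(vi)
    return new_w, new_m, new_v
```

The difference from Adam with L2 regularization is that the decay term bypasses the moment estimates, so it is not rescaled by the adaptive learning rate.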

Model Save/Restore

  • Support asynchronous restore of EmbeddingVariable from checkpoint.
  • Support EmbeddingVariable in init_from_checkpoint.

Serving

  • Add Go/Java/Python client SDKs and demos.
  • Support GPU multi-streams in SessionGroup.
  • Support independent inter thread pool for each session in SessionGroup.
  • Support multi-tiered Embedding.
  • Support immutable EmbeddingVariable.

Quantization

  • Add a low-precision optimization tool that supports BF16, FP16 and INT8 for SavedModel and checkpoint.
  • Add embedding variable quantization.
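
To make the idea of quantizing an embedding concrete, here is a small sketch of symmetric per-row INT8 quantization in pure Python. The scheme (one scale per row, range [-127, 127]) is a common choice picked for illustration; the release notes do not say which scheme DeepRec's tool uses, and both function names are hypothetical.

```python
def quantize_int8(row):
    """Quantize a float vector to INT8 codes using one symmetric scale per row."""
    max_abs = max(abs(v) for v in row)
    # Use a scale of 1.0 for an all-zero row to avoid dividing by zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in row]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate float values from INT8 codes and the row scale."""
    return [c * scale for c in codes]
```

Each value is stored in one byte plus a shared scale per row, and the round-trip error per element is bounded by about half the scale.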

ModelZoo

  • Optimize DIN's BF16 performance.
  • Add DCN & DCNv2 models and MLPerf recommendation benchmark.

Profiler

  • Add detailed information for RecvTensor in timeline.

Dockerfile

  • Add ubuntu 22.04 dockerfile and images with gcc11.2 and python3.8.6.
  • Add cuda11.2, cuda11.4, cuda11.6, cuda11.7 docker images and use cuda 11.6 as default GPU image.

Environment & Build

  • Update default TF_CUDA_COMPUTE_CAPABILITIES to 6.0,6.1,7.0,7.5,8.0.
  • Upgrade bazel version to 0.26.1.
  • Support for building DeepRec on ROCm2.10.0.

BugFix

  • Fix build failures with gcc11 & gcc12.
  • StarServer: remove user packet splitting to avoid out-of-order delivery of multiple user packets.
  • Fix the 'NodeIsInGpu is not declared' issue.
  • Fix the worker device placement bug in distributed training in ModelZoo.
  • Fix an out-of-range issue in the BiasAddGrad op when AVX512 is enabled.
  • Avoid loading an invalid model during model updates in serving.

For more details on these features, see https://deeprec.readthedocs.io/zh/latest/

Release Images

CPU Image

alideeprec/deeprec-release:deeprec2208-cpu-py36-ubuntu18.04

GPU Image

alideeprec/deeprec-release:deeprec2208-gpu-py36-cu116-ubuntu18.04