
v2.7.0-cktile

@rocking5566 released this on 15 Nov 05:57

  • Reduce LDS usage when num_splits <= 8
  • Use a smaller tile size to speed up small-seqlen cases
  • Fine-tune block mapping
  • Use a larger vector size when writing the workspace
  • Speed up the combine kernel
  • Fix an out-of-bounds read of the block table
  • Fix the incorrect key/value range in each split
  • Avoid accessing the dropout seed and offset device pointers in the host API
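Several items above (num_splits, the key/value range in each split, the combine kernel) refer to split-KV attention. The following is a minimal, hypothetical Python sketch of the general FlashAttention-style split-KV pattern, not the CK tile implementation. It assumes the key range is partitioned evenly into contiguous chunks and that each split returns its normalized partial output together with a log-sum-exp (LSE), which the combine step uses to rescale and sum the partials. All function names are illustrative.

```python
import math


def split_ranges(seqlen_k, num_splits):
    # Hypothetical even partition: contiguous, non-overlapping key ranges
    # that together cover exactly [0, seqlen_k). Trailing splits may be empty.
    base, rem = divmod(seqlen_k, num_splits)
    ranges, start = [], 0
    for i in range(num_splits):
        end = start + base + (1 if i < rem else 0)
        ranges.append((start, end))
        start = end
    return ranges


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def partial_attention(q, keys, values, lo, hi, scale):
    # Attention of one query over keys[lo:hi].
    # Returns (softmax-normalized output, log-sum-exp of the scaled scores).
    d_v = len(values[0])
    if lo == hi:
        return [0.0] * d_v, -math.inf  # an empty split contributes nothing
    scores = [dot(q, keys[j]) * scale for j in range(lo, hi)]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    out = [sum(w * values[lo + t][c] for t, w in enumerate(weights)) / denom
           for c in range(d_v)]
    return out, m + math.log(denom)


def combine(partials):
    # Rescale each split's output by exp(lse_i - lse_total) and sum them.
    lses = [lse for _, lse in partials]
    m = max(lses)
    lse_total = m + math.log(sum(math.exp(lse - m) for lse in lses))
    d_v = len(partials[0][0])
    return [sum(math.exp(lse - lse_total) * out[c] for out, lse in partials)
            for c in range(d_v)]


def split_kv_attention(q, keys, values, num_splits):
    # Assumes len(keys) >= 1 so at least one split is non-empty.
    scale = 1.0 / math.sqrt(len(q))
    partials = [partial_attention(q, keys, values, lo, hi, scale)
                for lo, hi in split_ranges(len(keys), num_splits)]
    return combine(partials)


def reference_attention(q, keys, values):
    # Single-pass softmax attention over all keys, for comparison.
    out, _ = partial_attention(q, keys, values, 0, len(keys),
                               1.0 / math.sqrt(len(q)))
    return out


if __name__ == "__main__":
    q = [0.5, -1.0]
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5], [0.3, 0.3]]
    values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
    print(split_kv_attention(q, keys, values, 3))
    print(reference_attention(q, keys, values))
```

Because every split covers a disjoint key range and carries its own LSE, the combined result matches single-pass attention for any num_splits. An overlapping or missing range would break that equivalence, which is the kind of error a wrong per-split key/value range produces.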