Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
# Performance Optimizations | ||
## Intel Architecture Processors | ||
* Improved fp16/bf16 softmax performance with relaxed [accumulation mode](https://oneapi-src.github.io/oneDNN/dev_guide_attributes_accumulation_mode.html#doxid-dev-guide-attributes-accumulation-mode). | ||
* Added support and improved performance for fp8 matmul with bf16/fp16. | ||
|
||
## Intel Graphics Products | ||
* Introduced initial optimizations for GPUs based on Xe3 architecture. | ||
* Improved convolution performance on Intel Arc Graphics for Intel Core Ultra processors (Series 2) (formerly Lunar Lake). | ||
|
||
## AArch64-based Processors | ||
|
||
# Functionality | ||
* Introduced support for `select` algorithm in binary primitive. The functionality is optimized for Intel CPUs. | ||
* Enabled support for matmul primitive with grouped quantization on weights along the N dimension. | ||
* Introduced support for fp16/bf16 compressed weights in fp32 matmul on Intel CPUs. | ||
* Introduced support for grouped scales and zero points in reorder primitive. | ||
* Enabled support for 4d weight scale in matmul primitive. | ||
* [experimental] Extended microkernel API: | ||
  * Introduced int4 quantization support. | ||
  * Fpmath mode API | ||
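
As a rough illustration of what grouped quantization of weights along the N dimension means numerically, the sketch below dequantizes a row-major K x N int8 weight matrix in plain C++. It is not oneDNN code and calls no oneDNN API: the helper name `dequantize_grouped_n`, the row-major layout, and the K x (N / group_n) shape assumed for scales and zero points are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference helper (not a oneDNN API). Dequantizes a row-major
// K x N int8 weight matrix in which every `group_n` consecutive elements
// along N share one scale and one zero point.
// Assumption: scales and zero points are row-major K x (N / group_n).
std::vector<float> dequantize_grouped_n(const std::vector<int8_t> &q,
        const std::vector<float> &scales, const std::vector<int32_t> &zps,
        std::size_t K, std::size_t N, std::size_t group_n) {
    assert(group_n > 0 && N % group_n == 0);
    const std::size_t groups = N / group_n; // number of groups per row
    assert(q.size() == K * N);
    assert(scales.size() == K * groups);
    assert(zps.size() == K * groups);

    std::vector<float> w(K * N);
    for (std::size_t k = 0; k < K; ++k) {
        for (std::size_t n = 0; n < N; ++n) {
            // Index of the group that covers column n in row k.
            const std::size_t g = k * groups + n / group_n;
            // Standard affine dequantization: w = scale * (q - zero_point).
            w[k * N + n]
                    = scales[g] * static_cast<float>(q[k * N + n] - zps[g]);
        }
    }
    return w;
}
```

With `K = 1`, `N = 4`, and `group_n = 2`, columns 0-1 use the first scale and zero point and columns 2-3 use the second. The library itself would apply such scales inside the matmul rather than materializing a dequantized copy.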
# Usability | ||
* Relaxed lifetime requirements for memory objects created with CPU engine and SYCL runtime. The new behavior is aligned with GPU engine. | ||
* Improved verbose diagnostics to better identify issues during dispatching, primitive creation, and kernel creation for CPU primitive implementations and OpenCL-based GPU primitive implementations. | ||
* Improved verbose diagnostics to simplify debugging of nGEN fallbacks. | ||
* Enabled frame pointer support on Intel64 platforms to improve integration with profilers. | ||
# Validation | ||
* Extended benchdnn with support and validation for fp8 matmul patterns and for tensor tags in RNN primitive validation. | ||
# Deprecated Functionality | ||
|
||
# Breaking Changes | ||
* Updated minimum supported CMake version to 3.13 (was 2.8.12). | ||
* Updated minimum supported GCC version to 8.0 (was 4.8). | ||
* Updated minimum supported Clang version to 11.0 (was 3.0). | ||
# Thanks to these Contributors | ||
|
||
This release contains contributions from the [project core team] as well as Michał Górny @mgorny, Fadi Arafeh @fadara01, John Osorio @kala855, Ravi Pushkar @rpushkarr, Marek Michalowski @michalowski-arm, Renato Barros Arantes @renato-arantes, Ryo Suzuki @Ryo-not-rio, Varad Ahirwadkar @varad-ahirwadkar, Tadej Ciglarič @t4c1, Nikhil Sharma @nikhilfujitsu, @taoye9, @Shreyas-fuj, @raistefintel. We would also like to thank everyone who asked questions and reported issues. | ||
|
||
[project core team]: https://github.com/oneapi-src/oneDNN/blob/rls-v3.7/MAINTAINERS.md |