diff --git a/beginners-guide.md b/beginners-guide.md
index 17ade06ae80..db9f55c5fca 100644
--- a/beginners-guide.md
+++ b/beginners-guide.md
@@ -74,6 +74,16 @@ This is why the various Rust targets *don't* enable many CPU feature flags by de
 
 So please select an appropriate CPU feature level when building your programs.
 
+## Some Guidelines About Choosing CPU Defaults
+The default targets ensure the program works portably. However, there are cases in which you can raise the target features without compromising portability, or in which you already have higher requirements than the Rust targets do:
+* Some Linux distributions, such as RHEL 9 and Rocky Linux 9, require x86-64-v2. When building packages for these distributions, you can safely use `target-cpu=x86-64-v2`, which allows up to `sse4.2` (along with some other features such as `popcnt`).
+* Newer distros (with Glibc 2.33 or later) support a `glibc-hwcaps` directory that lets you ship optimized libraries. In addition to putting the libraries in the normal locations, you can create a `glibc-hwcaps` directory at those locations with subdirectories such as `x86-64-v2` (supports up to `sse4.2`), `x86-64-v3` (supports up to `avx2`), and `x86-64-v4` (supports `avx512f`, `avx512bw`, `avx512cd`, `avx512dq`, and `avx512vl`), and place the libraries built with those target CPUs there. When loading a library, Glibc automatically picks the most optimized variant that the CPU supports.
+* If you require Windows 11, you can also raise the CPU target to `x86-64-v2`.
+* macOS supports slices of different architectures in the same executable or library. At launch, macOS runs the slice that matches the machine's architecture (this is how Universal binaries of `x86-64` and `aarch64` work!). While `x86_64` and `arm64` (the internal name macOS uses for aarch64) are the commonly known slices, macOS also supports the x86_64**h** slice. This slice runs on Haswell or newer processors, which support `avx2` and many other x86-64-v3 features. Rust has a tier 3 target called `x86_64h-apple-darwin` (notice the **h**) that builds this slice with `avx2` and other features already enabled. When it is time to combine the different slices, you can use the `lipo` command to merge them into one file:
+  `lipo -create {input files} -output {output file}`
+As an example, you may have a binary `foo` built in 3 versions: `x86_64`, `x86_64h`, and `aarch64`. Running `lipo -create foo-x86_64 foo-x86_64h foo-aarch64 -output foo` creates an output file `foo` with all 3 slices. This covers most of the different SIMD CPU configurations of Macs (the only exception being the `avx512` extensions).
+* If you are targeting newer versions of macOS, you can raise the `target-cpu` requirements. If you are targeting Mojave or later, you can bump the `target-cpu` for x86_64 to `x86-64-v2`, which allows target features up to `sse4.2`. If you are targeting Catalina or later, you can bump the `target-cpu` for x86_64 to `ivybridge`, which supports the `avx` feature (and can optimize SSE operations as well). Finally, if you are targeting Ventura or later, you can omit the `x86_64` slice outright and ship only the `x86_64h` slice (or target `x86-64-v3` for the normal `x86_64` slice and omit the `x86_64h` slice), as all x86_64 CPUs supported by Ventura support `avx2`.
+
 ## Size, Alignment, and Unsafe Code
 
 Most of the portable SIMD API is designed to allow the user to gloss over the details of different architectures and avoid using unsafe code. However, there are plenty of reasons to want to use unsafe code with these SIMD types, such as using an intrinsic function from `core::arch` to further accelerate particularly specialized SIMD operations on a given platform, while still using the portable API elsewhere. For these cases, there are some rules to keep in mind.
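
One way to keep a conservative default target while still reaching for a `core::arch` intrinsic is to check for the CPU feature at run time and fall back to portable code otherwise. The following is a minimal sketch on stable Rust, not part of this patch: the function names `sum`, `sum_avx2`, and `sum_scalar` are hypothetical, and the AVX2 path is only compiled on `x86_64`.

```rust
/// Portable fallback: wrapping sum of all elements.
fn sum_scalar(data: &[i32]) -> i32 {
    data.iter().fold(0i32, |acc, &x| acc.wrapping_add(x))
}

/// AVX2 path (hypothetical helper). Callers must ensure the CPU supports AVX2.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(data: &[i32]) -> i32 {
    use core::arch::x86_64::*;

    let mut chunks = data.chunks_exact(8);
    let mut acc = _mm256_setzero_si256();
    for chunk in &mut chunks {
        // Unaligned load: a `&[i32]` is only guaranteed 4-byte alignment,
        // while `__m256i` has 32-byte alignment.
        let v = unsafe { _mm256_loadu_si256(chunk.as_ptr() as *const __m256i) };
        // Lane-wise 32-bit addition, which wraps on overflow.
        acc = _mm256_add_epi32(acc, v);
    }

    // Spill the 8 lanes to an array and finish the reduction in scalar code.
    let mut lanes = [0i32; 8];
    unsafe { _mm256_storeu_si256(lanes.as_mut_ptr() as *mut __m256i, acc) };
    let tail = sum_scalar(chunks.remainder());
    lanes.iter().fold(tail, |a, &x| a.wrapping_add(x))
}

/// Picks the fastest implementation the running CPU supports.
pub fn sum(data: &[i32]) -> i32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just verified at run time.
            return unsafe { sum_avx2(data) };
        }
    }
    sum_scalar(data)
}
```

Calling `sum(&values)` behaves the same on every machine; only the code path differs. Building with a higher `target-cpu`, as discussed above, is the compile-time alternative when the minimum CPU is known in advance.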