Enhancement request: Faster build for modern windows machines #7155

Open
chaimav opened this issue Jul 26, 2024 · 7 comments
Labels
scope: performance Performance issues and improvements

Comments

@chaimav

chaimav commented Jul 26, 2024

Current Windows builds are not optimized for the number of threads for modern processors.
Manually setting the number of threads for Wavelets yields increased performance over the automatic setting.
[Image: Threads]

A full explanation from HIRAM is here:
https://discuss.pixls.us/t/how-to-optimize-rawtherapee/44786/15

@Benitoite
Contributor

Benitoite commented Jul 26, 2024

OP @chaimav has tested the dev GitHub build artifact for windows-release and the official 5.10 Windows release from rawtherapee.com.
Tested on: Windows 11, i7-13700

@Benitoite
Contributor

Benitoite commented Jul 27, 2024

For a fast Windows 11 optimized build, I would recommend testing skylake-raptorlake: -march=skylake -mtune=raptorlake -O3 -flto

Note: LTO doesn’t seem to work for the windows build. See: #5379

The Skylake microarchitecture underlies Intel's 6th- through 8th-generation Core processors, and 8th gen is roughly the Intel minimum for Windows 11. Raptor Lake is the 13th gen (including the specific i7 mentioned above). This tuning could be added as processor target number 11.

These 2017-2022 tuning architectures should be available in gcc-13 and later.

For a fast windows 10 build, existing processor target number 10 (sandybridge-ivybridge) could be used.

Example github CI build:

Workflow link:
https://github.com/Benitoite/RawTherapee/actions/runs/10122666106

about_this_build:

Version: nightly-github-actions-810-g2a8e549b7
Branch: fastwin
Commit: 2a8e549b7
Commit date: 2024-07-27
Compiler: cc 14.1.0
Processor: skylake-raptorlake
System: Windows
Bit depth: 64 bits
Gtkmm: 3.24.9
Lensfun: 0.3.4.0
libjxl: 0.10.3
Build type: release
Build flags:  -std=c++11 -ffp-contract=off -march=skylake -mtune=raptorlake -Werror=unused-label -Werror=delete-incomplete -fno-math-errno -Wno-attributes -Wall -Wuninitialized -Wcast-qual -Wno-deprecated-declarations -Wno-unused-result -Wunused-macros -fopenmp -Werror=unknown-pragmas -O3 -DNDEBUG -ftree-vectorize
Link flags:  -march=skylake -mtune=raptorlake
OpenMP support: ON
MMAP support: ON
Build OS: MINGW64_NT-10.0-20348 3.5.3-d8b21b8c.x86_64 x86_64
Build date: Sat, 27 Jul 2024 13:02:36 +0000 UTC
Build epoch: 1722085356
Build UUID: efe7a454-0778-4f9a-90bc-74f2f1a12109

Runs OK on Windows 10 / i7-6700 (Skylake). I'm no expert at generating timing comparison data, but by the seat of the pants it does seem much faster.

@Lawrence37
Collaborator

@chaimav There are two things going on here.

The first is optimal thread usage. There might be (just my speculation) a limit on the number of threads when it is set to automatic. If this is the case, manually setting the maximum threads to at least the number of logical cores you have could give you the best result. Based on the specs you provided in the Pixls thread, that would be 24. Be cautious about possible performance issues when using high values (see #6730), so some experimentation could be required.

The other thing is build optimization for more recent architectures. The official builds are generic, which means they work for a large percentage of computers (I'm only talking about x86). We could provide one or more optimized builds for more recent architectures. I'd like to see what the performance improvements are for various processor targets to determine the best compromise between performance and compatibility. We also have to think about AMD and how to make the optimized builds available without making it confusing for those who are not techies.
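
As a rough illustration of the thread-count experiment suggested above, here is a minimal Python sketch that lists a few values to try in the threads setting. The function name `candidate_thread_counts` and the choice of half, three quarters, and all of the logical cores are assumptions made for illustration; nothing here reflects how RawTherapee picks its automatic value.

```python
import os


def candidate_thread_counts(logical_cores=None):
    """Return a short, sorted list of thread counts worth benchmarking."""
    if logical_cores is None:
        # os.cpu_count() reports logical cores and may return None.
        logical_cores = os.cpu_count() or 1
    # Assumption: try half, three quarters, and all of the logical cores.
    half = max(1, logical_cores // 2)
    three_quarters = max(1, (logical_cores * 3) // 4)
    return sorted({half, three_quarters, logical_cores})


if __name__ == "__main__":
    # Prints the candidates for the machine this script runs on.
    print(candidate_thread_counts())
```

For the 24 logical cores mentioned above, this gives 12, 18, and 24 as starting points; each would still need to be timed by hand, given the caution about high values in #6730.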

Lawrence37 added the "scope: performance" label Jul 27, 2024
@chaimav
Author

chaimav commented Jul 28, 2024

Is it possible for the installer to determine the CPU on install? Or even have a manual option during install that defaults to the current version if the user doesn't select another option?

@chaimav
Author

chaimav commented Jul 28, 2024

I tried @Benitoite's build and still saw a measurable improvement from increasing the number of threads.
I opened a previous edit that has Wavelets > Sharp-mask & clarity enabled and simply panned side to side, repeating the pan several times and timing it crudely with the stopwatch on my smartphone.
With threads set to 0, the processing bar showed up for about 2.5 seconds, but with threads set to 16 it was there for about 1.4 seconds.
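
To put the two stopwatch readings above in relative terms, here is a small Python sketch of the arithmetic. The helper names `speedup` and `time_saved_fraction` are made up for this example, and the inputs are the approximate 2.5 s and 1.4 s figures quoted above, so the result is only as precise as the stopwatch.

```python
def speedup(baseline_s, tuned_s):
    """How many times faster the tuned run is than the baseline."""
    return baseline_s / tuned_s


def time_saved_fraction(baseline_s, tuned_s):
    """Fraction of the baseline time that the tuned run saves."""
    return (baseline_s - tuned_s) / baseline_s


if __name__ == "__main__":
    # Threads = 0 (automatic) took about 2.5 s; threads = 16 took about 1.4 s.
    print(f"speedup: {speedup(2.5, 1.4):.2f}x")
    print(f"time saved: {time_saved_fraction(2.5, 1.4):.0%}")
```

With those readings the pan is roughly 1.8 times faster, or about 44% less time on screen for the processing bar.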

@Lawrence37
Collaborator

I'm not sure if it's possible to detect the CPU architecture. I took a quick look at the Inno Setup documentation and didn't see anything that can help.

@Lawrence37
Collaborator

The results from https://discuss.pixls.us/t/rawtherapee-windows-performance-testing-needed-powershell/44819 show very little difference between builds for different architectures in most test runs. There are one or two CPUs that show a preference for one build, but it's hard to draw conclusions without knowing the margin of error.
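
One rough way to estimate the margin of error mentioned above is to repeat each timing several times and compare the means against their standard errors. The Python sketch below does that under stated assumptions: the function names are hypothetical, the factor of two standard errors is a rule of thumb and not a formal test, and the timings in the usage section are invented placeholders, not results from the Pixls thread.

```python
import math
import statistics


def mean_with_margin(samples, z=2.0):
    """Return (mean, margin), where margin is z standard errors of the mean."""
    if len(samples) < 2:
        raise ValueError("need at least two timings to estimate a margin")
    mean = statistics.mean(samples)
    standard_error = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, z * standard_error


def clearly_different(samples_a, samples_b, z=2.0):
    """True when the two means differ by more than their combined margins."""
    mean_a, margin_a = mean_with_margin(samples_a, z)
    mean_b, margin_b = mean_with_margin(samples_b, z)
    return abs(mean_a - mean_b) > margin_a + margin_b


if __name__ == "__main__":
    # Hypothetical timings in seconds for two builds on the same machine.
    generic_build = [2.5, 2.4, 2.6]
    tuned_build = [2.45, 2.55, 2.5]
    print(mean_with_margin(generic_build))
    print(clearly_different(generic_build, tuned_build))
```

Requiring the gap to exceed both margins combined is deliberately conservative, which seems appropriate when deciding whether an extra build is worth offering.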


3 participants