- This is a very crude solution. Never use it with multiplayer games: it hooks functions on the same principle that cheats do.
- Use only a `Release` build of the DLL for injection! The performance of the Debug configuration will spoil the NUMAyei scheduler.
I watched the video "How Bad is This $10,000 PC from 10 Years Ago??" on the Linus Tech Tips YouTube channel (@LinusTechTips), and it amused me that they launched Crysis 2 on a server with 4x GPU SLI and saw no difference in FPS. What they did not realize is that the load fell on a single CPU: the game created its threads on one processor only, not on both, so the FPS was the same with one new GPU as with the old 4x GPU SLI setup. The game was also most likely built for 2x SLI at most. I plan to fix this as well; in theory, I can make a hack that unlocks SLI scaling across more GPUs. I hope that @nharris-lmg will be able to tell the LTT team about this so they can re-test Crysis 2 on the old NUMA server motherboard using this hack. Don't forget to enable the NUMA feature in the BIOS!
Moment from video:
- Hook `VirtualAlloc` and redirect it to `VirtualAllocExNuma` for each NUMA node
- Hook every `OpenProcess` and `CreateProcess` call to migrate processes to other NUMA nodes
- Hook every method that detects cores and threads
- Hook process launch by double-click or from the right-click context menu, so that non-PRO users do not have to run cmd.exe or PowerShell (in the future: `numayei.exe ./binary_non_numa.exe`)
None of this helps if the running program does not itself split its work across WinAPI threads using `CreateThread()` and similar functions.
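The hooking idea from the list above can be sketched roughly as follows. This is a minimal illustration, not the code NUMAyei ships: `AllocFn`, `Hook`, `install_hook`, `remove_hook`, and `NumaVirtualAlloc` are hypothetical names, and a plain function-pointer slot stands in for whatever patching mechanism the DLL really uses (the README does not say). Allocating on the node of the calling thread is also an assumption about the policy. The Windows-only part uses the documented `GetCurrentProcessorNumberEx`, `GetNumaProcessorNodeEx`, and `VirtualAllocExNuma` calls.

```cpp
#include <cassert>
#include <cstddef>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical stand-in for an import-table entry: the program calls
// through this pointer, so overwriting it redirects every call.
using AllocFn = void* (*)(std::size_t size);

struct Hook {
    AllocFn* slot;      // location of the patched pointer
    AllocFn  original;  // saved so the hook can be removed later
};

// Swap the pointer in the slot and remember the original target.
inline Hook install_hook(AllocFn* slot, AllocFn replacement) {
    assert(slot != nullptr && replacement != nullptr);
    Hook hook{slot, *slot};
    *slot = replacement;
    return hook;
}

// Restore the original target.
inline void remove_hook(const Hook& hook) {
    *hook.slot = hook.original;
}

#ifdef _WIN32
// Same signature as VirtualAlloc, so it could serve as its replacement.
// Assumed policy: allocate on the NUMA node the calling thread runs on.
LPVOID WINAPI NumaVirtualAlloc(LPVOID address, SIZE_T size,
                               DWORD type, DWORD protect) {
    PROCESSOR_NUMBER cpu = {};
    USHORT node = 0;
    GetCurrentProcessorNumberEx(&cpu);
    if (!GetNumaProcessorNodeEx(&cpu, &node)) {
        node = 0;  // fall back to node 0 if the lookup fails
    }
    return VirtualAllocExNuma(GetCurrentProcess(), address, size,
                              type, protect, node);
}
#endif
```

The pointer swap compiles on any platform; only `NumaVirtualAlloc` needs Windows. A real injector would patch the target's import table or use a detour library instead of a local slot.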
The load balancer mechanism is simple. Most programs never call the functions that specify a NUMA node or processor group, so by default only the very first one is selected (only NUMA 0 or only NUMA 1).
If we inject the NUMAyei scheduler into an app, we can redefine those functions so that work is assigned to every NUMA node in the system and memory comes from the NUMA allocator.
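A minimal sketch of such a balancer, assuming a simple round-robin policy (the scheduler's real policy may differ): `NodeBalancer` and `pin_thread_to_node` are hypothetical names introduced for illustration. The Windows-only helper uses the documented `GetNumaNodeProcessorMaskEx` and `SetThreadGroupAffinity` calls to restrict a thread to the processors of one node.

```cpp
#include <atomic>
#include <cassert>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical balancer: hands out node indices 0..node_count-1 in
// round-robin order. The atomic counter makes pick() safe to call
// from several threads at once.
class NodeBalancer {
public:
    explicit NodeBalancer(unsigned node_count) : node_count_(node_count) {
        assert(node_count > 0);
    }

    unsigned pick() {
        return next_.fetch_add(1, std::memory_order_relaxed) % node_count_;
    }

private:
    unsigned node_count_;
    std::atomic<unsigned> next_{0};
};

#ifdef _WIN32
// Restrict a thread to the processors of one NUMA node.
// Returns false if the node lookup or the affinity change fails.
inline bool pin_thread_to_node(HANDLE thread, USHORT node) {
    GROUP_AFFINITY affinity = {};
    if (!GetNumaNodeProcessorMaskEx(node, &affinity)) {
        return false;
    }
    return SetThreadGroupAffinity(thread, &affinity, nullptr) != 0;
}
#endif
```

A hooked `CreateThread` could call `pick()` for each new thread and pass the result to `pin_thread_to_node`, so that consecutive threads land on alternating nodes instead of all on the first one.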
Keep in mind that programs already optimized for NUMA will not see any significant performance gain. Examples that I tested: XMRig miner and some benchmarks such as Corona, Blender, etc.
It would be good to have a text database on the GitHub Wiki that records compatibility and the performance gain as a percentage.
No one forces you to do this, but you will notice that CPU power consumption decreases and performance increases. When only one NUMA node does the work, the Windows scheduler has to run it at the highest frequency to finish the task faster, while the second node sits idle, handling only background tasks unrelated to the process you care about. With my thread scheduler, the whole system is fully loaded at 100%!
- Windows 11 minimum (I will try to support lower versions)
- Any DLL injector
- A built `NUMADLL.dll` (once I set up GitHub CI, the latest binaries will be available for download)
- Windows 11 Pro 23H2 [22631.3296]
- 2 NUMA nodes (dual socket)
- Xeon E5 v3-v4 family CPU
- Download any DLL injector; I advise you to take an open-source one. (I tested with the Xenos64 injector.)
- Select the compiled `NUMADLL.dll` and inject it into any process (any method). In the future, `NUMA.exe` will be an injector, and through its parameters you will be able to specify the path to an exe to run or an already running process by PID.
The screenshot below is a good example of NUMAyei adapting a running binary that is not NUMA-aware.
Updated with new NUMA allocation (beginning with commit 6c6fb5)
In the new CPU-Z version: