We are now in 2024 and many folks are using servers with a large number of cores (see also #8820). I understand that the next generation of TFB standard hardware has 56 logical cores. The trend is clear - more cores is the future.
To ensure that TFB test cases remain industry-relevant, some adjustments may be needed to better target scenarios with large core counts. In particular, my experiments suggest that the low number of concurrent connections may be a bottleneck when stressing larger machines: with a large number of cores there is not enough concurrent load to saturate many frameworks, and across a sampling of 4 frameworks I observe an artificial plateau around the 25-core mark.
*Fortunes at 512 concurrency. Physical cores, hyperthreading disabled.*
My hypothesis is that this plateau is due to the relatively low number of concurrent clients used by TFB. I typically use 10K clients in my load-testing setups, so the Fortunes max concurrency of 512 raises an immediate flag for me.
I propose that future TFB rounds ramp up the concurrency considerably, for example raising the max concurrency of fortunes from 512 to 5120 (a 10x increase). This might allow large servers to be better exercised by ensuring that a sufficient volume of requests lands on the target.
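For illustration, a minimal sketch of the extended ladder, assuming the toolset keeps its current list-of-levels scheme (the 16..512 defaults below are inferred from this thread and may not match the exact configuration):

```python
# Assumed current fortunes concurrency ladder (16..512, per this thread).
current_levels = [16, 32, 64, 128, 256, 512]

# Proposed extension: keep doubling past 512, then land on the 10x target.
proposed_levels = current_levels + [1024, 2048, 5120]
```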
In attempting to prototype an increase, I ran into an obstacle: verification quickly starts to fail due to an incorrect database query count. There seem to be bottlenecks that do not permit a larger number of verification queries to succeed.
```
Verification failed: Command '['siege', '-c', '5120', '-r', '1', 'http://10.0.0.7:8000/fortunes', '-R', '/FrameworkBenchmarks/toolset/databases/.siegerc']' timed out after 20 seconds
  FAIL for http://10.0.0.7:8000/fortunes
    Only 1024 executed queries in the database out of roughly 5120 expected.
    See https://github.com/TechEmpower/FrameworkBenchmarks/wiki/Project-Information-Framework-Tests-Overview#specific-test-requirements
  FAIL for http://10.0.0.7:8000/fortunes
    Only 12154 rows read in the database out of roughly 61440 expected.
    See https://github.com/TechEmpower/FrameworkBenchmarks/wiki/Project-Information-Framework-Tests-Overview#specific-test-requirements
{'content-type': 'text/html; charset=utf-8', 'content-length': '1232', 'server': 'Axum', 'date': 'Sat, 23 Mar 2024 22:54:38 GMT'}
```
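For what it's worth, the expected numbers in that output follow directly from the verification parameters: siege issues one request per client, each fortunes request performs one query, and the fortune table holds 12 rows.

```python
clients = 5120                 # siege -c 5120
repetitions = 1                # siege -r 1
rows_per_query = 12            # rows in the fortune table

expected_queries = clients * repetitions             # 5120
expected_rows = expected_queries * rows_per_query    # 61440
```

In other words, only about 20% of the expected queries reached the database before the 20-second timeout.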
I do not know whether this is a problem with the harness or with specific frameworks, but testing a random sampling of 10+ frameworks showed most of them failing with similar verification errors, so I suspect something in the benchmark harness setup. Attempting to bump limits in various database-related configs did not yield success.
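For anyone trying to reproduce this, here is a minimal diagnostic sketch to check whether PostgreSQL's connection cap is the ceiling; the host and credentials below are the usual TFB conventions, but treat them as assumptions:

```python
# Hypothetical probe: is PostgreSQL's connection cap saturating under load?
# Assumes psycopg2 and the standard TFB database credentials.
import psycopg2

conn = psycopg2.connect(host="tfb-database", dbname="hello_world",
                        user="benchmarkdbuser", password="benchmarkdbpass")
with conn.cursor() as cur:
    cur.execute("SHOW max_connections;")            # PostgreSQL default: 100
    print("max_connections:", cur.fetchone()[0])
    cur.execute("SELECT count(*) FROM pg_stat_activity;")
    print("active connections:", cur.fetchone()[0])
conn.close()
```

If the active connection count sits at the cap during a run, the bottleneck is the database configuration rather than the framework under test.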
Increasing concurrency to 1024 after fiddling with config files shows:

- The benchmark itself becomes unstable (some verifications pass, some fail, randomly).
- Absolute numbers increase (even at low core counts!).
- The scaling curve remains very similar, peaking around 20-25 cores.
Not sure if it proves or disproves anything specific, but it definitely underlines that some upgrades are needed to make the benchmark results believable at high core counts.
There are other issues like this one (#5626 is an example).
We need to remember that this benchmark runs on enterprise servers, but we also run it in the cloud.
And especially, it needs to run locally on a single computer for testing before we send PRs to this repo.
I understand the issue, but we need to remember that this benchmark is 11 years old and has been changing over time. The first run was in 2013 on an i7-2600K (2nd generation) with 8 GB of RAM, in the cloud on EC2.
And still, all results are valuable in any environment, and results are normally relatively comparable across servers. Also, not everybody will have enterprise servers in production for their applications.
Do we really need the 16 and 32 concurrency levels, even when testing locally, for the json, db, and fortunes tests? I think not. And the best part is that we would get faster runs, since each concurrency level adds ~20 seconds per test for every framework.
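As a rough estimate of the savings (a back-of-the-envelope sketch; the ~20 s figure is from above, the framework count is an assumption):

```python
levels_removed = 2       # dropping the 16 and 32 concurrency levels
test_types = 3           # json, db, fortunes
seconds_per_level = 20   # rough per-level cost quoted above
frameworks = 500         # assumption: order of magnitude for a full round

saved_hours = levels_removed * test_types * seconds_per_level * frameworks / 3600
print(f"~{saved_hours:.1f} hours saved per full run")   # ~16.7 hours
```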
I really like seeing the data table information; it is very informative, but the majority of people never look at it.
Still, the 16 and 32 concurrency levels give very little information.
Replacing 16 and 32 with 1024 seems OK for now. But going beyond that will need a bigger toolset change: either auto-calculating the levels from the server's resources, or manually maintaining three configs (local, faster, and enterprise).
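A minimal sketch of the auto-calculate idea, assuming concurrency should scale with the benchmark server's core count (the multipliers are placeholders, not tuned values):

```python
import os

def auto_concurrency_levels(min_level=64, max_multiplier=128):
    """Build a doubling ladder of concurrency levels from the core count."""
    cores = os.cpu_count() or 1
    levels, level = [], min_level
    while level <= cores * max_multiplier:
        levels.append(level)
        level *= 2
    return levels

# Placeholder math: on a 56-core server this yields 64, 128, ..., 4096
# (4096 is the largest level not exceeding 56 * 128 = 7168).
```

The three-config alternative (local, faster, enterprise) would simply pin three such lists instead of computing them.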
@NateBrady23 do you think the 16 and 32 concurrency levels are still relevant? We would have faster runs without them.