Inconsistent results with MemoryDiagnoser between identical runs with Server GC #1913
The following two results are from two consecutive runs of the exact same code with the exact same BenchmarkDotNet configuration. Notice the differences in Gen 0 (1000 vs 2000 collects per 1k ops) and Gen 1 (0 vs 1000 collects per 1k ops).

I have noticed this a bit in my testing. Is this to be expected? What does it mean? How can I use MemoryDiagnoser effectively to get information (e.g. about GC pressure, I think; note that I know very little about GC) when the results seem to vary so significantly between runs?

(Note that this is from a complex project; unfortunately I don't have a minimal repro.)

It's also interesting to note the differences in the histograms between the runs, though I have no idea if that is relevant here.

Edit: Is it relevant that I am using server GC? The job is configured with `Job.Default.WithGcServer(true)`.

Run 1

Run 2

Comments
In case it's relevant, here are some results without server GC. They are significantly different from either of the two runs above.
Here's a second run. Seems more consistent.
Note also that the performance is much worse, which is why I prefer measuring with server GC (to get more accurate performance results, the way the project is actually used). But if that makes the GC statistics way off, I may have to reconsider.
In #1543, issues with memory/GC measurements were tracked down to tiered JIT. Can you try disabling tiered JIT and see if you get the same results?
Sure, how do I disable it? I failed to find any info.
Also, the changelog for 0.13.1 (which I'm using) says that tiered JIT is disabled? https://benchmarkdotnet.org/changelog/full.html
Hm, I just checked the PRs mentioned in that change, and it looks like tiered JIT was only disabled for BDN's own tests, not for all benchmarks: https://github.com/dotnet/BenchmarkDotNet/pull/1747/files You should be able to make the same change in your project to disable it. @adamsitnik Is this correct, or am I missing something?
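A minimal sketch of one way to do that, assuming the standard `COMPlus_TieredCompilation` runtime switch and BDN's `WithEnvironmentVariable` job API (the exact mechanism used in the linked PR may differ):

```csharp
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;

// Disable tiered JIT in the benchmarked process via the runtime switch.
// DOTNET_TieredCompilation is the newer spelling of the same variable.
var config = DefaultConfig.Instance.AddJob(
    Job.Default.WithEnvironmentVariable("COMPlus_TieredCompilation", "0"));
```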
Tried.
Both BenchmarkDotNet and the GC use some heuristics, so the results can vary across runs. BenchmarkDotNet chooses the number of invocations per iteration and runs the benchmark until the results are stable. You can read more about it here: https://benchmarkdotnet.org/articles/guides/how-it-works.html For example, it might choose to run the benchmark 4 times per iteration (the result of the pilot stage) and to finish after 16 workload iterations.

After all of that, it runs an additional iteration to get the GC stats. So between two different runs, BDN might perform a different amount of work and hence allocate more or less before it asks the GC for the stats. This matters because the GC has its own set of heuristics that determine generation budget sizes, and those may affect how frequently collections are performed.
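If the varying amount of work is itself a concern, these counts can be pinned so that every run performs exactly the same work. A minimal sketch, assuming the standard `Job` API (the specific counts are arbitrary examples, not recommended values):

```csharp
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;

// Pinning the counts removes BDN's heuristics as a source of run-to-run
// variance: every run performs exactly the same number of invocations.
var config = DefaultConfig.Instance.AddJob(
    Job.Default
        .WithWarmupCount(3)       // fixed number of warmup iterations
        .WithIterationCount(16)   // fixed number of workload iterations
        .WithInvocationCount(64)  // fixed invocations per iteration
        .WithUnrollFactor(16));   // invocation count must be divisible by this
```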
I encourage you to write a simple console app with a Stopwatch and calls to `GC.CollectionCount` to see this for yourself.
Thanks, that is a helpful clarification. It answers the question I (somewhat implicitly) posed in the OP, which was "why are the GC stats slightly different between identical runs?". However, it does not address what turned out to be the main issue (which is evident by comparing the results in the OP with those of the subsequent comment): using server GC seems to throw the "Gen X" results way off. Note that the "Allocated" results seem fine, even with tiered JIT. It's the "Gen X" results that are out of whack.
Please create a small repro app and run it without BenchmarkDotNet for both Server and Workstation GC. I expect that the statistics will have similar differences.
Ok, I'll try to do that tomorrow. For clarification: since you already expect there to be a similar difference, are you asking me to repro it just for confirmation that this is indeed an area where BenchmarkDotNet can be improved? Or is it the other way around, and a confirmation would mean that BenchmarkDotNet is not at fault? (Which would confuse me.)
I am 99.9% sure that it's not BDN's fault (it always does exactly the same thing regardless of GC mode). I just want to be 100% sure.
Here is a repro solution:

```fsharp
namespace Test

open BenchmarkDotNet.Attributes
open BenchmarkDotNet.Running
open BenchmarkDotNet.Jobs
open BenchmarkDotNet.Configs

[<MemoryDiagnoser>]
type Benchmark () =

    [<Benchmark>]
    member _.Bench() =
        Array.init 10000 string

module Program =

    [<EntryPoint>]
    let main _argv =
        BenchmarkRunner.Run<Benchmark>(
            DefaultConfig.Instance.AddJob(Job.Default.WithGcServer(false))
        )
        |> ignore
        0
```

Results when using `WithGcServer(false)`:

Results when using `WithGcServer(true)`:
As you can see, the Gen 0/1/2 collection counts differ significantly between the two GC modes. Some questions:
The Workstation and Server GC modes are just different: they use different generation budgets, a different number of GC threads, etc. One effect of that is that they perform cleanup at different rates. The "Pro .NET Memory Management" book has an entire chapter dedicated to this subject: https://learning.oreilly.com/library/view/pro-net-memory/9781484240274/html/430794_1_En_11_Chapter.xhtml
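As a quick sanity check, the `GCSettings` API reports which flavor a process is actually running with; a minimal sketch:

```csharp
using System;
using System.Runtime;

// Prints which GC flavor and latency mode the current process is using.
Console.WriteLine($"Server GC: {GCSettings.IsServerGC}");
Console.WriteLine($"Latency mode: {GCSettings.LatencyMode}");
```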
The results are correct; this is how the GC works. BDN just reports the numbers provided by the runtime (via `GC.CollectionCount`). If you want to verify it yourself, you can write something like this:

```csharp
using System;
using System.Diagnostics;

class Program
{
    public static void Main(string[] args)
    {
        int iterationCount = int.Parse(args[0]);
        int invocationCount = int.Parse(args[1]);
        bool enforceGC = bool.Parse(args[2]); // true by default in BDN

        for (int i = 0; i < iterationCount; i++)
        {
            Stopwatch stopwatch = Stopwatch.StartNew();
            for (int j = 0; j < invocationCount; j++)
            {
                Benchmark();
            }
            stopwatch.Stop();

            // average time per invocation for this iteration
            Console.WriteLine((double)stopwatch.ElapsedMilliseconds / invocationCount);

            if (enforceGC)
            {
                GC.Collect();
                GC.WaitForPendingFinalizers();
                GC.Collect();
            }
        }

        Console.WriteLine();

        GC.Collect(); // clean up the memory before measuring GC stats

        int gen0before = GC.CollectionCount(0);
        int gen1before = GC.CollectionCount(1);
        int gen2before = GC.CollectionCount(2);

        for (int j = 0; j < invocationCount; j++)
        {
            Benchmark();
        }

        int gen0after = GC.CollectionCount(0);
        int gen1after = GC.CollectionCount(1);
        int gen2after = GC.CollectionCount(2);

        // how many Gen 0/1/2 collections happened during the workload
        Console.WriteLine($"GC stats: {(gen0after - gen0before) / 1000.0} {(gen1after - gen1before) / 1000.0} {(gen2after - gen2before) / 1000.0}");
    }

    static void Benchmark() { } // what you want to benchmark
}
```
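To compare the two flavors with an app like this, one option (an assumption about the setup, not something stated in this thread) is to toggle Server GC without recompiling by setting the `DOTNET_gcServer` (or legacy `COMPlus_gcServer`) environment variable to `1` or `0` before launching the process.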
You should learn more about the GC flavors, choose the best one for your production environment, and then just keep using it for benchmarking.
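If it ever helps to see both flavors side by side, the same benchmarks can be run under both GC modes in a single comparison; a minimal sketch using the config API (`MyBenchmarks` is a placeholder for your benchmark class):

```csharp
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;

// Two jobs that differ only in GC flavor produce one comparison table.
var config = DefaultConfig.Instance
    .AddJob(Job.Default.WithGcServer(true).WithId("ServerGC"))
    .AddJob(Job.Default.WithGcServer(false).WithId("WorkstationGC"));

BenchmarkRunner.Run<MyBenchmarks>(config);
```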
It does, in the same way as enabling Server GC for your app (e.g. via `<ServerGarbageCollection>true</ServerGarbageCollection>` in the project file).
.NET GC uses the "stop the world" approach, which means that for some parts of its job (usually memory compaction) it pauses all the active threads, performs some work, and then resumes the threads. This takes time, which might be observed by the end user: in a desktop app it can show up as an unresponsive UI, in a web app as increased latency. Personally, I always look at the Allocated column and Gen 2. If some Gen 2 collections happened, it might mean that the code is allocating large objects (> 85 KB), which can trigger full GCs.
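To make the 85 KB threshold concrete, here is a small sketch (an illustration, not from the thread) showing that large allocations land on the Large Object Heap, which the runtime reports as generation 2:

```csharp
using System;

// Arrays of roughly 85,000 bytes or more are allocated on the Large Object
// Heap, which is collected only as part of Gen 2 (full) collections.
byte[] small = new byte[80_000];
byte[] large = new byte[90_000];
Console.WriteLine(GC.GetGeneration(small)); // 0: small object heap, starts in Gen 0
Console.WriteLine(GC.GetGeneration(large)); // 2: LOH objects are reported as Gen 2
```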
Thank you, I understand more now. I use Server GC because the web API framework I'm developing (and benchmarking) uses the Hopac library for lightweight parallelization, and Server GC is the recommended mode for Hopac-based applications. As I understand your explanation and the actual results, in my case Server GC simply collects a lot less often (as reported by `GC.CollectionCount`). Thanks for the tips regarding usage of the memory results!