Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RuntimeInformation causes unexpected allocations #101536

Open
timcassell opened this issue Apr 25, 2024 · 10 comments
Open

RuntimeInformation causes unexpected allocations #101536

timcassell opened this issue Apr 25, 2024 · 10 comments
Labels
area-VM-coreclr question Answer questions and provide assistance, not an issue with source code or documentation.
Milestone

Comments

@timcassell
Copy link

timcassell commented Apr 25, 2024

Description

While debugging and attempting to fix flaky allocation measurements in BenchmarkDotNet (dotnet/BenchmarkDotNet#2562), I found that System.Runtime.InteropServices.RuntimeInformation causes allocations during a GC.Collect().

I only observe this behavior in net8.0 runtime. net48 runtime using Microsoft.DotNet.PlatformAbstractions nuget package does not reproduce (or AppDomain.MonitoringTotalAllocatedMemorySize just doesn't observe it, since GC.GetTotalAllocatedBytes is not available in net48).

Reproduction Steps

Create a Console application and run this code in Release with net8.0.

public static void Main()
{
    var bench = new Bench();
    var action = bench.Benchmark1;
    Warmup(action);

    // Switch these lines to observe 0 allocated bytes.
    var frameworkDescription = RuntimeInformation.FrameworkDescription;
    //var frameworkDescription = "";

    var initial = GetAllocatedBytes();
    action();
    var final = GetAllocatedBytes();

    Console.WriteLine($"allocated bytes: {final - initial}");

    GC.KeepAlive(frameworkDescription);
}

static long GetAllocatedBytes()
{
    GC.Collect(); // We only observe allocated bytes when GC.Collect() is called.
    return GC.GetTotalAllocatedBytes(precise: true);
}

static void Warmup(Action action)
{
    for (int i = 0; i < 100; i++)
    {
        action();
    }
}

public class Bench
{
    public void Benchmark1() => TestMethod();
    
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static ulong TestMethod()
    {
        var r = 1ul;
        for (var i = 0; i < 50_000_000; i++)
        {
            r /= 1;
        }
        return r;
    }
}

Expected behavior

Console prints allocated bytes: 0.

Actual behavior

Console prints allocated bytes: 336 (sometimes it prints 48 or 672).

Regression?

No response

Known Workarounds

No response

Configuration

Windows 10 x64
.Net 8.0

Other information

No response

@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Apr 25, 2024
Copy link
Contributor

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

@jkotas
Copy link
Member

jkotas commented Apr 25, 2024

There are number of background threads running in your app: finalizer thread, tiered JIT thread, diagnostics (event pipe) thread, ... . All these threads can wake up, do work and allocate as side-effect. The allocations you are seeing are likely from these background threads waking up and allocating.

When I run your repro under debugger (that also affects the timing), I see some allocations from finalizers running on the finalizer thread while your benchmark is running in the foreground.

Adding/removing RuntimeInformation.FrameworkDescription call is changing the timing and the work that some of these background threads need to do. For example, adding the call can introduce more work for the tiered JIT thread.

@timcassell
Copy link
Author

@jkotas Sure, I can believe that we may get some random allocations from background threads we cannot control, but I don't believe that's what's happening here. If I measure it repeatedly, I see the same results repeatedly.

public static void Main()
{
    var bench = new Bench();
    var action = bench.Benchmark1;
    Warmup(action);

    // Switch these lines to observe 0 allocated bytes.
    var frameworkDescription = RuntimeInformation.FrameworkDescription;
    //var frameworkDescription = "";

    Console.WriteLine($"allocated bytes: {MeasureAllocations(action)}");
    Console.WriteLine($"allocated bytes: {MeasureAllocations(action)}");
    Console.WriteLine($"allocated bytes: {MeasureAllocations(action)}");
    Console.WriteLine($"allocated bytes: {MeasureAllocations(action)}");
    Console.WriteLine($"allocated bytes: {MeasureAllocations(action)}");
    Console.WriteLine($"allocated bytes: {MeasureAllocations(action)}");
    Console.WriteLine($"allocated bytes: {MeasureAllocations(action)}");
    Console.WriteLine($"allocated bytes: {MeasureAllocations(action)}");
    Console.WriteLine($"allocated bytes: {MeasureAllocations(action)}");
    Console.WriteLine($"allocated bytes: {MeasureAllocations(action)}");

    GC.KeepAlive(frameworkDescription);
}

static long MeasureAllocations(Action action)
{
    var initial = GetAllocatedBytes();
    action();
    var final = GetAllocatedBytes();
    return final - initial;
}
allocated bytes: 336
allocated bytes: 336
allocated bytes: 336
allocated bytes: 336
allocated bytes: 336
allocated bytes: 336
allocated bytes: 336
allocated bytes: 336
allocated bytes: 336
allocated bytes: 336

Without GC.Collect():

allocated bytes: 0
allocated bytes: 0
allocated bytes: 0
allocated bytes: 0
allocated bytes: 0
allocated bytes: 0
allocated bytes: 0
allocated bytes: 0
allocated bytes: 0
allocated bytes: 0

@jkotas
Copy link
Member

jkotas commented Apr 25, 2024

GC.Collect() triggers Gen2 GC. Gen2 GC triggers ArrayPool cleanup on finalizer thread. ArrayPool cleanup allocates ConditionalWeakTable enumerator at this callstack:

 # Child-SP          RetAddr               Call Site
00 000000ad`7007f178 00007ff7`d81d5630     coreclr!JIT_New [D:\a\_work\1\s\src\coreclr\vm\jithelpers.cpp @ 2435] 
01 000000ad`7007f180 00007ff7`d81d4c92     System_Private_CoreLib!System.Runtime.CompilerServices.ConditionalWeakTable`2[[System.__Canon, System.Private.CoreLib],[System.__Canon, System.Private.CoreLib]].System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<TKey,TValue>>.GetEnumerator()+0xfffffff7`da3d20f0
02 000000ad`7007f1e0 00007ff7`d81d49a9     System_Private_CoreLib!System.Buffers.SharedArrayPool`1[[System.Char, System.Private.CoreLib]].Trim()+0xfffffff7`da4fd382
03 000000ad`7007f2e0 00007ff8`37d4b846     System_Private_CoreLib!System.Gen2GcCallback.Finalize()+0xfffffff7`da6fc319
04 000000ad`7007f330 00007ff8`37c0e3cd     coreclr!FastCallFinalizeWorker+0x6 [D:\a\_work\1\s\src\coreclr\vm\amd64\CallDescrWorkerAMD64.asm @ 26] 

@jkotas
Copy link
Member

jkotas commented Apr 25, 2024

(It is not the only alllocation that ArrayPool cleanup does.)

@timcassell
Copy link
Author

Hm, that makes sense. Can the allocations be eliminated? Or do you have any suggestions to avoid the issue?

@jkotas
Copy link
Member

jkotas commented Apr 25, 2024

I suppose it may be possible to eliminate it by somehow, but I am sure whether it is worth the troubles.

Many of the background runtime activities can allocate for number of different reasons. It does not make sense to try to eliminate every single one of them to make the benchmark.net alllocation analyzer reliable.

If you do not want the background activities to interfere, the best way is to disable them: disable tiered JIT, disable event source, hang the finalizer by creating a finalizer with infinite sleep, etc.

@cshung
Copy link
Member

cshung commented Apr 25, 2024

If you are okay with the overhead, The native ICorProfiler allows you to capture all allocations.

You can filter out allocation that does not happen on the thread of interest. We have a sample test case here to show how it could be done.

https://github.com/dotnet/runtime/blob/main/src/tests/profiler/native/gcallocateprofiler/gcallocateprofiler.cpp

@timcassell
Copy link
Author

timcassell commented Apr 25, 2024

I don't know what other allocations there are, but I suppose a low effort reduction there would be to use a struct enumerator for the ConditionalWeakTable (internal only for safety, of course).

If you do not want the background activities to interfere, the best way is to disable them: disable tiered JIT, disable event source, hang the finalizer by creating a finalizer with infinite sleep, etc.

Hanging the finalizer thread is certainly an interesting idea. I'll try to see if I can work that into our tests.

Copy link
Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

@AaronRobinsonMSFT AaronRobinsonMSFT added question Answer questions and provide assistance, not an issue with source code or documentation. and removed untriaged New issue has not been triaged by the area owner labels Apr 27, 2024
@mangod9 mangod9 added this to the Future milestone Jun 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-VM-coreclr question Answer questions and provide assistance, not an issue with source code or documentation.
Projects
None yet
Development

No branches or pull requests

6 participants