Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable stacktrace collection per EventPipeSession #3696

Closed
ezsilmar opened this issue Feb 28, 2023 · 7 comments
Closed

Disable stacktrace collection per EventPipeSession #3696

ezsilmar opened this issue Feb 28, 2023 · 7 comments

Comments

@ezsilmar
Copy link
Contributor

ezsilmar commented Feb 28, 2023

Background

Currently, all events emitted from the managed space by EventSource will do a stackwalk, see eventpipeinternal.cpp:121. The obligatory stackwalk leads to significant performance penalty for frequent events while in some cases the stacktrace is not required.

As an example, I'm working on the association between CPU samples from perf record and spans of System.Diagnostics.Activity. The idea is to bring the profiling data on the application level, showing the cpu usage grouped by activity name. To make it work, I record all the activities with Microsoft-Diagnostics-DiagnosticSource and track TraceSyncrhonousWorkStart and TraceOperationStart events from System.Threading.Tasks.TplEventSource to follow the asynchronous flow.

With the stacktraces, the idea is not viable because the application spends more than half of the cpu time doing stackwalks. By disabling the stacktrace recording (just put false in the file linked above), the overhead drops down to around 3% showing really promising results.

Proposed Feature

I propose to introduce the bool collectStacks parameter when creating an EventPipe session. The change is required both in the runtime implementation and the DiagnosticsClient. For managed events the effect of session.collectStacks would be straightforward as there's no way to control it in EventSource definition. For the native we could check session.collectStacks && event.needStack to respect ClrEtwAllMeta.lst

Why I think introducing this parameter on the session level is a good idea:

  1. For the same event, the importance of the stacktrace depends on the usecase. Thus the user should be able to choose
  2. It seems too complex for me to propagate the flag per provider, even more complex per keyword or per event. On the other hand disabling the stacks with the environment variable (per process) will severely limit other usecases.
  3. If we need the events both with and without the stack at the same time, we could start 2 parallel sessions. Not very convenient but not too much trouble either.
  4. It's relatively easy to support backward compatibility as we don't change the defaults and the parameter could be optional

Usage Examples

Starting the session:

var session = client.StartEventPipeSession(providers, collectStacks = false);
@ezsilmar ezsilmar added the enhancement New feature or request label Feb 28, 2023
@tommcdon
Copy link
Member

tommcdon commented Mar 1, 2023

@davmason @noahfalk

@ezsilmar
Copy link
Contributor Author

ezsilmar commented Mar 2, 2023

I've just realized that RequestRundown could be the thing I need. I'll test it and close the issue if it's the case.

@davmason
Copy link
Member

davmason commented Mar 2, 2023

Hi @ezsilmar,

RequestRundown won't have any affect on stack collection. There is not currently a way to disable stack collection and it causes pain for some scenarios, we are in the process of adding options to disable stack collection. PR dotnet/runtime#82470 is about to be merged, and long term I would like to do the approach you describe where we can disable stack collection per session.

@ezsilmar
Copy link
Contributor Author

ezsilmar commented Mar 3, 2023

Hi @davmason, thanks for the clarification!

There's not much info on rundown events on the internet, as far as I understand it it's basically symbol information?

I was confused with RequestRundown because of these lines:

 if (event.NeedStack() && !session.RundownEnabled() && pStack == NULL)
 {
     EventPipe::WalkManagedStackForCurrentThread(s);
     pStack = &s;
 }

I read them incorrectly at first, assuming that we only collect stacks when we emit Rundown which would make sense to me. However I'm a bit lost as the logic is opposite, meaning probably there are two different ways to do a stackwalk? I couldn't find it in the code.

Anyway, I'm interested in contributing to this subject as I'm going to implement something fast for our custom-built runtime in the coming days.

@davmason
Copy link
Member

davmason commented Mar 3, 2023

Rundown is something we do to make sure everything in a trace can be resolved. Jitted methods are dynamically generated at runtime so their IPs are not contained in the symbol file, and we also can potentially have dynamically generated types, in memory modules, and other things that are not known in the symbol file.

When rundown is triggered we go through all the dynamically generated things and emit the proper information so anyone reading the trace can properly map everything. It is effectively iterating through all the method bodies, types, etc and emitting an event for each one.

We turn off stack information for rundown events as an optimization because it's not relevant, there is no reason for anyone to care what stack is emitting rundown events since it is always the same one.

@ezsilmar
Copy link
Contributor Author

ezsilmar commented Mar 3, 2023

Ah ok I see there's rundown_requested field that is set at session startup but then there's rundown_enabled that is set to true at the end of the session. So Rundown provider doesn't rely on event.need_stack to exclude the stacks but on the separate check in the code.

@tommcdon tommcdon added this to the 8.0.0 milestone Mar 21, 2023
davmason pushed a commit that referenced this issue Nov 21, 2023
Client-side of dotnet/runtime#84077 and the
implementation of #3696.

To simplify the interface I made `EventPipeSessionConfiguration` public
and introduced a new method in the DiagnosticsClient:
`Task<EventPipeSession>
StartEventPipeSessionAsync(EventPipeSessionConfiguration configuration,
CancellationToken token)`. This is the only method that supports
disabling the stackwalk so no additional overloads with a new bool
parameter and no synchronous counterpart. I believe it'd be easier to
use and maintain a single async method with the options rather than
creating more overloads or default parameters but I may not have all the
context here so please correct me if you think it's a bad idea.

To deal with the backward compatibility I only use `CollectTracingV3`
when necessary i.e. when `RequestStackwalk` option is set to false. I
think it's a good compromise between the added complexity and
potentially surprising behavior:
* when the client is old and the runtime is new everything works because
the runtime supports `CollectTracingV2`
* when the client is new but the runtime is old everything works until
the new option is used. When it's used the session won't start as
`CollectTracingV3` doesn't exist server side: there'd be no clear error
message but it's documented in the option summary.
* when both the client and the runtime are new either `CollectTracingV2`
or `CollectTracingV3` may be used transparently for the user
* we may use the same trick when we introduce `CollectTracingV4`

The alternative is to implement version negotiation of some sort but I'd
like to have your opinion before attempting this as handling the errors
correctly wouldn't be easy (f.e. in [my current
fork](criteo-forks@3946b4a#diff-e8365039cd36eae3dec611784fc7076be7dadeda1007733412aaaa63f40a748fR39)
I just hide the exception)

The testing turned out to be a bit complex as I needed to convert
EventPipe stream to `TraceLog` to be able to read the stacktraces. I
couldn't achieve that without writing data to a file. Afaiu the
stackwalk may not work correctly without the rundown that only happens
at the end of the session so I wonder if looking at the stacktraces with
a live session is even possible (though iirc netfw+ETW could do that
back in the days) ?

Thanks for your time !
@ezsilmar
Copy link
Contributor Author

I've just realized this should be closed as completed as the implementation was merged.

@github-actions github-actions bot locked and limited conversation to collaborators Apr 27, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants