Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(sentry-tracing): switch sentry spans on enter and exit #686

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

saiintbrisson
Copy link

@saiintbrisson saiintbrisson commented Sep 25, 2024

I've been facing a bug with Sentry Trace where concurrent spans end up wrongly nested. At first, I thought this was some strange autogrouping bug and event sent a feedback (sorry Sentry support member 😅).

After digging a little bit, I discovered that the current tracing layer implementation sets the sentry span as active as soon as a tracing span is created, which should not be the case, as spans are only active once entered. Following the same logic, it also reverts to the parent span only when a span is closed, and not when it's exited.

A span can be entered and exited multiple times, for example, when a tracing Instrumented future polls and returns Pending. In case of concurrent Instrumented futures, the current impl ends up nesting Futures under the first created one.

sentry-tracing layer (wrong)

Screenshot 2024-09-25 at 14 22 57

tracing-opentelemetry layer (correct)

Screenshot 2024-09-25 at 14 28 11

I'd say that the best idea would be to not use sentry_core::configure_scope(|s| s.get_span()); when creating a new span to get the parent, and get it from the span extensions instead, but the current behavior might be relied upon already by someone.

@Swatinem
Copy link
Member

In case of concurrent Instrumented futures, the current impl ends up nesting Futures under the first created one.

For those cases, one should usually fork the Sentry Hub in order to have them fully isolated.

While you are right that enter/exit is a better fit, and when someone is manually hooking up spans, this distinction can lead to some confusion.
However, Hub isolation is the proper solution to this. Hooking into enter/exit in order to set/unset the active Span in the Sentry scope would not fully fix the problem of concurrently running futures.

@szokeasaurusrex
Copy link
Member

Hi @saiintbrisson, are you able to solve this problem with hub isolation?

I would honestly be a bit hesitant to change how we enter and exit Sentry spans, since others might be relying on the current behavior. But, we can look into it if you are unable to find a way to work around this problem.

@saiintbrisson
Copy link
Author

saiintbrisson commented Oct 2, 2024

Hi @saiintbrisson, are you able to solve this problem with hub isolation?

I would honestly be a bit hesitant to change how we enter and exit Sentry spans, since others might be relying on the current behavior. But, we can look into it if you are unable to find a way to work around this problem.

I ended up just using my fork at my company. Hub isolation is pretty verbose as I would have to bind it to every future as it currently stand. I already isolate it for my root future (I have a Axum middleware binding a hub to the handler call).

The problem arises on child futures, like join!ing two function futures. If both functions are tracing::instrumented, the first one will create a span when first polled, and when the second one is polled, its span will be considered a child to the first function, because the first span will only end after it returns Poll::Ready, which seems objectively wrong to me, as shown in the prints. That is, tracing::Span::current is not the same one as sentry::configure_scope(|s| s.get_span()), and I find this rather confusing.

Probably manually binding a hub to every call inside my join would work, but I don't want to make my coworkers handle this kind of detail during development. We have a couple of hundreds of mentions of try_join! on our code, and general code is agnostic to our telemetry stack (everything is handled through tracing).

If this happens to be a wont-fix, I'll probably just create my own tracing layer with this impl, but I do think sentry-tracing would benefit from this fix.

@Swatinem
Copy link
Member

Swatinem commented Oct 2, 2024

Hub isolation is pretty verbose as I would have to bind it to every future as it currently stand.

Yes that is a bit unfortunate, but ultimately the correct thing to do. Apart from proper span hierarchy, this also relates to Sentrys Scope and other concepts, which could go out of sync if multiple concurrent futures are messing with them.

@saiintbrisson
Copy link
Author

saiintbrisson commented Oct 2, 2024

Hub isolation is pretty verbose as I would have to bind it to every future as it currently stand.

Yes that is a bit unfortunate, but ultimately the correct thing to do. Apart from proper span hierarchy, this also relates to Sentrys Scope and other concepts, which could go out of sync if multiple concurrent futures are messing with them.

I don't think I understand how Scopes are to be used. I currently create a new Hub, create a transaction and set it to the created Hub, that I than use to bind it to the handler call (one scope per request). The way you described it, it seems like every future should have its own Scope?

In any case, I expected sentry-tracing to behave like what tracing-subscriber defines, thus my confusion as someone familiar with other impls (like tracing-opentelemetry). Because I use the tracing layer, I was wishing I could forget sentry Scopes altogether in favor of tracing spans/events/instrumentation. The current behavior should probably be described in the README for the crate: https://github.com/getsentry/sentry-rust/tree/master/sentry-tracing#tracing-spans, it's not clear that running these futures concurrently will provoke unexpected (faulty?) results.

Let me know if you folks come to any conclusion about this PR. For now, I'll create my own layer impl and move the fn get_span_idto #687.

Thanks for your time and keep up the good work, we've been enjoying Sentry a lot around here :).

@Swatinem
Copy link
Member

Swatinem commented Oct 3, 2024

All in all, thank you for the great feedback, this is very valuable.

I don't think I understand how Scopes are to be used. I currently create a new Hub, create a transaction and set it to the created Hub, that I than use to bind it to the handler call (one scope per request). The way you described it, it seems like every future should have its own Scope?

I might have added a bit to the confusion here. The Rust SDK still has a couple of concepts that have since been removed from the wider "Sentry unified API". For example, there are no Hubs anymore, and no Transactions either.
There is just Scopes and Spans. Though the Rust SDK is unfortunately quite behind on all of this :-(

Anyhow, in essence you want every future to have its own Scope (which in the Rust SDK is still named Hub) as you want to have isolation between them, for example:

  • When you set tags like file_id, you want those to be unique to each concurrently running future, as any error being logged should have the correct file_id.
  • Similarly, log messages which are being converted to breadcrumbs should also be isolated between concurrent futures. Otherwise a log line like "Starting to process file X" would be intermingled with other breadcrumbs unrelated to the operation.

The #[tracing] macro conveniently wraps the resulting future in Instrumented as you noted, which makes sure to push/pop the currently active span to a thread local. Sentry is completely outside of that, so you would have to do the equivalent for the Sentry Hub manually. Which then does the same management of a thread-local with appropriate state.

As I mentioned, the Rust SDK is quite a bit behind in terms of features. By now, the Sentry JS SDK has fully embraced opentelemetry, the SDK and instrumentation side that is. I could maybe imagine Sentry doing something similar in the sense of not duplicating all this span tracking, etc, but fully embracing tracing and build on top of that. Though that would be a ton of work. :-D

@nrxus
Copy link
Contributor

nrxus commented Oct 3, 2024

I think this is useful even outside of Futures.

My application doesn't do any async but whenever we create a new thread we also create a new tracing span for that thread. The span is created before the new thread (to inherit the current span as the parent) but then only enter it inside the new thread. If I am understanding this issue correctly, currently any events between the creation of the span and the entering of the span will have the wrong span information. While I understand this is a change in behavior I think the current behavior is fairly unexpected and not intuitive.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants