fix(sentry-tracing): switch sentry spans on enter and exit #686

saiintbrisson · 2024-09-25T17:56:08Z

I've been facing a bug with Sentry Trace where concurrent spans end up wrongly nested. At first, I thought this was some strange autogrouping bug and event sent a feedback (sorry Sentry support member 😅).

After digging a little bit, I discovered that the current tracing layer implementation sets the sentry span as active as soon as a tracing span is created, which should not be the case, as spans are only active once entered. Following the same logic, it also reverts to the parent span only when a span is closed, and not when it's exited.

A span can be entered and exited multiple times, for example, when a tracing Instrumented future polls and returns Pending. In case of concurrent Instrumented futures, the current impl ends up nesting Futures under the first created one.

`sentry-tracing` layer (wrong)

`tracing-opentelemetry` layer (correct)

I'd say that the best idea would be to not use sentry_core::configure_scope(|s| s.get_span()); when creating a new span to get the parent, and get it from the span extensions instead, but the current behavior might be relied upon already by someone.

Swatinem · 2024-09-30T08:35:51Z

In case of concurrent Instrumented futures, the current impl ends up nesting Futures under the first created one.

For those cases, one should usually fork the Sentry Hub in order to have them fully isolated.

While you are right that enter/exit is a better fit, and when someone is manually hooking up spans, this distinction can lead to some confusion.
However, Hub isolation is the proper solution to this. Hooking into enter/exit in order to set/unset the active Span in the Sentry scope would not fully fix the problem of concurrently running futures.

szokeasaurusrex · 2024-10-01T12:40:59Z

Hi @saiintbrisson, are you able to solve this problem with hub isolation?

I would honestly be a bit hesitant to change how we enter and exit Sentry spans, since others might be relying on the current behavior. But, we can look into it if you are unable to find a way to work around this problem.

saiintbrisson · 2024-10-02T01:31:04Z

Hi @saiintbrisson, are you able to solve this problem with hub isolation?

I would honestly be a bit hesitant to change how we enter and exit Sentry spans, since others might be relying on the current behavior. But, we can look into it if you are unable to find a way to work around this problem.

I ended up just using my fork at my company. Hub isolation is pretty verbose as I would have to bind it to every future as it currently stand. I already isolate it for my root future (I have a Axum middleware binding a hub to the handler call).

The problem arises on child futures, like join!ing two function futures. If both functions are tracing::instrumented, the first one will create a span when first polled, and when the second one is polled, its span will be considered a child to the first function, because the first span will only end after it returns Poll::Ready, which seems objectively wrong to me, as shown in the prints. That is, tracing::Span::current is not the same one as sentry::configure_scope(|s| s.get_span()), and I find this rather confusing.

Probably manually binding a hub to every call inside my join would work, but I don't want to make my coworkers handle this kind of detail during development. We have a couple of hundreds of mentions of try_join! on our code, and general code is agnostic to our telemetry stack (everything is handled through tracing).

If this happens to be a wont-fix, I'll probably just create my own tracing layer with this impl, but I do think sentry-tracing would benefit from this fix.

Swatinem · 2024-10-02T07:35:46Z

Hub isolation is pretty verbose as I would have to bind it to every future as it currently stand.

Yes that is a bit unfortunate, but ultimately the correct thing to do. Apart from proper span hierarchy, this also relates to Sentrys Scope and other concepts, which could go out of sync if multiple concurrent futures are messing with them.

saiintbrisson · 2024-10-02T16:30:33Z

Hub isolation is pretty verbose as I would have to bind it to every future as it currently stand.

Yes that is a bit unfortunate, but ultimately the correct thing to do. Apart from proper span hierarchy, this also relates to Sentrys Scope and other concepts, which could go out of sync if multiple concurrent futures are messing with them.

I don't think I understand how Scopes are to be used. I currently create a new Hub, create a transaction and set it to the created Hub, that I than use to bind it to the handler call (one scope per request). The way you described it, it seems like every future should have its own Scope?

In any case, I expected sentry-tracing to behave like what tracing-subscriber defines, thus my confusion as someone familiar with other impls (like tracing-opentelemetry). Because I use the tracing layer, I was wishing I could forget sentry Scopes altogether in favor of tracing spans/events/instrumentation. The current behavior should probably be described in the README for the crate: https://github.com/getsentry/sentry-rust/tree/master/sentry-tracing#tracing-spans, it's not clear that running these futures concurrently will provoke unexpected (faulty?) results.

Let me know if you folks come to any conclusion about this PR. For now, I'll create my own layer impl and move the fn get_span_idto #687.

Thanks for your time and keep up the good work, we've been enjoying Sentry a lot around here :).

Swatinem · 2024-10-03T07:34:13Z

All in all, thank you for the great feedback, this is very valuable.

I don't think I understand how Scopes are to be used. I currently create a new Hub, create a transaction and set it to the created Hub, that I than use to bind it to the handler call (one scope per request). The way you described it, it seems like every future should have its own Scope?

I might have added a bit to the confusion here. The Rust SDK still has a couple of concepts that have since been removed from the wider "Sentry unified API". For example, there are no Hubs anymore, and no Transactions either.
There is just Scopes and Spans. Though the Rust SDK is unfortunately quite behind on all of this :-(

Anyhow, in essence you want every future to have its own Scope (which in the Rust SDK is still named Hub) as you want to have isolation between them, for example:

When you set tags like file_id, you want those to be unique to each concurrently running future, as any error being logged should have the correct file_id.
Similarly, log messages which are being converted to breadcrumbs should also be isolated between concurrent futures. Otherwise a log line like "Starting to process file X" would be intermingled with other breadcrumbs unrelated to the operation.

The #[tracing] macro conveniently wraps the resulting future in Instrumented as you noted, which makes sure to push/pop the currently active span to a thread local. Sentry is completely outside of that, so you would have to do the equivalent for the Sentry Hub manually. Which then does the same management of a thread-local with appropriate state.

As I mentioned, the Rust SDK is quite a bit behind in terms of features. By now, the Sentry JS SDK has fully embraced opentelemetry, the SDK and instrumentation side that is. I could maybe imagine Sentry doing something similar in the sense of not duplicating all this span tracking, etc, but fully embracing tracing and build on top of that. Though that would be a ton of work. :-D

nrxus · 2024-10-03T20:27:33Z

I think this is useful even outside of Futures.

My application doesn't do any async but whenever we create a new thread we also create a new tracing span for that thread. The span is created before the new thread (to inherit the current span as the parent) but then only enter it inside the new thread. If I am understanding this issue correctly, currently any events between the creation of the span and the entering of the span will have the wrong span information. While I understand this is a change in behavior I think the current behavior is fairly unexpected and not intuitive.

saiintbrisson force-pushed the master branch 3 times, most recently from 7a4b72c to a138e3c Compare September 27, 2024 01:26

saiintbrisson mentioned this pull request Sep 27, 2024

feat(sentry-core): add support for w3c traceparent #687

Draft

saiintbrisson force-pushed the master branch from a138e3c to 3636ca1 Compare September 27, 2024 02:11

saiintbrisson marked this pull request as draft September 27, 2024 10:04

saiintbrisson force-pushed the master branch 2 times, most recently from a138e3c to 7a4b72c Compare September 27, 2024 10:07

fix(sentry-tracing): switch sentry spans on enter and exit

6f19f1d

saiintbrisson force-pushed the master branch from 7a4b72c to 6f19f1d Compare September 27, 2024 10:14

saiintbrisson marked this pull request as ready for review September 27, 2024 10:14

szokeasaurusrex self-requested a review September 30, 2024 08:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(sentry-tracing): switch sentry spans on enter and exit #686

fix(sentry-tracing): switch sentry spans on enter and exit #686

saiintbrisson commented Sep 25, 2024 •

edited

Loading

Swatinem commented Sep 30, 2024

szokeasaurusrex commented Oct 1, 2024

saiintbrisson commented Oct 2, 2024 •

edited

Loading

Swatinem commented Oct 2, 2024

saiintbrisson commented Oct 2, 2024 •

edited

Loading

Swatinem commented Oct 3, 2024

nrxus commented Oct 3, 2024

fix(sentry-tracing): switch sentry spans on enter and exit #686

Are you sure you want to change the base?

fix(sentry-tracing): switch sentry spans on enter and exit #686

Conversation

saiintbrisson commented Sep 25, 2024 • edited Loading

sentry-tracing layer (wrong)

tracing-opentelemetry layer (correct)

Swatinem commented Sep 30, 2024

szokeasaurusrex commented Oct 1, 2024

saiintbrisson commented Oct 2, 2024 • edited Loading

Swatinem commented Oct 2, 2024

saiintbrisson commented Oct 2, 2024 • edited Loading

Swatinem commented Oct 3, 2024

nrxus commented Oct 3, 2024

saiintbrisson commented Sep 25, 2024 •

edited

Loading

`sentry-tracing` layer (wrong)

`tracing-opentelemetry` layer (correct)

saiintbrisson commented Oct 2, 2024 •

edited

Loading

saiintbrisson commented Oct 2, 2024 •

edited

Loading