Skip to content
This repository has been archived by the owner on Jul 31, 2023. It is now read-only.

erl_tracer for putting a span on anything anywhere #72

Open
tsloughter opened this issue Mar 16, 2018 · 17 comments
Open

erl_tracer for putting a span on anything anywhere #72

tsloughter opened this issue Mar 16, 2018 · 17 comments

Comments

@tsloughter
Copy link
Member

The open question I have for how to implement this is the fact we need to access the process dictionary from the erl_tracer NIF for the calling process so we can create the child span.

Hopefully this is possible, I just haven't done it before and simply checking the erl_nif docs for "process dictionary" had no results :)

Opening this in case anyone who sees it has thoughts on this feature overall or how to use the pdict (or some other way of getting the current span ctx when a trace point is triggered).

@tsloughter
Copy link
Member Author

With the restriction on a seq_trace label being lifted so any term can be used we can store the full census span context:

erlang/otp#1760

The problem still remains that for tracing function calls I don't know how to get the set label for that process from a NIF to use by erl_tracer. Only if it is triggered by a seq_trace does it receives the label.

@garazdawi
Copy link

I haven't checked, but it could be possible to get it from a match spec, given that the trace event can have a match spec.

@tsloughter
Copy link
Member Author

Ooh, that's right!

@tsloughter
Copy link
Member Author

Hm, but yea, not seeing how it would actually work.

@garazdawi
Copy link

Something like: dbg:tp(lists,seq,[{'_',[],[{message,{get_seq_token}}]}]). should put the current token in the match_spec_result part of the Opts map in erl_tracer. This should be possible to do for, call, receive and send trace events.

@tsloughter
Copy link
Member Author

Great! I still need to give this a try but will soon.

@tsloughter
Copy link
Member Author

Woo, it worked!

The only thing is that it is a little awkward because you have to setup the tracer twice (sort of):

    Tracer = spawn(fun F() ->
                           receive M ->
                                   io:format("T ~p~n",[M]),
                                   F()
                           end
                   end),

   %% setup seq_trace and erl_tracer
    seq_trace:set_system_tracer(Tracer),
    erlang:trace(all, true, [call,{tracer, oc_erl_tracer, Tracer}]),

    %% set span context
    seq_trace:set_token(label, {trace_id, span_id}),

    erlang:trace_pattern({lists, seq, 2}, [{'_',[],[{message,{get_seq_token}}]}], [local]),
    lists:seq(1,10).

When run:

1> oc_erl_tracer:run().
T {trace,<0.227.0>,{0,{trace_id,span_id},0,<0.227.0>,0}}
[1,2,3,4,5,6,7,8,9,10]

Also, based on the docs I would have thought seq_trace:set_system_tracer({oc_erl_tracer, Tracer}) to work but it throws a badarg exception with in erlang:system_flag/2.

@garazdawi
Copy link

Also, based on the docs I would have thought seq_trace:set_system_tracer({oc_erl_tracer, Tracer}) to work but it throws a badarg exception with in erlang:system_flag/2.

This seems to be because of a bug, it should be possible to use a nif as a seq tracer. Seems like someone (i.e. me) missed to test that it actually works...

@tsloughter
Copy link
Member Author

@garazdawi hehe, ah ok :).

Are there any existing docs/benchmarks showing the overhead of having seq_trace and an erl_tracer nif enabled? I know I'll have to do some measuring myself soon enough, and can't imagine it to be more overhead than what is required currently to do a trace (calling ocp which lookups in the pdict if there is a span context entry and replaces it with a new span id/parent id, usually within a try ... after).

@tsloughter
Copy link
Member Author

@garazdawi starting on this again and have noticed an issue. In order to calculate the duration of a traced function call I'm including return_trace in the trace_pattern. But, unlike the argument map passed to the tracer NIF the return map does not include the seq_trace token:

 {trace,{lists,seq,[1,10]},
         #{timestamp => timestamp,
           match_spec_result =>
               {0,
                {span_ctx,305649435036826690566783407565658891195,
                          2440085020182989540,1},
                0,<0.297.0>,0}},
         -576460748151849}


{trace,{lists,seq,2},
         #{timestamp => timestamp,extra => [1,2,3,4,5,6,7,8,9,10]},
         -576460748151822}

Is this on purpose or should it be included? Is there another way to get the duration if I don't care about the result of the traced function call?

@garazdawi
Copy link

It is on purpose. Its not (at the moment) possible to associate a match spec with a return trace.

BTW, a fix for seq erl_tracer bug will be part of OTP-21.

@tsloughter
Copy link
Member Author

@garazdawi ah, ok. Is there any plan to make that possible for OTP-21?

@garazdawi
Copy link

no, there is no plan to make it possible for OTP-21.

@tsloughter
Copy link
Member Author

Damn, ok. Thanks for the help.

Do you think there is any other route I should look down for being able to correlate the time the call returns with the seq trace label? The only thing I can think of is again if inside the erl_tracer NIF function I can read the label, like I'd do in Erlang with seq_trace:get_token(label). Or does this just need to wait for the match spec support?

@garazdawi
Copy link

hmm, I don't have any good ideas. I only have one bad idea; manage a stack per process in the trace receiver so that you can associate calls with returns and thus add the label to the return traces in that process.

Regarding match spec support in return traces; I think that you are the first that I have come across needing it :) so it has not been prioritized before. One of the ideas that I had when I created the erl_tracer API was that for some trace points (notably call/return) it would be possible to write the code in Erlang. This would also solve your problem and for me that would be a better solution, and possibly not all that more complex to implement.

@tsloughter
Copy link
Member Author

Yea, being able to write the code in Erlang would be perfect.

@hauleth
Copy link
Contributor

hauleth commented Dec 7, 2018

There can be a problem there with using erlang:trace/3 and dbg module (which itself is wrapper over erlang:trace/3 NIF): you can have only one tracer per process, so if we use this, then it will make debugging of such processes impossible.

Thanks @ferd for pointing it out on Elixir forum.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants