Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start using tracing #1515

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

bendk
Copy link
Contributor

@bendk bendk commented Jan 19, 2024

Description

This follows @pjenvey's suggestion to start by switching the app-level logger and forward the slog records from the existing code to tracing events.

It's based on top of #1514 the tracing_actix_web_mozlog crate uses Actix 4.

@bendk
Copy link
Contributor Author

bendk commented Jan 19, 2024

My main motivation for pushing this one out is to use it as a discussion starter to ask some questions.

Q1: Are there requirements for the log record formatting? When I ran the server locally, it seemed to output the same data, but some of the field names changed, some things were formatted slightly differently and some new fields were added. Are the changes okay or should are there things that shouldn't change?

Current code, JSON format:

  {"Logger":"syncserver-0.14.4","Type":"syncserver:log","Hostname":"fedora","Pid":419763,"Severity":6,"Timestamp":1705678149057992749,"Fields":{"msg":"{\"ua.os.family\":\"Other\",\"ua.browser.ver  \":\"curl\",\"ua\":\"curl\",\"ua.browser.family\":\"Other\",\"uri.method\":\"GET\",\"ua.name\":\"HTTP Library\",\"uri.path\":\"/\",\"ua.os.ver\":\"UNKNOWN\"}"}}

New code, JSON format:

  {"Timestamp":1705681287289443786,"type":"<unknown>","Logger":"syncserver-0.14.4","Hostname":"fedora","EnvVersion":"2.0","Pid":448168,"Severity":5,"Fields":{"slog.column":13,"spans":"","slog.target":"syncserver::tokenserver::logging","slog.line":25,"slog.module_path":"syncserver::tokenserver::logging","message":"{\"ua.os.ver\":\"UNKNOWN\",\"ua.os.family\":\"Other\",\"ua.browser.ver\":  \"curl\",\"ua\":\"curl\",\"ua.name\":\"HTTP Library\",\"ua.browser.family\":\"Other\",\"uri.path\":\"/\",\"uri.method\":\"GET\"}","slog.file":"syncserver/src/tokenserver/logging.rs"}}

Current code, human format:

  Jan 19 15:27:40.202 INFO {"ua.name":"Firefox","ua.os.ver":"UNKNOWN","ua.os.family":"Linux","ua":"123.0","ua.browser.ver":"123.0","ua.browser.family":"Firefox","uri.path":"/","uri.method":"GET"}

New code, human format (now with some green coloring on my terminal):

  2024-01-19T16:14:26.758590Z  INFO slog: {"ua.name":"HTTP Library","ua.os.family":"Other","uri.method":"GET","ua.os.ver":"UNKNOWN","ua.browser.family":"Other","ua.browser.ver":"curl","ua":"curl"  ,"uri.path":"/"}, slog.target: "syncserver::tokenserver::logging", slog.module_path: "syncserver::tokenserver::logging", slog.file: "syncserver/src/tokenserver/logging.rs", slog.line: 25, slog.  column: 13  

Q2: do logs always output to stdout or is there some other transport I should test?

Q3: Should we add the request.summary events from tracing_actix_web_mozlog?

  {"Timestamp":1705689281848618996,"type":"request.summary","Logger":"syncserver-0.14.4","Hostname":"fedora","EnvVersion":"2.0","Pid":532652,"Severity":5,"Fields":{"t_ns":7328717,"t":7,"rid":"8d3c457a-0594-4fb8-ae85-3c017d664d9d","path":"/","agent":"curl/8.2.1","code":302,"spans":"request","method":"GET"}}

@bendk bendk force-pushed the start-using-tracing branch from f736218 to 5bbd383 Compare January 19, 2024 18:51
@jrconlin
Copy link
Member

jrconlin commented Jan 22, 2024

Q1

For now, I'll focus on the JSON logging output. "Human" is just that, and is designed to be human readable, so pretty much whatever format you want to use is fine, provided it's readable.

The JSON logs are consumed by several processes, so those should have a structured form. The current logging uses MozLog Services format which is a bit crufty, but explains some of the major key titles and roles. I note that the proposed version includes a new field, EnvVersion, and has type as lowercase for some reason. I'm also not 100% sure why we collapse the various Fields into a string called msg, but those should probably be broken back out into a proper structure.

We should also check to see if including the slog info in Fields causes a problem with anything that's ingesting Fields but that should be fairly straight-foward to address.

Q2

As far as the app is concerned, logs are output via stdout. The ingestion tools read those log lines. We would need to coordinate with SRE and the teams that use those logs to ensure that we are not introducing any bugs or potential alerts.

Q3

We should probably limit the items that we are writing to stdout, or at the very least, ensure that our RUST_LOG contains the crates we want logging messages for. I take it that the request.summary needs to be generated by the application? Could we place that under something like a feature flag to enable it only for specific hosts if we need it?

@bendk
Copy link
Contributor Author

bendk commented Jan 23, 2024

The JSON logs are consumed by several processes, so those should have a structured form. The current logging uses MozLog Services format which is a bit crufty, but explains some of the major key titles and roles. I note that the proposed version includes a new field, EnvVersion, and has type as lowercase for some reason. I'm also not 100% sure why we collapse the various Fields into a string called msg, but those should probably be broken back out into a proper structure.

I think EnvVersion is from the spec. I'll update type to match the old behavior.

I'm not sure what can be done for msg. Both the old and new log forwarders outputted that as a flat string and I think that's because slog doesn't give you any other option. Record allows you to get a BorrowedKV, which implements the KV trait. But it seems like the only thing you can do with that is serialize it, there's no way to iterate over the pairs.

We should also check to see if including the slog info in Fields causes a problem with anything that's ingesting Fields but that should be fairly straight-foward to address.

I think I'll just remove that. The plan is to remove slog anyways, so I don't think there's much reason to keep around the extra slog metadata.

I take it that the request.summary needs to be generated by the application? Could we place that under something like a feature flag to enable it only for specific hosts if we need it?

I'm thinking I should remove this too. The only reason I added it in the first place was the tracing_actix_web_mozlog docs recommended it. We can always add that later if needed and it should also be easy to put it behind a feature flag.

@bendk bendk force-pushed the start-using-tracing branch from 5bbd383 to f5e6b81 Compare January 26, 2024 14:45
This follows @pjenvey's suggestion to start by switching the app-level
logger and forward the slog records from the existing code to tracing
events.

Depends on:

- tracing-slog for slog forwarding
- tracing-appender for offloading log writes to a thread
- tracing-subscriber as a framework for event subscription/formatting

I hand-wrote the code to convert tracing events to MozLog entries, since
`tracing-actix-web-mozlog` didn't quite format them how we wanted.
@bendk bendk force-pushed the start-using-tracing branch from f5e6b81 to 526a010 Compare January 26, 2024 15:03
@bendk
Copy link
Contributor Author

bendk commented Jan 26, 2024

Pushed out a second pass at this one. This time I handled the serde / tracing integration directly to get more control over the final formatting. I think the logs are now exactly like they were before, except for some small differences that could be removed if needed:

  • The new logs have the EnvVersion field
  • The new logs use Fields.message rather than Fields.msg

I also added tracing-appender and used it to offload log writes to a separate thread.

Here's an example of a log line:

{"Timestamp":1706280719182205014,"Type":"syncserver:log","Logger":"syncserver-0.14.4","Hostname":"fedora","EnvVersion":"2.0","Pid":1581024,"Severity":5,"Fields":{"message":"{\"ua.browser.family\":\"Other\",\"uri.method\":\"GET\",\"ua.name\":\"HTTP Library\",\"ua.os.family\":\"Other\",\"ua.os.ver\":\"UNKNOWN\",\"ua.browser.ver\":\"curl\",\"ua\":\"curl\",\"uri.path\":\"/\"}"}}

Here's the prettified version:

{
  "Timestamp": 1706280719182205014,
  "Type": "syncserver:log",
  "Logger": "syncserver-0.14.4",
  "Hostname": "fedora",
  "EnvVersion": "2.0",
  "Pid": 1581024,
  "Severity": 5,
  "Fields": {
    "message": "{\"ua.browser.family\":\"Other\",\"uri.method\":\"GET\",\"ua.name\":\"HTTP Library\",\"ua.os.family\":\"Other\",\"ua.os.ver\":\"UNKNOWN\",\"ua.browser.ver\":\"curl\",\"ua\":\"curl\",\"uri.path\":\"/\"}"
  }
}

The Fields.message field is still an unstructured JSON string, but I figure that's okay since the current code is doing that and the plan is to move away from slog anyways. Related to that, it's wasteful that tracing-slog forwards all these slog fields, then the mozlog code needs to spend cycles throwing them away, but it seems okay to me if we're going to transition away from slog in the near future.

I converted the startup log message to using tracing directly, to test out how structured logs could look with that. Here's the prettified output:

{
  "Timestamp": 1706280839002892078,
  "Type": "syncserver:log",
  "Logger": "syncserver-0.14.4",
  "Hostname": "fedora",
  "EnvVersion": "2.0",
  "Pid": 1582817,
  "Severity": 5,
  "Fields": {
    "message": "Server starting up",
    "server": "http://127.0.0.1:8000 (mysql) No quota"
  }
}

This is feeling like it's close to correct to me, but I kind of wonder if we're going to need it. There doesn't seem to be much advantage over slog unless you start using spans and #1426 says there's no plans to do that. AFAIK, slog is not going to become obsolete anytime soon.

I personally don't see much use for spans other than simple stuff. It could be nice to group the records for a request together using a span, but it seems like you could do the same thing by adding a request_id field. This could definitely be a lack of imagination on my part though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants