Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WOR-1544] #major Populate MDC for flight and its steps #125

Merged
merged 10 commits into from
Mar 4, 2024

Conversation

okotsopoulos
Copy link
Contributor

@okotsopoulos okotsopoulos commented Feb 29, 2024

https://broadworkbench.atlassian.net/browse/WOR-1544

Centralizing work done in DataBiosphere/terra-workspace-manager#1626, wrap flight thread execution so that flight threads are context-aware.

  • Logs emitted during a flight thread now have the context from the calling thread, along with flight-specific context (flight ID and class).
  • Logs emitted during the execution of a step additionally include step-specific context (step class, direction, and index).

Handles the case where a flight thread spawns a child flight by intentionally cleaning any leftover step-specific context from the calling thread.

In Action

I published my Stairway branch locally, pulled it in as a WSM dependency, removed the existing WSM code which modifies the MDC, deployed it to my Bee, and hit an endpoint that triggers a stairway flight.

The resulting logs have their MDC populated as described in this PR: https://cloudlogging.app.goo.gl/T9vdJ86RXRoa117VA

  • Calling thread's context over the entirety of the flight (that's where jsonPayload.requestId comes from)
  • Flight-specific context over the entirety of the flight
  • Step-specific context over the entirety of each step

A jsonPayload for a log emitted during a step:
Screenshot 2024-02-29 at 7 44 47 PM

All of the logs for the request, with flightId, flightStepNumber, and flightStepDirection added to the summary line:
Screenshot 2024-02-29 at 7 46 57 PM

(As an aside, it's clear to me that these log lines can be substantially trimmed. Do we need the full formatted summary anymore if we have this info in the MDC, and can pull it in the summary line?)

How can my service get this new functionality?

Once this PR is merged, I'll update the dependency in TCL, which is where most services get Stairway from. At any rate, downstream services will need to update a dependency.

If downstream services presently modify their MDCs themselves, they will be able to remove that code in their service and let Stairway take care of it.

Wrap flight thread execution so that flight threads are context-aware.
Logs emitted during a flight thread now have the context from the calling thread, along with flight-specific context (flight ID and class).
Logs emitted during the execution of a step additionally include step-specific context (step class, direction, and index).
Handles the case where a flight thread spawns a child flight by intentionally cleaning any leftover step-specific context from the calling thread.
Drive-by: now that we include flight ID and class on all flight logs in the MDC, we can trim down the dismal failure log line.
@okotsopoulos okotsopoulos changed the title Okotsopo wor 1544 mdc [WOR-1544] Populate MDC for flight and its steps Feb 29, 2024
In a follow-on PR, will automate spotlessCheck to gate PRs.
flightContext.getFlightClassName(),
flightContext.getStepIndex(),
flightContext.getStepClassName());
"{} (index {}) experienced DISMAL FAILURE: non-retryable error on undo",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drive-by: now that we include flight ID and class on all flight logs in the MDC, we can trim down the dismal failure log line.

I also reformatted to lead with the failing step class based off of Stairway retro feedback that all of the useful information on a log line is at the end. Do we like this better? Unfortunately we don't get step context in MDC for free at this point -- we're outside of step execution.

And some other formatting and documentation changes.
The MdcHelper methods are only meant to be called within the Stairway implementation package.
Stairway leverages the underlying logging system's mapped diagnostic context (MDC) if available.

MDC manages contextual information on a per-thread basis: as Stairway submits a Flight for processing,
it **passes along the MDC of the calling thread** so that context isn't lost. It also further populates
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Propagating the MDC from the calling thread makes me want to designate this as a minor version bump at minimum. Default behavior in this repo is patch bumps.

Reviewers, WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with a more significant version bump. Though with all the dependency upgrades underway, I was already thinking we would do a major version bump very soon. Perhaps just do a major bump with this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I'll make it a major one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added #major to the PR title, which will get pulled into the commit message.

@okotsopoulos okotsopoulos marked this pull request as ready for review March 1, 2024 00:52
@@ -16,8 +16,7 @@ dependencies {
implementation group: 'org.slf4j', name: 'slf4j-api', version: '1.7.36'

// For testing
testImplementation group: 'org.junit.jupiter', name: 'junit-jupiter-api', version: '5.9.0'
testImplementation group: 'org.junit.jupiter', name: 'junit-jupiter-engine', version: '5.9.0'
testImplementation group: 'org.junit.jupiter', name: 'junit-jupiter', version: '5.10.2'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needed this expanded dependency for parameterized tests, and upgraded while in the neighborhood.

Map<String, String> contextMap = MDC.getCopyOfContextMap();
return () -> {
// Any leftover context on the thread will be fully overwritten:
MDC.setContextMap(Optional.ofNullable(contextMap).orElse(Map.of()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having trouble understanding this line. We are getting a copy of the context map at line 35, so why isn't line 38 effectively a no-op?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, great question. Stairway spawns new threads for flight execution, so this Runnable within may be happening on a new thread without the prior context.

But this tells me that this code would be better organized if it's added to the Stairway thread pool implementation instead. I'll make that change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @cahrens , I reorganized this code to live in StairwayThreadPool and feel like it makes more sense now, thanks for the original comment. Would you mind taking another look?

@okotsopoulos okotsopoulos changed the title [WOR-1544] Populate MDC for flight and its steps [WOR-1544] #major Populate MDC for flight and its steps Mar 1, 2024
Because Stairway flights are submitted to the StairwayThreadPool for execution, it makes more sense to update the threadpool to wrap task submission to handle setting up and clearing flight context for the duration of the flight run on the child thread.
MdcHelper becomes MdcUtils, now a home for less involved utility methods for modifying the MDC.
Expanded tests accordingly: StairwayThreadPool did not previously have targeted unit tests.
int active = activeTasks.incrementAndGet();
logger.debug("before: " + active);
super.beforeExecute(t, r);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading the docstring on this method, it's recommended to call the super at the end to allow for potential nesting.

(I also read the docstring on afterExecute, and the super call at the beginning matches the recommendation.)

@okotsopoulos okotsopoulos marked this pull request as draft March 1, 2024 21:43
@okotsopoulos okotsopoulos marked this pull request as ready for review March 1, 2024 23:06
@@ -47,7 +47,7 @@ jobs:
distribution: 'temurin'
cache: 'gradle'
- name: Run tests
run: ./gradlew test
run: ./gradlew test --scan
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generates a Gradle build scan as part of test output, which makes debugging tests a lot easier. Example from the most recent unit test run:

Publishing build scan...
https://gradle.com/s/gu4jckjm4mqzq

And revisiting MdcUtilsTest with some naming changes for clarity.
Copy link

sonarcloud bot commented Mar 4, 2024

@okotsopoulos okotsopoulos merged commit 11626a4 into develop Mar 4, 2024
3 checks passed
@okotsopoulos okotsopoulos deleted the okotsopo-WOR-1544-mdc branch March 4, 2024 16:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants