Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DCJ-507] Stairway with work queue enabled should be context-aware #152

Merged
merged 5 commits into from
Jul 15, 2024

Conversation

okotsopoulos
Copy link
Contributor

@okotsopoulos okotsopoulos commented Jul 11, 2024

Jira ticket: https://broadworkbench.atlassian.net/browse/DCJ-507

Addresses

Context-aware Stairway previously assumed that the calling thread would have mapped diagnostic context (MDC) to persist to the child threads spawned for flight execution.

This was not the case when Stairway is configured with a work queue enabled (e.g. GCP Pub-Sub). Here, flight information was written to the pub-sub topic without the calling thread's context, so when a message was processed and deserialized any logs emitted from that flight would be missing context set by the original calling thread (e.g. request ID).

Where we've felt this impact in TDR:

  • Child flights submitted within the execution of a parent flight are missing the calling thread's context in its logs.
  • While this is an uncommon pattern for flight execution, it is used when facilitating combined data ingests to datasets: the parent flight ingests tabular data, and child flights are spawned for each file being ingested.
  • Much of our attention has been on debugging combined data ingests to Azure datasets: the inability to filter for all connected parent and child flight logs in GCP log console by request ID (jsonPayload.requestId) is slowing down investigative progress.

Summary of changes

  • Added new helper method MdcUtils.callWithContext to call a callable and return its result with MDC's context map temporarily overwritten.
  • Expanded QueueMessageReady to store calling thread context on construction.
  • QueueMessageReady.process() sets the MDC using its stored calling thread context, which then makes the flight context-aware.

Testing Strategy

  • Added unit tests for MdcUtils.callWithContext.
  • Expanded unit tests for QueueMessageReady with special attention to MDC behavior and propagation in serialization, deserialization, and processing.

Note: Sonarcloud coverage stats are incorrectly reporting 0% coverage. They look to have never worked. I do believe I've covered all of my changes in unit tests. ETA: thanks @pshapiro4broad for fixing the Sonar scans, I pulled the fix into this branch and we are now seeing coverage stats.

Context-aware Stairway previously assumed that the calling thread would have mapped diagnostic context (MDC) to persist to the child threads spawned for flight execution.

This was not the case when Stairway is configured with a work queue enabled (e.g. GCP Pub-Sub).  Here, flight information was written to the pub-sub topic without the calling thread's context, so when a message was processed and deserialized any logs emitted from that flight would be missing context set by the original calling thread (e.g. request ID).

- Expanded QueueMessageReady to store calling thread context on construction
- QueueMessageReady.process sets the MDC using its stored calling thread context, which then makes the flight context-aware
- Expanded unit tests
@okotsopoulos okotsopoulos marked this pull request as ready for review July 11, 2024 20:28
@okotsopoulos okotsopoulos requested review from a team, rushtong and fboulnois and removed request for a team July 11, 2024 20:30
Copy link

@rushtong rushtong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable to me 👍🏽

Copy link
Member

@pshapiro4broad pshapiro4broad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK, although I'd prefer making fewer APIs public for tests if possible


@BeforeEach
void beforeEach() {
MDC.clear();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to use a static mock instead of the actual MDC API? We don't use mocks yet for MDC, so it may be too big a change for this PR, but it's something to consider if we'd like to isolate our tests from the MDC implementation, and might make it easier to check if our code is doing what we expect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yeah this feels like too big a change for this PR. But you make good points about the benefits of test isolation.

MdcUtils:
- Added public utility method callWithContext for running and returning a callable with MDC's context map temporarily overwritten
- Remove public modifier on overwriteContext in favor of the above

QueueMessageReady:
- Use MdcUtils.callWithContext to simplify process definition
- Revert process return strategy to return booleans directly rather than setting a boolean to return later
- Remove unnecessary callingThreadContext setter (left existing setters untouched even though they can likely be removed: OOS)
- Left public callingThreadContext getter unchanged: it is required to be public for serde operations

Added unit test coverage for all changes.
Copy link
Collaborator

@fboulnois fboulnois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good 👍

Copy link

sonarcloud bot commented Jul 15, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants