-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
In 1105 finalize command #80
base: main
Are you sure you want to change the base?
Conversation
* Rename recipient_email_address > recipient_email_addresses and update type hint to list[str] to reflect expected usage * Update unit tests to account for these changes
* Remove __getattr__ method * Add workspace, sentry_dsn, aws_region_name, dss_input_queue, and dsc_source_email properties and update calls across repo * Remove duplicate addHandler call in configure_logger method
Why these changes are being introduced: * A CLI command is need to process the submissions results that are sent to a DSS output queue How this addresses that need: * Rename stream > log_stream * Add finalize CLI command and corresponding CLI test * Add process_results and send_logs methods to Workflow class and corresponding unit tests * Add Config call in Workflow module * Remove SimpleCSV.process_deposit_results method as the class will default to Workflow.process_results * Reorder fixtures Side effects of this change: * None Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/IN-1105
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All config changes looking great to me, thanks for pushing on that and leading a discussion.
I did leave a couple of comments about the CLI / Workflow relationship, and the approach of capturing logs as the primary output. I can remain open to it, but would be remiss without a bit more discussion either in this PR or a call about the approach.
def process_deposit_results(self) -> list[str]: | ||
"""Process results generated by the deposit according to the workflow subclass. | ||
def process_results(self) -> dict[str, Any]: | ||
"""Process results in the workflow's output queue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we expand on this docstring a bit? Maybe an opportunity to update the run()
method as well. I think it could be helpful if the docstring touches on two things:
- what this method acheives for the
Workflow
, very high level, for humans - a tiny bit of how it accomplishes this technically (e.g. mentioning SQS queues)
For this PR, I'm looking at the base Workflow
class from kind of a 30ft view. My understanding is that:
- CLI command
deposit
--> Workflow methodrun()
- CLI command
finalize
--> Workflow methodprocess_results()
I'm not strictly opposed to this naming mismatch, but I think for someone who is not actively in the codebase (e.g. myself) I wonder why this naming disconnect. I had interpreted run()
to be the primary method for the Workflow in previous PRs, but I think this seemingly equally important method process_results()
pushes on that a bit.
I'm not sure renaming is required (but open to it), but at the very least I think the docstring could hold your hand a little more when you land here from the CLI commands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree this could use more detail, I'll add that. @jonavellecuerdo and I were also discussing possibly updating run
> submit_items
and the deposit
CLI command to submit
to be more accurate. deposit
was another holdover from wiley-deposits
that I've been moving away from but hadn't updated the CLI command yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wanted to note that at some point @ehanson8 mentioned maybe changing the CLI command to submit
, but there was some hesitancy in that DSS also uses submit
-> upload submissions to DSpace@MIT.
From my recollection, these CLI commands were taken from Wiley deposits. I'm in support of alignment in names, if anything:
- CLI command
submit
->Workflow.submit
- CLI command
process-results
->Workflow.process_results
Also agree comments re: Expanding info docstrings!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonavellecuerdo Jinx 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
workflow.process_results() | ||
workflow.send_logs(ctx.obj["log_stream"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going to locate this comment here, but it certainly has tendrils beyond these two lines. It may have connection to another comment about the CLI / Workflow class naming disconnect as well.
I suppose I'm still feeling a bit uneasy about setting up a StringIO
logger, capturing virtually all logs while the Workflow works, and then shipping them off via email as the primary output of this app.
A few concerns come to mind:
- it feels like it would be easy for someone not super familiar with this approach, to add a logging statement to code, that would start showing up in emails, given that logging is conventionally not directly shipped by an application
- the order of logging matters: if we ever wanted to reorganize or cleanup the email sent out, we're kind of at the mercy of when and how things are logged
I suppose this feels like it has connection to the CLI/Workflow method naming, as why wouldn't we have a Workflow method like finalize()
, that could call methods like process_results()
and send_logs()
(or maybe that gets reworked into something like report()
).
From my POV, it would arguably be easier to capture messages -- e.g. on the Workflow
instances -- as things happen, and then prepare a final "report" that gets emailed out. If this report generation were encapsulated in a method, you then have the ability to very easily modify it over time as requirements (invariably) change.
Happy to discuss more. I almost proposed it could wait, maybe be a ticket for returning to later, but it feels kind of fundamental to this last major action the Workflow takes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hear your concerns and it needs to be made clear in the eventual README.md
how the logs are used, but let's huddle later today to discuss this in more detail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This review counts as an initial pass! I have one minor change request in addition to the 1-2 potential change requests proposed by @ghukill (updating Config
to handle env vars without defaults and aligning names for CLI commands + Workflow
methods).
I will also wait until we have a discussion re: how to capture emailed logs!
@@ -52,6 +52,7 @@ SENTRY_DSN=### If set to a valid Sentry DSN, enables Sentry exception monitoring | |||
WORKSPACE=### Set to `dev` for local development, this will be set to `stage` and `prod` in those environments by Terraform. | |||
AWS_REGION_NAME=### Default AWS region. | |||
DSS_INPUT_QUEUE=### The DSS SQS input queue to which submission messages are sent. | |||
DSC_SOURCE_EMAIL=### The email address from which SES emails are sent. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"The email address(es) to which SES sends emails."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This actually refers to what is in From:
which needs to be specified for SES, not To:
which is populated by --email_recipients
Purpose and background context
Refactor clients, config, and CLI functionality in addition adding a
finalize
CLI command. The changes have been logically grouped into several commits for easier review.How can a reviewer manually see the effects of these changes?
Not possible yet
Includes new or updated dependencies?
YES
Changes expectations for external applications?
NO
What are the relevant tickets?
Developer
Code Reviewer(s)