-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add SES and SQS client #9
Conversation
Why these changes are being introduced: * SES and SQS clients are needed for sending logs to stakeholders and interacting with the DSS input and output queues How this addresses that need: * Add SESClient and SQSClient classes with corresponding unit tests and fixtures for the methods * Add exceptions module * Update dependencies Side effects of this change: * None Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/IN-1099
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! I did leave some comments about SQS validation, but wholly optional. I believe it would be functional and understandable as-is.
dsc/sqs.py
Outdated
): | ||
valid = True | ||
else: | ||
logger.exception("Failed to parse SQS message: %s", sqs_message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going to pick on this logger.exception
line, but I think it may apply more broadly to the flow here, and the logging messages from the helper methods.
Let's say that self.valid_result_message_body()
cannot load the Body
part of the message as JSON. Wouldn't both that method, and this method valid_sqs_message()
(or validate_sqs_message()
if renamed) both log the full SQS message? Something like:
Failed to parse SQS message body: <SQS message here...>
Failed to parse SQS message: <SQS message here...>
To me what feels ideal would be logging like:
"Invalid SQS message: failed to parse message body
Where the first part is coming from the validate_sqs_message()
method, but the reason "failed to parse message Body as JSON" came from the helper method valid_result_message_body
.
It's all pretty minor, and frankly not blocking for my approval, but feels like maybe those helper methods that are doing the validation work could just return tuples like (<success:bool>, <reason:str | None>)
that this validate_sqs_message()
method could use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should have led with: I do like this validation approach! It's nice they are in standalone methods.
I'd be okay with the approach as-is, there are things I like about it too. Please take all comments here as optional / conversational, but by no means requested changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did refactor this method to prevent dupe logging, let me know how it looks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ehanson8 - maybe missing it somehow, but not seeing a new commit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just pushed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple change requests! Let me know what you think. Just had a few thoughts on what might be good updates when looking at clients in our more recent projects. 🤓
dsc/ses.py
Outdated
@@ -0,0 +1,84 @@ | |||
from __future__ import annotations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commenting here but applies to all the modules for AWS clients: Can we move these modules to a new directory -- dsc/utilities/aws/
? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
dsc/sqs.py
Outdated
class SQSClient: | ||
"""A class to perform common SQS operations for this application.""" | ||
|
||
def __init__(self, region: str, base_url: str, queue_name: str) -> None: | ||
self.client = client("sqs", region_name=region) | ||
self.base_url: str = base_url | ||
self.queue_name: str = queue_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a look at the SQSClient
from Geo-Harvester and think it would be good to bring in the changes regarding the queue_name
and queue_url
arguments to the init method.
The changes are subtle but streamline the code, effectively replacing the following lines of code to use self.queue_url
(instead of string concatenation):
Some fixtures and tests would need to be updated as well!
Note: The lines of code from Geo-Harvester referenced above include a @property
method for accessing the client, but I prefer what we are doing in DSC: instantiating the client once during __init__
(I believe the property
method approach in Geo-Harvester will instantiate a new client every time SQSClient.client
/self.client
is called 🤔 ).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, updated!
* Shift AWS modules to utilities/aws * Refactor SQSClient.__init__ method to use queue_url property * Rename methods from valid_result_message, valid_sqs_message > validate_message for consistency and brevity * Refactor validate_message method to avoid duplicate logging and add corresponding unit tests * Update log messages to use f-strings
Pull Request Test Coverage Report for Build 12264180424Details
💛 - Coveralls |
@ghukill @jonavellecuerdo Pushed a new commit, let me know how it looks! |
dsc/utilities/aws/sqs.py
Outdated
return response | ||
|
||
def validate_message(self, sqs_message: MessageTypeDef) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that I submitted an approval already, and I do think this current approach will successfully validate messages, but taking another pass at the updates I do wonder if there is an opportunity to streamline some of this validation even more. I would imagine that much of this application will be about validation, so some patterns to follow throughout could be helpful to establish early.
I see that all this validation is under the umbrella of this call to self.validate_message() in self.process_result_message()
:
if not self.validate_message(sqs_message):
raise InvalidSQSMessageError
Where the ultimate goal is to a) log what happend, and b) raise InvalidSQSMessageError
from this method.
What if each validation helper method was responsible for directly raising InvalidSQSMessageError
if they found something wrong? and self.validate_message()
is just an orchestrator of calling these methods? Something along the lines of:
def self.validate_message() -> None:
# call all validators here
# if all pass, return None, and all is well!
# if any raise an exception, it will bubble up from here and reach self.process_result_message()
# where this was originally called from
def validate_message_attributes() -> None:
# each validator returns None
# do checks, raise InvalidSQSMessageError with a meaningful message if something wrong
# continue to define validators as needed
I think this leans into the EAFP (Easier to Ask for Forgiveness than Permission) python pattern. If we call self.validate_message()
and nothing happens, great. But if any of those validator methods raise that InvalidSQSMessageError
exception, it will:
- get raised from that lowest validator method
- bubble up through
validate_message()
- bubble up from
process_results_message()
here just as it does now
This pattern is extensible, becuase you can just keep calling validator methods, knowing they are responsible for raising that exception.
To reiterate: with only two validations here, I think the current approach works as-is. But I have this hunch that this DSC app will be validation heavy, and a validation pattern that is applied somewhat globally could be helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like @ghukill 's proposal to have validate_message
simply call validation methods that are responsible for raising InvalidSQSMessageError
with meaningful messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, this is definitely a better approach, I greatly appreciate the suggestion and I'll rework!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for making these additional changes! Approved but waiting on your thoughts re: implementing @ghukill 's proposal for validation methods approach.
response = self.client.delete_message( | ||
QueueUrl=f"{self.base_url}{self.queue_name}", | ||
QueueUrl=self.queue_url, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✨
dsc/utilities/aws/sqs.py
Outdated
return response | ||
|
||
def validate_message(self, sqs_message: MessageTypeDef) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like @ghukill 's proposal to have validate_message
simply call validation methods that are responsible for raising InvalidSQSMessageError
with meaningful messages.
* Refactor SQSClient validate methods for greater efficiency and update corresponding unit tests
@ghukill @jonavellecuerdo Updated the validation methods! |
Args: | ||
sqs_message: An SQS result message to be processed. | ||
""" | ||
self.validate_message(sqs_message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔥
dsc/utilities/aws/sqs.py
Outdated
or not any( | ||
field | ||
for field in sqs_message["MessageAttributes"] | ||
if "PackageID" in field | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I must admint, this one is still tripping me up. Are we looking for "PackageID"
as a key in the sqs_message["MessageAttributes"]
dictionary?
If so, could we rewrite?
or "PackageID" not in sqs_message["MessageAttributes"]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we are looking at the validation method even closer, I am curious about this as well. Debugging the test, it seems the an sqs_message
looks like:
{
"MessageId": "47179327-6a75-467d-b431-d7bfd802b15e",
"ReceiptHandle": "zqaejnmrlfkzlxsxexmzsnyjsiugkotjdouweeephwuigifrlsrwxshfqmguyjttklxxrogfiohgwezhqpywvrbgbueebxlmuenrwtgtgmhbaugtywthkppbixxbjowmkqcwiqapxjpirgeeuowpgdkrdskhawgimaaynsbekemlfxoapaskuewaa",
"MD5OfBody": "2affecad36e208b6be933fce09b7a89e",
"Body": '{"ResultType": "success", "ItemHandle": "1721.1/131022", "lastModified": "Thu Sep 09 17:56:39 UTC 2021", "Bitstreams": [{"BitstreamName": "10.1002-term.3131.pdf", "BitstreamUUID": "a1b2c3d4e5", "BitstreamChecksum": {"value": "a4e0f4930dfaff904fa3c6c85b0b8ecc", "checkSumAlgorithm": "MD5"}}]}',
"MD5OfMessageAttributes": "5edd88184e055125bdab8e1b116ecf5e",
"MessageAttributes": {
"PackageID": {"StringValue": "10.1002/term.3131", "DataType": "String"},
"SubmissionSource": {"StringValue": "Submission system", "DataType": "String"},
},
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't know how I came up with that check and how it survived this long without refactoring but yes or "PackageID" not in sqs_message["MessageAttributes"]
is much simpler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just two minor change requests, including @ghukill 's suggestion!
dsc/utilities/aws/sqs.py
Outdated
message = f"Failed to retrieve 'ReceiptHandle' from message: {sqs_message}" | ||
raise InvalidSQSMessageError(message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for not doing what I said I would in this Slack thread, but I think we previously discussed: (a) adding Ruff rule EM102 to ignore list and (b) allow passing of simple error messages to exceptions!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated!
dsc/utilities/aws/sqs.py
Outdated
or not any( | ||
field | ||
for field in sqs_message["MessageAttributes"] | ||
if "PackageID" in field | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we are looking at the validation method even closer, I am curious about this as well. Debugging the test, it seems the an sqs_message
looks like:
{
"MessageId": "47179327-6a75-467d-b431-d7bfd802b15e",
"ReceiptHandle": "zqaejnmrlfkzlxsxexmzsnyjsiugkotjdouweeephwuigifrlsrwxshfqmguyjttklxxrogfiohgwezhqpywvrbgbueebxlmuenrwtgtgmhbaugtywthkppbixxbjowmkqcwiqapxjpirgeeuowpgdkrdskhawgimaaynsbekemlfxoapaskuewaa",
"MD5OfBody": "2affecad36e208b6be933fce09b7a89e",
"Body": '{"ResultType": "success", "ItemHandle": "1721.1/131022", "lastModified": "Thu Sep 09 17:56:39 UTC 2021", "Bitstreams": [{"BitstreamName": "10.1002-term.3131.pdf", "BitstreamUUID": "a1b2c3d4e5", "BitstreamChecksum": {"value": "a4e0f4930dfaff904fa3c6c85b0b8ecc", "checkSumAlgorithm": "MD5"}}]}',
"MD5OfMessageAttributes": "5edd88184e055125bdab8e1b116ecf5e",
"MessageAttributes": {
"PackageID": {"StringValue": "10.1002/term.3131", "DataType": "String"},
"SubmissionSource": {"StringValue": "Submission system", "DataType": "String"},
},
}
""" | ||
if ( | ||
"MessageAttributes" not in sqs_message | ||
or "PackageID" not in sqs_message["MessageAttributes"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Purpose and background context
This code is largely copied over from
wiley-deposits
with some minor changes to align it with our current coding practices.How can a reviewer manually see the effects of these changes?
Not possible yet
Includes new or updated dependencies?
YES
Changes expectations for external applications?
NO
What are the relevant tickets?
Developer
Code Reviewer(s)