Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SES and SQS client #9

Merged
merged 6 commits into from
Dec 10, 2024
Merged

Add SES and SQS client #9

merged 6 commits into from
Dec 10, 2024

Conversation

ehanson8
Copy link
Contributor

@ehanson8 ehanson8 commented Dec 6, 2024

Purpose and background context

This code is largely copied over from wiley-deposits with some minor changes to align it with our current coding practices.

How can a reviewer manually see the effects of these changes?

Not possible yet

Includes new or updated dependencies?

YES

Changes expectations for external applications?

NO

What are the relevant tickets?

Developer

  • All new ENV is documented in README
  • All new ENV has been added to staging and production environments
  • All related Jira tickets are linked in commit message(s)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer(s)

  • The commit message is clear and follows our guidelines (not just this PR message)
  • There are appropriate tests covering any new functionality
  • The provided documentation is sufficient for understanding any new functionality introduced
  • Any manual tests have been performed or provided examples verified
  • New dependencies are appropriate or there were no changes

Why these changes are being introduced:
* SES and SQS clients are needed for sending logs to stakeholders and interacting with the DSS input and output queues

How this addresses that need:
* Add SESClient and SQSClient classes with corresponding unit tests and fixtures for the methods
* Add exceptions module
* Update dependencies

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/IN-1099
Copy link

@ghukill ghukill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! I did leave some comments about SQS validation, but wholly optional. I believe it would be functional and understandable as-is.

dsc/ses.py Outdated Show resolved Hide resolved
dsc/sqs.py Outdated Show resolved Hide resolved
dsc/sqs.py Outdated
):
valid = True
else:
logger.exception("Failed to parse SQS message: %s", sqs_message)
Copy link

@ghukill ghukill Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to pick on this logger.exception line, but I think it may apply more broadly to the flow here, and the logging messages from the helper methods.

Let's say that self.valid_result_message_body() cannot load the Body part of the message as JSON. Wouldn't both that method, and this method valid_sqs_message() (or validate_sqs_message() if renamed) both log the full SQS message? Something like:

Failed to parse SQS message body: <SQS message here...>
Failed to parse SQS message: <SQS message here...>

To me what feels ideal would be logging like:

"Invalid SQS message: failed to parse message body

Where the first part is coming from the validate_sqs_message() method, but the reason "failed to parse message Body as JSON" came from the helper method valid_result_message_body.

It's all pretty minor, and frankly not blocking for my approval, but feels like maybe those helper methods that are doing the validation work could just return tuples like (<success:bool>, <reason:str | None>) that this validate_sqs_message() method could use.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have led with: I do like this validation approach! It's nice they are in standalone methods.

I'd be okay with the approach as-is, there are things I like about it too. Please take all comments here as optional / conversational, but by no means requested changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did refactor this method to prevent dupe logging, let me know how it looks

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ehanson8 - maybe missing it somehow, but not seeing a new commit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pushed!

Copy link
Contributor

@jonavellecuerdo jonavellecuerdo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple change requests! Let me know what you think. Just had a few thoughts on what might be good updates when looking at clients in our more recent projects. 🤓

dsc/ses.py Outdated
@@ -0,0 +1,84 @@
from __future__ import annotations
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commenting here but applies to all the modules for AWS clients: Can we move these modules to a new directory -- dsc/utilities/aws/? 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

dsc/ses.py Outdated Show resolved Hide resolved
dsc/sqs.py Outdated
Comment on lines 24 to 30
class SQSClient:
"""A class to perform common SQS operations for this application."""

def __init__(self, region: str, base_url: str, queue_name: str) -> None:
self.client = client("sqs", region_name=region)
self.base_url: str = base_url
self.queue_name: str = queue_name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a look at the SQSClient from Geo-Harvester and think it would be good to bring in the changes regarding the queue_name and queue_url arguments to the init method.

The changes are subtle but streamline the code, effectively replacing the following lines of code to use self.queue_url (instead of string concatenation):

Some fixtures and tests would need to be updated as well!

Note: The lines of code from Geo-Harvester referenced above include a @property method for accessing the client, but I prefer what we are doing in DSC: instantiating the client once during __init__ (I believe the property method approach in Geo-Harvester will instantiate a new client every time SQSClient.client/self.client is called 🤔 ).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, updated!

dsc/sqs.py Outdated Show resolved Hide resolved
* Shift AWS modules to utilities/aws
* Refactor SQSClient.__init__ method to use queue_url property
* Rename methods from valid_result_message, valid_sqs_message > validate_message for consistency and brevity
* Refactor validate_message method to avoid duplicate logging and add corresponding unit tests
* Update log messages to use f-strings
@coveralls
Copy link

coveralls commented Dec 10, 2024

Pull Request Test Coverage Report for Build 12264180424

Details

  • 91 of 94 (96.81%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.8%) to 96.067%

Changes Missing Coverage Covered Lines Changed/Added Lines %
dsc/utilities/aws/ses.py 24 25 96.0%
dsc/utilities/aws/sqs.py 65 67 97.01%
Totals Coverage Status
Change from base Build 12203655192: 0.8%
Covered Lines: 171
Relevant Lines: 178

💛 - Coveralls

@ehanson8
Copy link
Contributor Author

@ghukill @jonavellecuerdo Pushed a new commit, let me know how it looks!

return response

def validate_message(self, sqs_message: MessageTypeDef) -> bool:
Copy link

@ghukill ghukill Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that I submitted an approval already, and I do think this current approach will successfully validate messages, but taking another pass at the updates I do wonder if there is an opportunity to streamline some of this validation even more. I would imagine that much of this application will be about validation, so some patterns to follow throughout could be helpful to establish early.

I see that all this validation is under the umbrella of this call to self.validate_message() in self.process_result_message():

if not self.validate_message(sqs_message):
    raise InvalidSQSMessageError

Where the ultimate goal is to a) log what happend, and b) raise InvalidSQSMessageError from this method.

What if each validation helper method was responsible for directly raising InvalidSQSMessageError if they found something wrong? and self.validate_message() is just an orchestrator of calling these methods? Something along the lines of:

def self.validate_message() -> None:
    # call all validators here
    # if all pass, return None, and all is well!
    # if any raise an exception, it will bubble up from here and reach self.process_result_message()
    #   where this was originally called from

def validate_message_attributes() -> None:
   # each validator returns None
   # do checks, raise InvalidSQSMessageError with a meaningful message if something wrong

# continue to define validators as needed

I think this leans into the EAFP (Easier to Ask for Forgiveness than Permission) python pattern. If we call self.validate_message() and nothing happens, great. But if any of those validator methods raise that InvalidSQSMessageError exception, it will:

  1. get raised from that lowest validator method
  2. bubble up through validate_message()
  3. bubble up from process_results_message() here just as it does now

This pattern is extensible, becuase you can just keep calling validator methods, knowing they are responsible for raising that exception.

To reiterate: with only two validations here, I think the current approach works as-is. But I have this hunch that this DSC app will be validation heavy, and a validation pattern that is applied somewhat globally could be helpful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like @ghukill 's proposal to have validate_message simply call validation methods that are responsible for raising InvalidSQSMessageError with meaningful messages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, this is definitely a better approach, I greatly appreciate the suggestion and I'll rework!

Copy link
Contributor

@jonavellecuerdo jonavellecuerdo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for making these additional changes! Approved but waiting on your thoughts re: implementing @ghukill 's proposal for validation methods approach.

response = self.client.delete_message(
QueueUrl=f"{self.base_url}{self.queue_name}",
QueueUrl=self.queue_url,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return response

def validate_message(self, sqs_message: MessageTypeDef) -> bool:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like @ghukill 's proposal to have validate_message simply call validation methods that are responsible for raising InvalidSQSMessageError with meaningful messages.

* Refactor SQSClient
validate methods for greater efficiency and update corresponding unit tests
@ehanson8
Copy link
Contributor Author

@ghukill @jonavellecuerdo Updated the validation methods!

Args:
sqs_message: An SQS result message to be processed.
"""
self.validate_message(sqs_message)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥

Comment on lines 184 to 188
or not any(
field
for field in sqs_message["MessageAttributes"]
if "PackageID" in field
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must admint, this one is still tripping me up. Are we looking for "PackageID" as a key in the sqs_message["MessageAttributes"] dictionary?

If so, could we rewrite?

or "PackageID" not in sqs_message["MessageAttributes"]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we are looking at the validation method even closer, I am curious about this as well. Debugging the test, it seems the an sqs_message looks like:

{
    "MessageId": "47179327-6a75-467d-b431-d7bfd802b15e",
    "ReceiptHandle": "zqaejnmrlfkzlxsxexmzsnyjsiugkotjdouweeephwuigifrlsrwxshfqmguyjttklxxrogfiohgwezhqpywvrbgbueebxlmuenrwtgtgmhbaugtywthkppbixxbjowmkqcwiqapxjpirgeeuowpgdkrdskhawgimaaynsbekemlfxoapaskuewaa",
    "MD5OfBody": "2affecad36e208b6be933fce09b7a89e",
    "Body": '{"ResultType": "success", "ItemHandle": "1721.1/131022", "lastModified": "Thu Sep 09 17:56:39 UTC 2021", "Bitstreams": [{"BitstreamName": "10.1002-term.3131.pdf", "BitstreamUUID": "a1b2c3d4e5", "BitstreamChecksum": {"value": "a4e0f4930dfaff904fa3c6c85b0b8ecc", "checkSumAlgorithm": "MD5"}}]}',
    "MD5OfMessageAttributes": "5edd88184e055125bdab8e1b116ecf5e",
    "MessageAttributes": {
        "PackageID": {"StringValue": "10.1002/term.3131", "DataType": "String"},
        "SubmissionSource": {"StringValue": "Submission system", "DataType": "String"},
    },
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't know how I came up with that check and how it survived this long without refactoring but yes or "PackageID" not in sqs_message["MessageAttributes"] is much simpler

Copy link
Contributor

@jonavellecuerdo jonavellecuerdo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just two minor change requests, including @ghukill 's suggestion!

Comment on lines 170 to 171
message = f"Failed to retrieve 'ReceiptHandle' from message: {sqs_message}"
raise InvalidSQSMessageError(message)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for not doing what I said I would in this Slack thread, but I think we previously discussed: (a) adding Ruff rule EM102 to ignore list and (b) allow passing of simple error messages to exceptions!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated!

Comment on lines 184 to 188
or not any(
field
for field in sqs_message["MessageAttributes"]
if "PackageID" in field
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we are looking at the validation method even closer, I am curious about this as well. Debugging the test, it seems the an sqs_message looks like:

{
    "MessageId": "47179327-6a75-467d-b431-d7bfd802b15e",
    "ReceiptHandle": "zqaejnmrlfkzlxsxexmzsnyjsiugkotjdouweeephwuigifrlsrwxshfqmguyjttklxxrogfiohgwezhqpywvrbgbueebxlmuenrwtgtgmhbaugtywthkppbixxbjowmkqcwiqapxjpirgeeuowpgdkrdskhawgimaaynsbekemlfxoapaskuewaa",
    "MD5OfBody": "2affecad36e208b6be933fce09b7a89e",
    "Body": '{"ResultType": "success", "ItemHandle": "1721.1/131022", "lastModified": "Thu Sep 09 17:56:39 UTC 2021", "Bitstreams": [{"BitstreamName": "10.1002-term.3131.pdf", "BitstreamUUID": "a1b2c3d4e5", "BitstreamChecksum": {"value": "a4e0f4930dfaff904fa3c6c85b0b8ecc", "checkSumAlgorithm": "MD5"}}]}',
    "MD5OfMessageAttributes": "5edd88184e055125bdab8e1b116ecf5e",
    "MessageAttributes": {
        "PackageID": {"StringValue": "10.1002/term.3131", "DataType": "String"},
        "SubmissionSource": {"StringValue": "Submission system", "DataType": "String"},
    },
}

"""
if (
"MessageAttributes" not in sqs_message
or "PackageID" not in sqs_message["MessageAttributes"]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@ehanson8 ehanson8 merged commit 656bcbb into main Dec 10, 2024
2 checks passed
@ehanson8 ehanson8 deleted the IN-1099-ses-and-sqs-clients branch December 10, 2024 21:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants