Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IN 1059 - Maintenance 2024-08 #42

Merged
merged 12 commits into from
Sep 6, 2024
Merged

IN 1059 - Maintenance 2024-08 #42

merged 12 commits into from
Sep 6, 2024

Conversation

ghukill
Copy link
Contributor

@ghukill ghukill commented Sep 3, 2024

Purpose and background context

2024-08 maintenance for this project.

This is the first touch of this project since new CLI and lambda project conventions were put in place as part of regular maintenance.

The major updates for this work include:

  • bumping python version from 3.9 to 3.12
  • adding type hints and enforce mypy linting
  • moving to ruffas the primary umbrella linting library

This maintenance work is happening in parallel with some research into DSS error handling as outlined in IN-1035. Felt this is worth noting, as the type hinting work performed in this PR reveals how a couple of edge cases might have slipped by the current error handling and reporting. As such, proposing to move forward first with this PR that introduces type hints, and then complete IN-1035 with some of that added information and context in code.

One this PR is reviewed and merged, we can move to more manual integration testing in Stage. That said, some local testing is outlined below to confirm that basic functionality is maintained.

How can a reviewer manually see the effects of these changes?

With a little bit of work, it's possible to test locally that the messages are read from, and then removed, from the input queue after processing. This utilizes the env var SKIP_PROCESSING=true to avoid actually submitting an item to DSpace; the item is merely processed and removed from the input queue.

Setup .env file:

WORKSPACE='dev'
SKIP_PROCESSING=true
AWS_REGION_NAME=us-east-1
SQS_ENDPOINT_URL='http://localhost:5000'
INPUT_QUEUE=dss-input-dev-test
OUTPUT_QUEUES=dss-etd-output-dev-test
LOG_LEVEL=DEBUG

Run Moto server:

docker run --rm -p 5000:5000 --name moto motoserver/moto:latest

The rest should be performed from a new terminal, allowing moto SQS server to continue running in the other...

Setup AWS credentials in terminal

  • not used because all local... but the boto3 library needs them active to continue

Create queues

pipenv run submitter create-queue dss-input-dev-test
pipenv run submitter create-queue dss-etd-output-dev-test

Load sample data to input queue:

pipenv run submitter load-sample-input-data \
-i dss-input-dev-test \
-o dss-etd-output-dev-test \
-f tests/fixtures/completely-fake-data.json

Check input queue has message (should return 10 messages, can use space to paginate):

aws sqs receive-message \
--queue-url http://localhost:5000/123456789012/dss-input-dev-test \
--endpoint-url http://localhost:5000 \
--visibility-timeout 0 \
--wait-time-seconds 0 \
--max-number-of-messages 10

Run submission service:

pipenv run submitter start --wait 3

Confirm all messages gone from input queue:

aws sqs receive-message \
--queue-url http://localhost:5000/123456789012/dss-input-dev-test \
--endpoint-url http://localhost:5000 \
--visibility-timeout 0 \
--wait-time-seconds 0 \
--max-number-of-messages 10

Includes new or updated dependencies?

YES

Changes expectations for external applications?

NO: not believed that any application logic will be different

What are the relevant tickets?

Developer

  • All new ENV is documented in README
  • All new ENV has been added to staging and production environments
  • All related Jira tickets are linked in commit message(s)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer(s)

  • The commit message is clear and follows our guidelines (not just this PR message)
  • There are appropriate tests covering any new functionality
  • The provided documentation is sufficient for understanding any new functionality introduced
  • Any manual tests have been performed or provided examples verified
  • New dependencies are appropriate or there were no changes

@ghukill ghukill marked this pull request as ready for review September 3, 2024 16:58
@ghukill ghukill requested review from ehanson8 and jonavellecuerdo and removed request for ehanson8 and jonavellecuerdo September 3, 2024 16:58
Why these changes are being introduced:

This repository has not been updated with new
project structure and conventions.  This repository also
had an old python version.

How this addresses that need:
* Sets python version to 3.12
* Updates Makefile, pyproject.toml, and dependencies to
meet new structure
* Minimal updates to tests to pass testing

Side effects of this change:
* Python version bumped to 3.12

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/IN-1059
Why these changes are being introduced:

This repository pre-dated a convention to add type hints and apply
mypy linting.

How this addresses that need:
* First pass adds type hints were easy
* Focus on function argument and return types
* This does NOT address all typing linting errors, preferring to untangle
some custom exception linting errors in a future commit

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/IN-1059
Why these changes are being introduced:

A submission.Submission instance is constructed from its
from_message method.  This results in two distinct types of
a Submission object:

1. input queue message was valid, so we hydrate most all
properties

2. input queue message was not valid, and only a few properties
are set, resulting in the message bubbling up during processing

This results in properties that may be str or None (example),
which leads to ambiguity in later usage of the Submission
object.  These type hints do not change any functionality,
but they reveal that downstream methods that operate over
these Submission objects are making assumptions about which
"version" (valid or invalid) of a Submission object they are
using.

How this addresses that need:
* Adds type hints through Submission methods
* Adds type hints to custom exceptions that business logic
around Submission objects may throw

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/IN-1059
Side effects of this change:
* When generating a result_error_message, datetime.now() has been
updated to set the timezone as UTC.  It is believed this is the default
behavior in AWS for a normal now(), but it WILL change the datetime for
local testing to be UTC time.

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/IN-1059
Copy link
Contributor

@ehanson8 ehanson8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thank you for taking on an update of this magnitude! A few comments

.pre-commit-config.yaml Outdated Show resolved Hide resolved
submitter/config.py Show resolved Hide resolved
submitter/submission.py Show resolved Hide resolved
submitter/submission.py Show resolved Hide resolved
submitter/submission.py Show resolved Hide resolved
@ghukill
Copy link
Contributor Author

ghukill commented Sep 4, 2024

@ehanson8 - updated the pre-commit YAML python version and responded to a couple questions. Review re-requested!

Copy link

@jonavellecuerdo jonavellecuerdo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Thank you for your work on this repo! Had one non-blocking question for ya'.

@@ -61,7 +67,7 @@ def from_message(cls, message):
except KeyError as e:
raise errors.SubmitMessageMissingAttributeError(
message.message_id, e.args[0]
)
) from e

try:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Non-blocking] Commenting on this line but question is about the exception: Curious what would result in a TypeError? I see that the json.loads(message.body) can result in a JSONDecodeError and attempting to index the destination, collection_handle, etc. can result in KeyErrors. 🤔

Copy link
Contributor Author

@ghukill ghukill Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good question. My assumption is that it's an edge case handling if the decoded JSON does not result in a dictionary.

Example:

import json
body = json.loads("null")  # technically valid JSON; could be a single integer, string, etc., anything but an object
destination = body['SubmissionSystem']
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[9], line 1
----> 1 destination = body['SubmissionSystem']

TypeError: 'NoneType' object is not subscriptable

I think it's the first time we attempt to parse the message body, so at this point could really be anything.

@ghukill ghukill merged commit b708564 into main Sep 6, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants