Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create SCCS workflow #77

Merged
merged 3 commits into from
Jan 16, 2025
Merged

Create SCCS workflow #77

merged 3 commits into from
Jan 16, 2025

Conversation

jonavellecuerdo
Copy link
Contributor

@jonavellecuerdo jonavellecuerdo commented Jan 15, 2025

Purpose and background context

This PR creates the SCCS DSC workflow. Implementation of the SCCS workflow was amazingly simple as it inherits from the SimpleCSV base workflow and only required assigning values for the workflow_name and metadata_mapping_path class attributes. This also introduces a new folder in dsc/workflows to hold metadata mapping JSON files.

This PR also includes additional changes for cleanup:

  • Renaming DemoWorkflow -> Demo
  • Changing email_recipients to list[str]
    Note: @ehanson8 , mypy may have been throwing an error previously if it was assigned an empty list ([]) (a "mutable default") in the __init__ signature. 🤔
  • Cleaning up TestWorkflow and TestSimpleCSV fixtures
    • Pass values for email_recipients and output_queue when creating instances (i.e., base_workflow_instance and simple_csv_workflow_instance fixtures)

How can a reviewer manually see the effects of these changes?

Run make lint and make test and verify all unit tests are passing.

@ehanson8 and I synced up to discuss testing and whether new unit tests were required for the creation of the SCCS workflow. In summary:

  • A test_<workflow>.py module should only be created to test functionality of overridden methods.
  • Metadata mapping is covered by these unit tests. Regarding metadata mapping, these tests confirm:
    • Source field names -> Dublin Core field mappings is successful
    • Presence of delimiters results in entries created for each value in a delimiter-separated string
    • An error is raised when a field with "required": true is missing.

Includes new or updated dependencies?

NO

Changes expectations for external applications?

NO

What are the relevant tickets?

Developer

  • All new ENV is documented in README
  • All new ENV has been added to staging and production environments
  • All related Jira tickets are linked in commit message(s)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer(s)

  • The commit message is clear and follows our guidelines (not just this PR message)
  • There are appropriate tests covering any new functionality
  • The provided documentation is sufficient for understanding any new functionality introduced
  • Any manual tests have been performed or provided examples verified
  • New dependencies are appropriate or there were no changes

Why these changes are being introduced:
* Support deposits requested by Scholarly Communications and Collections Strategy (SCCS).

How this addresses that need:
* Create SCCS workflow and metadata mapping JSON file
* Move DemoWorkflow metadata mapping JSON file
* Change Workflow.email_recipients to list[str]
* Clean up Workflow and SimpleCSV fixtures

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/IN-1098
@jonavellecuerdo jonavellecuerdo changed the title In 1098 sccs workflow Create SCCS workflow Jan 15, 2025
@jonavellecuerdo jonavellecuerdo marked this pull request as ready for review January 15, 2025 17:44
@jonavellecuerdo jonavellecuerdo self-assigned this Jan 15, 2025
Copy link
Contributor

@ehanson8 ehanson8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great and the organizational changes are excellent!

@@ -35,7 +35,7 @@ def __init__(
self,
collection_handle: str,
batch_id: str,
email_recipients: tuple[str, ...],
email_recipients: list[str],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the linting is passing then I guess this is all good!

Copy link

@ghukill ghukill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple of comments and questions, but approved regardless of answers!

from dsc.workflows import SimpleCSV


class SCCS(SimpleCSV):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When looking at a class this minimal, a couple of thoughts come to mind:

  1. Nice work on class design and inheritance! Clearly much is shared between workflows.
  2. What functionality / opinionation, if anything, could this workflow-specific class still provide?

For # 2, the first thing that comes to mind is validation or hooks of some kind. I don't think it's necessary now, but wondering if there might be room in the future for some kind of registration of validation hooks, that could be used to validate items created? reconciliation logic? maybe others? In theory, these could be called -- if defined -- during the normal flow of the base Workflows.

Sketch:

class MyWorkflow(SimpleCSV):
    
    workflow_name = "Mine!"
    metadata_mapping_path = "path/to/mapping.json"
    
    # defined as class attributes, lists pointing to methods
    reconciliation_hooks = [] # I don't have any reconciliation hooks...
    item_creation_hooks = [
        "confirm_title_elements" # reference to method defined on class
    ]
    
    @classmethod
    def confirm_title_elements(cls):
        # I will check things yielded by Workflow.item_submissions_iter()...
        pass

With a little bit of wiring, this might provide a way for a Workflow class to have some additional validation (or even functional) logic applied to the base flows.

This pattern could also probably be achieved through decorators, which could be even cleaner.

Sketch:

class MyWorkflow(SimpleCSV):
    
    workflow_name = "Mine!"
    metadata_mapping_path = "path/to/mapping.json"
    
    @item_creation_hook
    def confirm_title_elements(cls):
        # I will check things yielded by Workflow.item_submissions_iter()...
        pass
   
    # maybe others like @reconciliation_hook

dsc/workflows/metadata_mapping/sccs.json Outdated Show resolved Hide resolved
* Make 'delimiter' and 'language' optional configs
* Omit null and empty string configs from metadata mapping JSON files
@coveralls
Copy link

Pull Request Test Coverage Report for Build 12797479564

Details

  • 11 of 11 (100.0%) changed or added relevant lines in 4 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.03%) to 97.176%

Totals Coverage Status
Change from base Build 12774012547: 0.03%
Covered Lines: 413
Relevant Lines: 425

💛 - Coveralls

@jonavellecuerdo jonavellecuerdo merged commit b510322 into main Jan 16, 2025
2 checks passed
@jonavellecuerdo jonavellecuerdo deleted the IN-1098-sccs-workflow branch January 16, 2025 00:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants