Maintenance 09 2024 #39

Merged
merged 4 commits into from
Sep 20, 2024

ehanson8
Contributor

Purpose and background context

Updates app according to our maintenance week documentation.

How can a reviewer manually see the effects of these changes?

Run make test and make lint to confirm they still pass

Includes new or updated dependencies?

YES

Changes expectations for external applications?

NO

Developer

  • All new ENV is documented in README
  • All new ENV has been added to staging and production environments
  • All related Jira tickets are linked in commit message(s)
  • Stakeholder approval has been confirmed (or is not needed)

Code Reviewer(s)

  • The commit message is clear and follows our guidelines (not just this PR message)
  • There are appropriate tests covering any new functionality
  • The provided documentation is sufficient for understanding any new functionality introduced
  • Any manual tests have been performed or provided examples verified
  • New dependencies are appropriate or there were no changes

* Update to 3.12 in .python-version, Dockerfile, and Pipfile
* Add help command to Makefile
* Update and reorder dependencies in Pipfile
* Update pyproject.toml
Collaborator

@ghukill ghukill left a comment

Before diving into the rest of the PR (which looks great at a glance; I'm not anticipating any other requests), I did have one request.

While attempting to run a local harvest test via:

make dist-local
make run-harvest-local

It was discovered that the testing configurations here are pointing at a version of the website that is no longer available.

A more stable version was shared, which would allow updating that YAML to the following:

generateCDX: true
generateWACZ: true
text: to-pages
# prevent PAGES from getting crawled; scoping
exclude:
  - ".*lib.mit.edu/search/.*"
  - ".*mit.primo.exlibrisgroup.com/.*"
# prevent RESOURCES / ASSETS from getting retrieved; URL requests
blockRules:
  - ".*googlevideo.com.*"
  - ".*cdn.libraries.mit.edu/media/.*"
  - "\\.(jpg|png)$"
depth: 1
maxPageLimit: 20
timeout: 30
scopeType: "domain"
seeds:
  - url: https://www-test.libraries.mit.edu/sitemap.xml
    sitemap: https://www-test.libraries.mit.edu/sitemap.xml

Can we make this change? Thanks!
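
For anyone sanity-checking the patterns in the YAML above, here is a minimal Python sketch. It assumes the crawler treats each `exclude` and `blockRules` entry as an unanchored regular expression searched against the full URL; that behavior is an assumption, not something confirmed in this PR. The helper names and the sample URLs in the usage note are hypothetical.

```python
import re

# Patterns copied from the proposed YAML. In the YAML, "\\.(jpg|png)$" is a
# double-quoted string, so the crawler would see the regex \.(jpg|png)$.
EXCLUDE = [
    r".*lib.mit.edu/search/.*",
    r".*mit.primo.exlibrisgroup.com/.*",
]
BLOCK_RULES = [
    r".*googlevideo.com.*",
    r".*cdn.libraries.mit.edu/media/.*",
    r"\.(jpg|png)$",
]


def matches_any(url: str, patterns: list[str]) -> bool:
    """Return True if any pattern is found anywhere in the URL.

    ASSUMPTION: unanchored search semantics, as with re.search.
    """
    return any(re.search(pattern, url) for pattern in patterns)


def is_excluded_page(url: str) -> bool:
    """Hypothetical helper: would this page be kept out of crawl scope?"""
    return matches_any(url, EXCLUDE)


def is_blocked_resource(url: str) -> bool:
    """Hypothetical helper: would this asset request be blocked?"""
    return matches_any(url, BLOCK_RULES)
```

Under that assumption, `is_excluded_page("https://lib.mit.edu/search/bento?q=maps")` is true, while a page on `www-test.libraries.mit.edu` stays in scope. Note that the `$` anchor in the image rule means a URL such as `logo.png?v=2` would not be blocked, and the unescaped dots in the hostnames match any character, not only a literal dot.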

@ehanson8
Contributor Author

@ghukill Good catch, made the change!

Collaborator

@ghukill ghukill left a comment

Approved! Able to run a local crawl/harvest, all looks good.

@ehanson8 ehanson8 merged commit d171f9b into main Sep 20, 2024
5 checks passed
@ehanson8 ehanson8 deleted the maintenance-09-2024 branch September 20, 2024 14:16