Use pygfried in file format identification #176

Merged: sevein merged 3 commits into main from dev/pygfried on Dec 29, 2021

@sevein sevein commented Dec 28, 2021

This commit simplifies file format identification. It removes all
file-identification-related models and data entries from the fpr application.
Instead, it defaults to pygfried, a Python package that makes siegfried
available as a CPython extension.

pygfried only reports PRONOM identifiers for the time being, but the plan
is to support additional registries.

What's left is fpr.fp{rule,tool,command}, which we could eventually move
out of the database into an interoperable format that is easier to manage externally.

Connects to #55.
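
For readers unfamiliar with pygfried, here is a minimal sketch of what identification through it might look like. It is not the code in this PR. It assumes pygfried exposes an `identify(path)` function that returns a PRONOM identifier string, and that unidentified files come back as an empty string or `"UNKNOWN"`; both are assumptions to check against the pygfried docs. The `identify_format` helper and its `identify` parameter are hypothetical names used only for illustration.

```python
from pathlib import Path
from typing import Callable, Optional


def identify_format(
    path: Path,
    identify: Optional[Callable[[str], str]] = None,
) -> Optional[str]:
    """Return a PRONOM identifier such as "fmt/11" for `path`, or None.

    `identify` is injectable so the helper can be exercised without
    pygfried installed (hypothetical design, not taken from the PR).
    """
    if identify is None:
        # Assumption: pygfried provides identify(path) -> str.
        import pygfried

        identify = pygfried.identify

    try:
        puid = identify(str(path))
    except Exception:
        # Treat any identification failure as "no format found".
        return None

    # Assumption: an empty string or "UNKNOWN" means no match.
    if not puid or puid.upper() == "UNKNOWN":
        return None
    return puid
```

Passing a stub such as `lambda p: "fmt/11"` as `identify` makes the helper easy to test without the CPython extension being built.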

@sevein sevein requested a review from replaceafill December 28, 2021 13:25

codecov-commenter commented Dec 28, 2021

Codecov Report

Merging #176 (f52e3f5) into main (acb374e) will increase coverage by 0.52%.
The diff coverage is 54.83%.

@@            Coverage Diff             @@
##             main     #176      +/-   ##
==========================================
+ Coverage   46.07%   46.60%   +0.52%     
==========================================
  Files         107      107              
  Lines        7863     7774      -89     
  Branches     1196     1180      -16     
==========================================
  Hits         3623     3623              
+ Misses       4065     3972      -93     
- Partials      175      179       +4     

| Impacted Files | Coverage Δ |
|---|---|
| a3m/cli/client/__main__.py | 0.00% <ø> (ø) |
| a3m/client/clientScripts/a3m_download_transfer.py | 0.00% <0.00%> (ø) |
| a3m/fpr/models.py | 71.62% <ø> (+3.17%) ⬆️ |
| a3m/client/clientScripts/identify_file_format.py | 54.83% <58.62%> (+54.83%) ⬆️ |
| a3m/databaseFunctions.py | 60.91% <0.00%> (+1.14%) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data


@replaceafill replaceafill left a comment

This looks great @sevein and the way you integrated siegfried is very cool.

The only thing that got my attention is that the switch from using the job print helpers to the logger stops creating the %SIPLogsDirectory%fileFormatIdentification.log file. If we start moving in that direction maybe we should update workflow.json to not have those references anymore.
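
One way to find the leftover references would be a small script that walks the parsed workflow.json and reports every string mentioning the log file. This is only a sketch: it makes no assumption about the real layout of workflow.json, and the `sample` document and its key names below are invented for illustration.

```python
import json
from typing import Any, Iterator, Tuple

LOG_NAME = "fileFormatIdentification.log"


def find_references(
    node: Any, needle: str = LOG_NAME, path: str = "$"
) -> Iterator[Tuple[str, str]]:
    """Yield (json_path, value) for every string value containing `needle`."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_references(value, needle, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_references(value, needle, f"{path}[{index}]")
    elif isinstance(node, str) and needle in node:
        yield path, node


# Hypothetical fragment; the real workflow.json structure may differ.
sample = json.loads(
    '{"links": {"abc": {"config": '
    '{"stdout_file": "%SIPLogsDirectory%fileFormatIdentification.log"}}}}'
)

for json_path, value in find_references(sample):
    print(json_path, "->", value)
```

Against the real file, replacing `sample` with `json.load(open(...))` on the workflow.json path would list the entries to clean up.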


sevein commented Dec 29, 2021

> The only thing that got my attention is that the switch from using the job print helpers to the logger stops creating the %SIPLogsDirectory%fileFormatIdentification.log file. If we start moving in that direction maybe we should update workflow.json to not have those references anymore.

Thank you, I totally forgot about that!

@sevein sevein merged commit 3fbb51a into main Dec 29, 2021
@sevein sevein deleted the dev/pygfried branch December 29, 2021 09:15