Auto-generate File metadata fields #198
Comments
On contentSize I agree, it's just a system call to get a number. However, generating the sha256 of a file can take quite some time, especially if the file is large. Imagine having to do the same for 10000 files... This could add unwanted overhead when creating an RO-Crate. |
Maybe a boolean option to calculate the |
Even a system call can introduce a consistent overhead, so we'd need boolean options for both properties. However, the
size = ...
checksum = ...
crate.add_file(source, dest, properties={
    "contentSize": size,
    "sha256": checksum
})
It's better for the library to stay lean and leave optional stuff like this to client code. In the case of remote files, note that |
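The `size = ...` and `checksum = ...` placeholders in the snippet above could be filled in by client code along these lines. This is only a sketch using the Python standard library: `file_metadata` and `chunk_size` are hypothetical names, not part of ro-crate-py, and returning `contentSize` as a string is an assumption based on schema.org defining that property as Text.

```python
import hashlib
import os


def file_metadata(path, chunk_size=1024 * 1024):
    """Return contentSize and sha256 for a local file.

    Hypothetical client-side helper, not part of ro-crate-py.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large files are never loaded
        # into memory all at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return {
        # Assumption: contentSize is recorded as a string of bytes.
        "contentSize": str(os.path.getsize(path)),
        "sha256": digest.hexdigest(),
    }
```

The result can then be passed as `properties=file_metadata(source)` in the `crate.add_file(source, dest, ...)` call shown in the comment. Hashing still reads every byte of every file, so the overhead concern raised earlier in the thread applies to this helper as well.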
So that's a no to adding these arguments? The annoying thing for end users is that we have to do the size and checksum calculation for every file, and there are likely to be many of these. So without a built-in argument, I suspect many of us will end up implementing our own wrapper functions to do this. |
Well, those properties are not required by the RO-Crate spec at any level, so some end users might want to add them while others might not. I'd rather not add the burden of more code to maintain to the library for something that's optional. I'll leave this open for others to chime in. |
Other things to consider:
|
I think having flags to turn these fields on (and others, width/height/etc.) would be the way to go. As for whether ro-crate-py should do it or stay lean and ask clients to implement this, maybe there are other options too, like having plugins for ro-crate-py, e.g. ro-crate-py-fileutils or so. When installed, the plugin brings code that tries to populate file information, MIME type, and so on. This way ro-crate-py focuses only on RO-Crate and Python, and anything more specific that still helps users and implementation devs would go into these plugins, which implementations can decide to use or not. |
Discussed with @stain last Thursday at the Workflow Run RO-Crate meeting. We decided to add
Regarding the recording of a checksum such as sha256, we observed that:
So this is better left to user-level code. |
A fair approach! Thanks for summarizing it here, @simleo ! |
Implemented in #201 |
If you crate.add_file() a local file, I think it makes sense to populate contentSize and sha256.
This seems to be done already for remote files, so it makes sense to do the same for local ones.
Some other fields like height, width and duration could be automatically determined for images and videos respectively, but this is much harder and less essential.
Happy to help with this!