file entities overwritten without warning #185

Closed
mashehu opened this issue Aug 5, 2024 · 5 comments · Fixed by #194
Labels
documentation Improvements or additions to documentation

Comments

Contributor

mashehu commented Aug 5, 2024

Hi,

I am currently working on adding RO-Crates to the nf-core CLI tools: nf-core/tools#2680

nf-core pipelines usually have a root main.nf file, which should be a ComputationalWorkflow entity, but they also have modules with their own nested main.nf, e.g. modules/local/cat_additional_fasta/main.nf. When I add the nested files with crate.add_file(fn, properties={"programmingLanguage": {"@id": "#nextflow"}}), where fn is the file path (e.g. modules/local/cat_additional_fasta/main.nf), the crate ends up with only one main.nf entry. How do I get the full path as the entry id so that files with the same name do not overwrite each other?

Additionally, a warning that an @id already exists, emitted before it is overwritten, would be helpful.
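
Until such a warning exists, a caller-side check can flag the collision before anything is added. The following is a minimal sketch using only the standard library. It assumes that the id defaults to the file's basename, which is consistent with the single main.nf entry described above. `find_id_collisions` is a hypothetical helper, not part of ro-crate-py, and the path list is illustrative.

```python
import os
from collections import defaultdict


def find_id_collisions(paths):
    """Group source paths by basename and return the groups with more than one path.

    Assumption: without further arguments, the entity id is the basename
    of the source path.
    """
    by_id = defaultdict(list)
    for path in paths:
        by_id[os.path.basename(path)].append(path)
    return {entity_id: srcs for entity_id, srcs in by_id.items() if len(srcs) > 1}


# Illustrative file list for a pipeline checkout.
paths = [
    "main.nf",
    "modules/local/cat_additional_fasta/main.nf",
    "nextflow.config",
]
collisions = find_id_collisions(paths)
for entity_id, sources in collisions.items():
    print(f"warning: id {entity_id!r} would be shared by {sources}")
```

Running the check over the list of files before calling add_file makes the clash visible up front instead of after the crate has been written.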

Contributor

elichad commented Aug 9, 2024

@simleo is this a bug? I'm not familiar enough with the code, but I assumed that file entities would use the full file path as @id.

elichad added the bug label Aug 15, 2024
Contributor

elichad commented Aug 15, 2024

@mashehu and I agreed on the WRROC call that this is a bug.

As a workaround, you should be able to use the dest_path argument to manually set the intended file path within the crate, and therefore the @id:

crate.add_file(fn, dest_path=fn, properties={"programmingLanguage": {"@id": "#nextflow"}})
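
Applied to a whole pipeline directory, this could look like the sketch below. It relies only on the add_file(source, dest_path=..., properties=...) call shown above. `add_nextflow_files` and its `root` argument are hypothetical names, and the crate is passed in as a parameter, so the helper makes no other assumptions about the library.

```python
from pathlib import Path


def add_nextflow_files(crate, root):
    """Add every *.nf file under root, using its path relative to root as dest_path.

    `crate` is any object exposing add_file(source, dest_path=..., properties=...).
    Returns the list of destination paths that were passed to add_file.
    """
    root = Path(root)
    added = []
    for source in sorted(root.rglob("*.nf")):
        # A relative POSIX path keeps nested main.nf files distinct within the crate.
        dest = source.relative_to(root).as_posix()
        crate.add_file(
            source,
            dest_path=dest,
            properties={"programmingLanguage": {"@id": "#nextflow"}},
        )
        added.append(dest)
    return added
```

Computing dest_path relative to the pipeline root keeps it relative even when the source paths themselves are absolute.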

Also, you might like to look at the add_workflow() function, which could save you a few lines elsewhere. Something like:

crate.add_workflow(                # sets @type and conformsTo according to Workflow RO-Crate spec
    fn,
    dest_path=fn,
    main=True,                     # sets the added workflow as main entity
    properties={ . . . },
    lang="nextflow",               # adds the #nextflow entity automatically and connects it to programmingLanguage
    lang_version="X.Y.Z",          # sets version on #nextflow
)

Contributor Author

mashehu commented Aug 16, 2024

Thanks @elichad, specifying dest_path works.

add_workflow() is indeed helpful. Is it documented somewhere?

Contributor

elichad commented Aug 19, 2024

add_workflow() is not documented yet. I only learned about it very recently myself, and raised #186 to flag that we need to fix that gap in the documentation.

Collaborator

simleo commented Sep 3, 2024

> mashehu and I agreed on the WRROC call that this is a bug.
>
> As a workaround, you should be able to use the dest_path argument to manually set the intended file path within the crate, and therefore the @id

It's not a bug, and using dest_path is not a workaround: it is the normal way to set the data entity's path / id within the crate. That path is always relative to the crate's root directory, which is not known until the crate is written.

When dest_path is not specified, the default behavior is to place the data entity (file or directory) at the top level, thus setting the id to the basename. Using the source path as the destination path does not work in general:

  • The source could be an absolute path (indeed, if you try to pass an absolute path as dest_path the method throws a ValueError)
  • The source could be an absolute URI, with fetch_remote set to True (in this case the file is downloaded and added to the crate as a local file)
  • The source could be a StringIO or BytesIO (this is supported), in which case there is no source path
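
The rule described above can be summarised in a few lines. This is a simplified model for illustration only, not the library's code. `entity_id` is a hypothetical function, the remote URI case is left out, and requiring dest_path for in-memory sources is an assumption of the sketch.

```python
import io
import os
from pathlib import PurePosixPath


def entity_id(source, dest_path=None):
    """Simplified model of how the id of a local data entity is chosen."""
    if dest_path is not None:
        dest = PurePosixPath(dest_path)
        if dest.is_absolute():
            # As stated above, an absolute dest_path is rejected with ValueError.
            raise ValueError("dest_path must be relative to the crate root")
        return dest.as_posix()
    if isinstance(source, (io.StringIO, io.BytesIO)):
        # Assumption of this sketch: an in-memory source has no path to fall back on.
        raise ValueError("dest_path is required when the source is not a path")
    # Default: the entity is placed at the top level, so the id is the basename.
    return os.path.basename(str(source))


nested = "modules/local/cat_additional_fasta/main.nf"
print(entity_id(nested))                    # main.nf
print(entity_id(nested, dest_path=nested))  # modules/local/cat_additional_fasta/main.nf
```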

If you want to check exactly what happens, take a look at FileOrDir's __init__ method and File's write method.

A usage example can be found in repo2rocrate.

Usage of dest_path is demonstrated at the beginning of the README, in Creating an RO-Crate. I will review the documentation to see if I can better clarify how things work.

simleo added the documentation label and removed the bug label Sep 3, 2024