-
Notifications
You must be signed in to change notification settings - Fork 436
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add bwa-mem2 data manager #6376
Conversation
402d226
to
cb911d8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I persuade you to drop the python wrapper? If there is non much logic involved datamanagers can (and maybe should) be much simpler.
An example might be https://github.com/galaxyproject/tools-iuc/blob/main/data_managers/data_manager_metaphlan_database_downloader/data_manager/data_manager_metaphlan_download.xml
Edit: Besides being simpler we also would not need the mulled container because we con drop the python requirement.
Thanks, I didn't know there were already examples that dropped the wrapper, I'll convert it. |
<tables> | ||
<!-- Locations of indexes in the BWA-MEM2 mapper format--> | ||
<table name="bwa_mem2_indexes" comment_char="#"> | ||
<columns>value, dbkey, name, path</columns> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we want to include the bwa-mem2 version here (that has been used for the generation).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The index format has been stable since 2020 and the Galaxy version of the tool was added after that index format change, so it should be compatible with all versions. For tools that have changed it in the past (e.g. STAR) we added a new versioned data table once that happens, but I guess if we want to be safe I could add a version column, even if it's unused for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should add a version column to all DM. Its just more future-proof and does not cost anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, but wouldn’t adding a column make it incompatible with the existing bwa-mem2 tool’s data table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is true. I rephrase it, ideally, we should have a version column. Lets get it in for now without the version column and hope it will never change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case I think this is ready to go.
"value": "${value}", | ||
"dbkey": "${all_fasta_source.fields.dbkey}", | ||
"name": "${name}", | ||
"path": "${fasta_file_name}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be ${out_file.extra_files_path}/${fasta_file_name}
?
In the DMs that I created recently I just crated a directory (e.g. DIR
) in the working dir and used this. I guess using out_file.extra_files_path
is not really necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing so seems to cause the index to be referenced from the object store after the DM completes: the path in the loc file points to /path/to/galaxy/database/objects/c/0/f/dataset_c0ffee_files/fooBar1.fa
. Whereas if it is just the relative name inside the extra_files_path
, then the index is referenced from tool_data_dir
as expected for a DM. Possibly just an artifact of os.path.join([tool_data_dir, intermediate_dirs, '/path/to/galaxy/database/objects/c/0/f/dataset_c0ffee_files/fooBar1.fa'])
clobbering the earlier path elements, but either way that's not the desired outcome.
Great. Thanks a lot for the contribution. |
Thanks @bernt-matthias @bgruening! |
FOR CONTRIBUTOR: