Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add bwa-mem2 data manager #6376

Merged
merged 2 commits into from
Sep 30, 2024
Merged

Conversation

natefoo
Copy link
Member

@natefoo natefoo commented Sep 26, 2024

FOR CONTRIBUTOR:

  • I have read the CONTRIBUTING.md document and this tool is appropriate for the tools-iuc repo.
  • License permits unrestricted use (educational + commercial)
  • This PR adds a new tool or tool collection

Copy link
Contributor

@bernt-matthias bernt-matthias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I persuade you to drop the python wrapper? If there is non much logic involved datamanagers can (and maybe should) be much simpler.

An example might be https://github.com/galaxyproject/tools-iuc/blob/main/data_managers/data_manager_metaphlan_database_downloader/data_manager/data_manager_metaphlan_download.xml

Edit: Besides being simpler we also would not need the mulled container because we con drop the python requirement.

@natefoo
Copy link
Member Author

natefoo commented Sep 27, 2024

Thanks, I didn't know there were already examples that dropped the wrapper, I'll convert it.

<tables>
<!-- Locations of indexes in the BWA-MEM2 mapper format-->
<table name="bwa_mem2_indexes" comment_char="#">
<columns>value, dbkey, name, path</columns>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we want to include the bwa-mem2 version here (that has been used for the generation).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The index format has been stable since 2020 and the Galaxy version of the tool was added after that index format change, so it should be compatible with all versions. For tools that have changed it in the past (e.g. STAR) we added a new versioned data table once that happens, but I guess if we want to be safe I could add a version column, even if it's unused for now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add a version column to all DM. Its just more future-proof and does not cost anything.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, but wouldn’t adding a column make it incompatible with the existing bwa-mem2 tool’s data table?

Copy link
Member

@bgruening bgruening Sep 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is true. I rephrase it, ideally, we should have a version column. Lets get it in for now without the version column and hope it will never change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case I think this is ready to go.

"value": "${value}",
"dbkey": "${all_fasta_source.fields.dbkey}",
"name": "${name}",
"path": "${fasta_file_name}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be ${out_file.extra_files_path}/${fasta_file_name}?

In the DMs that I created recently I just crated a directory (e.g. DIR) in the working dir and used this. I guess using out_file.extra_files_path is not really necessary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing so seems to cause the index to be referenced from the object store after the DM completes: the path in the loc file points to /path/to/galaxy/database/objects/c/0/f/dataset_c0ffee_files/fooBar1.fa. Whereas if it is just the relative name inside the extra_files_path, then the index is referenced from tool_data_dir as expected for a DM. Possibly just an artifact of os.path.join([tool_data_dir, intermediate_dirs, '/path/to/galaxy/database/objects/c/0/f/dataset_c0ffee_files/fooBar1.fa']) clobbering the earlier path elements, but either way that's not the desired outcome.

@bernt-matthias bernt-matthias merged commit aee140e into galaxyproject:main Sep 30, 2024
11 checks passed
@bernt-matthias
Copy link
Contributor

Great. Thanks a lot for the contribution.

@natefoo
Copy link
Member Author

natefoo commented Sep 30, 2024

Thanks @bernt-matthias @bgruening!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants