Add a read trimming step to alignment workflows #179

Open
a-frantz opened this issue Aug 27, 2024 · 2 comments

Comments

@a-frantz
Member

Currently, our workflows assume read trimming has already occurred upstream, so we don't perform it as part of alignment. This assumption is often violated.

As part of this issue, we need to select a read trimming tool/algorithm (might require some comparative analysis) and then incorporate it into the *-core workflows. We also need to ensure there's no harm in read trimming FASTQs that have already been read trimmed.

If we opt to investigate multiple read trimming tools, we might as well write WDL tasks for all of them. It could be nice to let users select the tool as part of the workflow; however, we may find that the tools are not all created equal and that only one choice should be supported. TBD.

@mjgattas

I'd love to take this one on! From our conversation, it sounds like trimmomatic is the tool you've started investigating, but should I continue the comparative analysis?

@a-frantz
Member Author

First step is just going to be getting a working WDL implementation of trimmomatic. We can discuss next steps after that's complete.

I've only skimmed the documentation, but it looks like trimmomatic has two modes: Single-End (SE) and Paired-End (PE). So we are going to want each of those as its own WDL task. Dive into the documentation and expose as many of the parameters as you can. Make sure to copy and paste (with possible editorializing) any relevant bits of the documentation into the WDL meta sections. Our goal in terms of documentation is to provide an equivalent, if not enhanced, experience compared to reading the original docs. Check out the other task files to see our documentation conventions and do your best to follow them.
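As a rough sketch of how the two modes differ on the command line (and so what each WDL task's command section would need to assemble), the helper below builds a trimmomatic argument list for SE or PE input. The function name `trimmomatic_command`, its parameters, the file names, and the assumption that a `trimmomatic` wrapper script is on the PATH are all illustrative, not anything decided in this issue. The trimming steps shown are examples only; the real tasks should expose whatever the documentation describes.

```python
from typing import List, Optional


def trimmomatic_command(
    reads: List[str],
    outputs: List[str],
    steps: List[str],
    threads: int = 1,
    phred: Optional[int] = 33,
    executable: str = "trimmomatic",  # assumption: wrapper script on PATH
) -> List[str]:
    """Build an argv list for trimmomatic in SE or PE mode (hypothetical helper).

    SE mode takes one input and one output FASTQ.
    PE mode takes two inputs and four outputs, in this order:
    forward paired, forward unpaired, reverse paired, reverse unpaired.
    """
    if len(reads) == 1:
        mode, expected_outputs = "SE", 1
    elif len(reads) == 2:
        mode, expected_outputs = "PE", 4
    else:
        raise ValueError("expected 1 (SE) or 2 (PE) input FASTQ files")
    if len(outputs) != expected_outputs:
        raise ValueError(f"{mode} mode needs {expected_outputs} output file(s)")
    if not steps:
        raise ValueError("at least one trimming step is required")

    cmd = [executable, mode, "-threads", str(threads)]
    if phred is not None:
        if phred not in (33, 64):
            raise ValueError("phred must be 33, 64, or None")
        cmd.append(f"-phred{phred}")
    # Inputs come first, then outputs, then the trimming steps in the
    # order they should be applied.
    return cmd + reads + outputs + steps


if __name__ == "__main__":
    example_steps = ["ILLUMINACLIP:adapters.fa:2:30:10", "SLIDINGWINDOW:4:15", "MINLEN:36"]
    print(" ".join(trimmomatic_command(["sample.fastq.gz"], ["sample.trimmed.fastq.gz"], example_steps)))
```

The two branches differ only in the mode keyword and the number of files, which is one argument for keeping the SE and PE tasks parallel in their parameter names and documentation.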

I recommend installing the sprocket VSCode extension and using that for writing this. Enable lints and follow any directions from sprocket (except for ContainerValue and TrailingComma, which we are currently ignoring in this repo).

Then grab some FASTQ files and start testing! Run your tasks using miniwdl (short guide here).

Lastly, add some test coverage under the tests/ directory (it should be clear how to do that from the existing tests).

Once all the above is looking good, you can ping me and @adthrasher to review the PR.

It would also be great if you could answer this question for us:

> We also need to ensure there's no harm in read trimming FASTQs that have already been read trimmed.

For this just run the output through as input and check for differences. We hope there won't be any, but that needs to be investigated!
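One way to do that comparison, as a minimal sketch: after trimming once and then trimming the result again with the same settings, walk both FASTQ files record by record and report the first mismatch. The names `read_fastq` and `first_difference` are hypothetical, and the code assumes plain four-line FASTQ records (no wrapped sequence lines), optionally gzip-compressed.

```python
import gzip
from itertools import zip_longest
from typing import Iterator, Optional, Tuple

# One FASTQ record: header, sequence, separator ("+") line, qualities.
Record = Tuple[str, str, str, str]


def read_fastq(path: str) -> Iterator[Record]:
    """Yield four-line FASTQ records from a plain or gzipped file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header:
                return  # end of file
            if not header.strip():
                continue  # tolerate stray blank lines
            seq = handle.readline()
            plus = handle.readline()
            qual = handle.readline()
            if not qual:
                raise ValueError(f"truncated FASTQ record in {path}")
            yield (header.rstrip("\n"), seq.rstrip("\n"),
                   plus.rstrip("\n"), qual.rstrip("\n"))


def first_difference(
    path_a: str, path_b: str
) -> Optional[Tuple[int, Optional[Record], Optional[Record]]]:
    """Return (record index, record_a, record_b) at the first mismatch, or None.

    A record of None means that file ran out of reads first, i.e. the
    second pass dropped (or somehow gained) reads.
    """
    pairs = zip_longest(read_fastq(path_a), read_fastq(path_b))
    for index, (rec_a, rec_b) in enumerate(pairs):
        if rec_a != rec_b:
            return index, rec_a, rec_b
    return None
```

Calling `first_difference("once.fastq.gz", "twice.fastq.gz")` (placeholder file names) and getting `None` back would mean the second pass changed nothing at the record level. Comparing parsed records rather than raw bytes avoids false alarms from differing gzip compression, but it will still flag changes to header lines.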
