Create custom database #6

jacodela · 2018-06-20T07:09:38Z

I'm interested in mapping metagenome reads to genome bins that I've previously assembled and that are not available in public databases. The documentation on creating custom databases is limited to subsetting the provided RefSeq representative database, given a genome with a known accession number, but I can't seem to find how to create a truly custom database. Is this even possible?

palomo11 · 2019-04-18T18:18:44Z

I have the same question. @jacodela Did you figure out how to do it?

jacodela · 2019-04-19T09:04:16Z

Hi @palomo11, I never got an answer, nor did I figure out how to do it myself, so I used other tools. I would recommend taking a look at Bracken or (meta)Kallisto: they run quite fast and performed well in some tests I ran on synthetic communities. If you have NCBI taxIDs, go for Bracken; otherwise, check Kallisto.

jfy133 · 2020-12-17T12:46:35Z

@palomo11 @jacodela I know this is a very old thread to bring up, but given that the author doesn't seem to have replied, I'll leave this as a possible response:

I saw the following commands in the toy dataset example:

#!/bin/bash
echo ':::: Creating an empty database with a name "toyset"'
    sparse init --dbname toyset

echo ':::: Filling database "toyset" with 22 Salmonella complete genomes'
    sparse index --dbname toyset --seqlist Salmonella_toyset.txt

echo ':::: Building a mapping database named "Salmonella" in "toyset"'
    sparse query --dbname toyset --tag m==a | sparse mapDB --dbname toyset --mapDB Salmonella --seqlist stdin

The crucial thing, I think, is the --seqlist Salmonella_toyset.txt flag. This is simply the RefSeq TSV file you can download from the NCBI FTP; the example file is here: https://github.com/zheminzhou/SPARSE/blob/master/example/Salmonella_toyset.txt

Presumably SPARSE reads this file to find the location and file name of each genome. I'm guessing you could 'fake' the info for 'custom' genomes, as long as it follows the same column format as the RefSeq file.

Note that this is an assumption; I have not tried it myself.

EDIT: looking at the output, it does have NCBI taxonomy info (and SPARSE downloads the NCBI taxonomy dump); however, the clusters seem to be independent of this, so 'faking' the genomes might still work!
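
If anyone wants to try the 'faking' idea, a minimal sanity check could look like the sketch below. It only compares tab-separated column counts between the example file and a hand-made list; it does not know what SPARSE actually expects in each column. The function name same_columns and the file name my_bins.txt are hypothetical, and nothing here has been run against SPARSE itself.

```shell
#!/bin/bash
# Sketch only: verify that a hand-made seqlist has the same number of
# tab-separated columns as a reference seqlist (e.g. Salmonella_toyset.txt).
# This checks layout, not content; the meaning of each column is NOT checked.

# same_columns REFERENCE_TSV CUSTOM_TSV
# Returns 0 if every line of CUSTOM_TSV has as many tab-separated fields
# as the first line of REFERENCE_TSV, and 1 otherwise.
same_columns() {
    local reference="$1" custom="$2" n

    # Both files must exist and be non-empty.
    [ -s "$reference" ] && [ -s "$custom" ] || return 1

    # Field count of the first line of the reference file.
    n=$(head -n 1 "$reference" | awk -F '\t' '{ print NF }')
    [ -n "$n" ] || return 1

    # Fail as soon as any line of the custom file has a different count.
    awk -F '\t' -v n="$n" 'NF != n { exit 1 }' "$custom"
}

# Hypothetical usage (my_bins.txt and the "mybins" database name are
# placeholders; the sparse flags are the ones from the toy example):
#   same_columns Salmonella_toyset.txt my_bins.txt &&
#       sparse index --dbname mybins --seqlist my_bins.txt
```

Passing this check only means the file has the right shape; whether SPARSE accepts local paths or made-up accessions in those columns is still the untested assumption from above.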
