Commit

v0.9.3
shenwei356 committed Jul 16, 2023
1 parent e8ad50e commit 06376bf
Showing 6 changed files with 52 additions and 24 deletions.
6 changes: 5 additions & 1 deletion CHANGELOG.md
@@ -1,6 +1,6 @@
# Changelog

### v0.9.3 - 2023-05-16
### v0.9.3 - 2023-07-16

- `kmcp compute/split-genomes`:
- fix a bug in chunk computation when splitting circular genomes (`--circular`).
@@ -18,6 +18,10 @@
20:00:55.295 [INFO] 99.3084% (923820/930254) reads matched
20:00:55.295 [INFO] 100.0000% (923820/923820) matched reads belong to the 2 references in the profile

- new tutorials:
- [Detecting specific pathogens](https://bioinf.shenwei.me/kmcp/tutorial/detecting-pathogens)
- [Detecting contaminated sequences](https://bioinf.shenwei.me/kmcp/tutorial/detecting-contaminated-seqs)

### v0.9.2 - 2023-05-16

- `kmcp profile/cos2simi/filter/index-info/merge-regions/query-fpr`:
1 change: 1 addition & 0 deletions README.md
@@ -35,6 +35,7 @@ https://bioinf.shenwei.me/kmcp
- Tutorials
- [Taxonomic profiling](https://bioinf.shenwei.me/kmcp/tutorial/profiling)
- [Detecting specific pathogens](https://bioinf.shenwei.me/kmcp/tutorial/detecting-pathogens)
- [Detecting contaminated sequences](https://bioinf.shenwei.me/kmcp/tutorial/detecting-contaminated-seqs)
- [Sequence and genome searching](https://bioinf.shenwei.me/kmcp/tutorial/searching)
- [Usage](https://bioinf.shenwei.me/kmcp/usage)
- [Benchmarks](https://bioinf.shenwei.me/kmcp/benchmark)
2 changes: 1 addition & 1 deletion docs/database.md
@@ -139,7 +139,7 @@ Mapping file:
no rank 31
isolate 26

Masking prophage regions and removing plasmid sequences (optional):
Masking prophage regions and removing plasmid sequences with [genomad](https://github.com/apcamargo/genomad) (optional):

conda activate genomad

45 changes: 33 additions & 12 deletions docs/download.md
@@ -17,26 +17,37 @@ ARM architecture is supported, but `kmcp search` would be slower.

## Current Version

### [v0.9.2](https://github.com/shenwei356/kmcp/releases/tag/v0.9.2) - 2023-05-16 [![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/kmcp/v0.9.2/total.svg)](https://github.com/shenwei356/kmcp/releases/tag/v0.9.2)
### [v0.9.3](https://github.com/shenwei356/kmcp/releases/tag/v0.9.3) - 2023-07-16 [![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/kmcp/v0.9.3/total.svg)](https://github.com/shenwei356/kmcp/releases/tag/v0.9.3)

- `kmcp compute/split-genomes`:
- fix a bug in chunk computation when splitting circular genomes (`--circular`).
- `kmcp search/merge`:
- append simple stats to the search result as comment lines, including the number of input and matched queries. e.g.,

# input queries: 930254
# matched queries: 923820
# matched percentage: 99.3084%

- `kmcp profile/cos2simi/filter/index-info/merge-regions/query-fpr`:
- **rename/unify the long flag `--out-prefix` to `--out-file`**.
- `kmcp profile`:
- fix the number of reads belonging to references in the profile when no matches are found, which should be 0 instead of 1.
- new command:
- `kmcp utils index-density`: plot the element density of bloom filters for an index file.
An audience member raised concerns about the density, but the results showed that the elements (1s) are uniformly distributed across all BFs.
- fix the MetaPhlAn output format. [#34](https://github.com/shenwei356/kmcp/issues/34)
- show the number of input and matched queries in the log, which hints at whether the reference genomes cover all microorganisms in the sample.

20:00:55.295 [INFO] 99.3084% (923820/930254) reads matched
20:00:55.295 [INFO] 100.0000% (923820/923820) matched reads belong to the 2 references in the profile

- new tutorials:
- [Detecting specific pathogens](https://bioinf.shenwei.me/kmcp/tutorial/detecting-pathogens)
- [Detecting contaminated sequences](https://bioinf.shenwei.me/kmcp/tutorial/detecting-contaminated-seqs)
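
As a rough illustration of the stats appended by `kmcp search/merge`, the matched percentage can be recomputed from the two count lines. This is a minimal sketch, not part of kmcp: the function name `match_rate` is hypothetical, and it assumes the comment lines keep exactly the `# key: value` layout shown in the release note above.

```shell
#!/usr/bin/env bash
# Recompute the matched percentage from the stats comment lines.
# match_rate is a hypothetical helper name; the input lines below are
# copied from the release note, not produced by running kmcp here.
match_rate() {
    awk -F': ' '
        /^# input queries:/   { input = $2 }
        /^# matched queries:/ { matched = $2 }
        END {
            # Print nothing when the input count is missing or zero.
            if (input > 0) printf "%.4f%%\n", 100 * matched / input
        }'
}

printf '%s\n' \
    '# input queries: 930254' \
    '# matched queries: 923820' \
    '# matched percentage: 99.3084%' \
  | match_rate
```

On a real search result, the comment lines would first have to be pulled out of the gzipped file (for example with `zcat` and `grep '^#'`); that extraction step is an assumption and is not described in the release note.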

### Links

OS |Arch |File, China mirror |Download Count
:------|:---------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Linux |**64-bit**|[**kmcp_linux_amd64.tar.gz**](https://github.com/shenwei356/kmcp/releases/download/v0.9.2/kmcp_linux_amd64.tar.gz), <br/> [China mirror](http://app.shenwei.me/data/kmcp/kmcp_linux_amd64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/kmcp/latest/kmcp_linux_amd64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/kmcp/releases/download/v0.9.2/kmcp_linux_amd64.tar.gz)
Linux |arm64 |[**kmcp_linux_arm64.tar.gz**](https://github.com/shenwei356/kmcp/releases/download/v0.9.2/kmcp_linux_arm64.tar.gz), <br/> [China mirror](http://app.shenwei.me/data/kmcp/kmcp_linux_arm64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/kmcp/latest/kmcp_linux_arm64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/kmcp/releases/download/v0.9.2/kmcp_linux_arm64.tar.gz)
macOS |**64-bit**|[**kmcp_darwin_amd64.tar.gz**](https://github.com/shenwei356/kmcp/releases/download/v0.9.2/kmcp_darwin_amd64.tar.gz), <br/> [China mirror](http://app.shenwei.me/data/kmcp/kmcp_darwin_amd64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/kmcp/latest/kmcp_darwin_amd64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/kmcp/releases/download/v0.9.2/kmcp_darwin_amd64.tar.gz)
macOS |arm64 |[**kmcp_darwin_arm64.tar.gz**](https://github.com/shenwei356/kmcp/releases/download/v0.9.2/kmcp_darwin_arm64.tar.gz), <br/> [China mirror](http://app.shenwei.me/data/kmcp/kmcp_darwin_arm64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/kmcp/latest/kmcp_darwin_arm64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/kmcp/releases/download/v0.9.2/kmcp_darwin_arm64.tar.gz)
Windows|**64-bit**|[**kmcp_windows_amd64.exe.tar.gz**](https://github.com/shenwei356/kmcp/releases/download/v0.9.2/kmcp_windows_amd64.exe.tar.gz), <br/> [China mirror](http://app.shenwei.me/data/kmcp/kmcp_windows_amd64.exe.tar.gz)|[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/kmcp/latest/kmcp_windows_amd64.exe.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/kmcp/releases/download/v0.9.2/kmcp_windows_amd64.exe.tar.gz)
Linux |**64-bit**|[**kmcp_linux_amd64.tar.gz**](https://github.com/shenwei356/kmcp/releases/download/v0.9.3/kmcp_linux_amd64.tar.gz), <br/> [China mirror](http://app.shenwei.me/data/kmcp/kmcp_linux_amd64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/kmcp/latest/kmcp_linux_amd64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/kmcp/releases/download/v0.9.3/kmcp_linux_amd64.tar.gz)
Linux |arm64 |[**kmcp_linux_arm64.tar.gz**](https://github.com/shenwei356/kmcp/releases/download/v0.9.3/kmcp_linux_arm64.tar.gz), <br/> [China mirror](http://app.shenwei.me/data/kmcp/kmcp_linux_arm64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/kmcp/latest/kmcp_linux_arm64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/kmcp/releases/download/v0.9.3/kmcp_linux_arm64.tar.gz)
macOS |**64-bit**|[**kmcp_darwin_amd64.tar.gz**](https://github.com/shenwei356/kmcp/releases/download/v0.9.3/kmcp_darwin_amd64.tar.gz), <br/> [China mirror](http://app.shenwei.me/data/kmcp/kmcp_darwin_amd64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/kmcp/latest/kmcp_darwin_amd64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/kmcp/releases/download/v0.9.3/kmcp_darwin_amd64.tar.gz)
macOS |arm64 |[**kmcp_darwin_arm64.tar.gz**](https://github.com/shenwei356/kmcp/releases/download/v0.9.3/kmcp_darwin_arm64.tar.gz), <br/> [China mirror](http://app.shenwei.me/data/kmcp/kmcp_darwin_arm64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/kmcp/latest/kmcp_darwin_arm64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/kmcp/releases/download/v0.9.3/kmcp_darwin_arm64.tar.gz)
Windows|**64-bit**|[**kmcp_windows_amd64.exe.tar.gz**](https://github.com/shenwei356/kmcp/releases/download/v0.9.3/kmcp_windows_amd64.exe.tar.gz), <br/> [China mirror](http://app.shenwei.me/data/kmcp/kmcp_windows_amd64.exe.tar.gz)|[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/kmcp/latest/kmcp_windows_amd64.exe.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/kmcp/releases/download/v0.9.3/kmcp_windows_amd64.exe.tar.gz)

*Notes:*

@@ -137,6 +148,16 @@ fish:

## Release History

### [v0.9.2](https://github.com/shenwei356/kmcp/releases/tag/v0.9.2) - 2023-05-16 [![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/kmcp/v0.9.2/total.svg)](https://github.com/shenwei356/kmcp/releases/tag/v0.9.2)

- `kmcp profile/cos2simi/filter/index-info/merge-regions/query-fpr`:
- **rename/unify the long flag `--out-prefix` to `--out-file`**.
- `kmcp profile`:
- fix the number of reads belonging to references in the profile when no matches are found, which should be 0 instead of 1.
- new command:
- `kmcp utils index-density`: plot the element density of bloom filters for an index file.
An audience member raised concerns about the density, but the results showed that the elements (1s) are uniformly distributed across all BFs.

### [v0.9.1](https://github.com/shenwei356/kmcp/releases/tag/v0.9.1) - 2022-12-26 [![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/kmcp/v0.9.1/total.svg)](https://github.com/shenwei356/kmcp/releases/tag/v0.9.1)

- `kmcp search`
@@ -3,7 +3,7 @@
## tools

- kmcp: https://github.com/shenwei356/kmcp
- seqkit: >= v2.5.0, https://github.com/shenwei356/seqkit/issues/390#issuecomment-1633495130
- seqkit: >= v2.5.0 which has the new command `seqkit merge-slides`.
- taxonkit: https://github.com/shenwei356/taxonkit
- csvtk: https://github.com/shenwei356/csvtk

@@ -18,7 +18,7 @@
## hardware

- RAM >= 64GB
- #CPUs >= 32 preferred.
- CPUs >= 32 preferred.

## steps

@@ -32,12 +32,12 @@ and performing metagenomic profiling with them.

# search against GTDB databases
# !!! if the KMCP databases are in a network-attached storage disk (NAS),
# !!! please add the flag "-w" to kmcp
# !!! please add the flag "-w" to "kmcp search"
seqkit sliding -g -s 50 -W 200 $input \
| kmcp search -d ~/ws/data/kmcp2023/gtdb.part_1.kmcp/ -o $input.kmcp@gtdb.part_1.tsv.gz
| kmcp search -w -d ~/ws/data/kmcp2023/gtdb.part_1.kmcp/ -o $input.kmcp@gtdb.part_1.tsv.gz

seqkit sliding -g -s 50 -W 200 $input \
| kmcp search -d ~/ws/data/kmcp2023/gtdb.part_2.kmcp/ -o $input.kmcp@gtdb.part_2.tsv.gz
| kmcp search -w -d ~/ws/data/kmcp2023/gtdb.part_2.kmcp/ -o $input.kmcp@gtdb.part_2.tsv.gz

# merge search results
kmcp merge -o $input.kmcp.tsv.gz $input.kmcp@gtdb.part_*.tsv.gz
@@ -74,7 +74,7 @@ Checking contaminated regions
| sed 1d | head -n 1 | sed "s/;/\n/g") \
-o $input.kmcp.tsv.gz.binning.filtered.tsv

# merge regions
# merge regions. seqkit v2.5.0 is needed.
seqkit merge-slides $input.kmcp.tsv.gz.binning.filtered.tsv --quiet \
-o $input.kmcp.tsv.gz.cont.tsv

@@ -90,14 +90,16 @@ Checking contaminated regions
csvtk join -Ht $input.kmcp.tsv.gz.cont.tsv <(seqkit fx2tab -ni -l $input) \
| awk '{print $0"\t"($3-$2)"\t"($3-$2)/$4}' \
| csvtk join -Ht - $input.kmcp.tsv.gz.binning.filtered.tsv.taxa \
| csvtk add-header -Ht -n chr,begin,end,contig_len,len,frac,taxa \
| csvtk sort -t -k frac:nr \
| csvtk add-header -Ht -n chr,begin,end,contig_len,len,proportion,taxa \
| csvtk sort -t -k proportion:nr \
| tee $input.kmcp.tsv.gz.cont.details.tsv \
| csvtk pretty -t

chr begin end contig_len len frac taxa
chr begin end contig_len len proportion taxa
------------------------ ------ ------ ---------- ---- ----------- --------------------------------------------------------
SAMN02360712.contig00044 0 1151 1151 1151 1 177416(Francisella tularensis subsp. tularensis SCHU S4)
SAMN02360712.contig00012 163600 163900 164357 300 0.00182529 1028746(Christiangramia aestuarii)
SAMN02360712.contig00008 64850 65150 279605 300 0.00107294 1028746(Christiangramia aestuarii)
SAMN02360712.contig00002 362200 362500 622965 300 0.000481568 1028746(Christiangramia aestuarii)

The whole contig `SAMN02360712.contig00044` (proportion: 1) comes from a completely different species, so it is most likely a contaminated sequence.
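
The `len` and `proportion` columns in the table above follow from the `awk` step in the pipeline: `len = end - begin` and `proportion = len / contig_len`. The sketch below repeats that arithmetic on two rows copied from the table. The function name `flag_regions`, the `high`/`low` labels, and the default cutoff of 0.5 are assumptions for illustration only and are not part of the tutorial.

```shell
#!/usr/bin/env bash
# Input rows: chr <TAB> begin <TAB> end <TAB> contig_len
# Output rows: chr <TAB> len <TAB> proportion <TAB> label
# flag_regions and its cutoff (default 0.5) are hypothetical.
flag_regions() {
    awk -F'\t' -v OFS='\t' -v cutoff="${1:-0.5}" '{
        len  = $3 - $2      # region length
        prop = len / $4     # share of the contig covered by the region
        print $1, len, prop, (prop >= cutoff ? "high" : "low")
    }'
}

# Two rows taken from the table above.
printf 'SAMN02360712.contig00044\t0\t1151\t1151\n' | flag_regions
printf 'SAMN02360712.contig00012\t163600\t163900\t164357\n' | flag_regions
```

A region that covers most of its contig is a stronger hint of contamination than a short window inside a long contig, which is why the sketch labels rows by proportion instead of by absolute length.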
2 changes: 1 addition & 1 deletion docs/tutorial/index.md
@@ -2,5 +2,5 @@

- [Taxonomic profiling](profiling)
- [Detecting specific pathogens](detecting-pathogens)
- [Detecting contaminated sequences](detect-contaminated-seqs)
- [Detecting contaminated sequences](detecting-contaminated-seqs)
- [Sequence and genome searching](searching)
