diff --git a/CHANGELOG.md b/CHANGELOG.md
index 8c0baa8..5cf94e4 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,6 +1,6 @@
# Changelog
-### v0.9.3 - 2023-05-16
+### v0.9.3 - 2023-07-16
- `kmcp compute/split-genomes`:
- fix a bug in chunk computation when splitting circular genomes (`--circular`).
@@ -18,6 +18,10 @@
20:00:55.295 [INFO] 99.3084% (923820/930254) reads matched
20:00:55.295 [INFO] 100.0000% (923820/923820) matched reads belong to the 2 references in the profile
+- new tutorials:
+ - [Detecting specific pathogens](https://bioinf.shenwei.me/kmcp/tutorial/detecting-pathogens)
+ - [Detecting contaminated sequences](https://bioinf.shenwei.me/kmcp/tutorial/detecting-contaminated-seqs)
+
### v0.9.2 - 2023-05-16
- `kmcp profile/cos2simi/filter/index-info/merge-regions/query-fpr`:
diff --git a/README.md b/README.md
index 5d1eb76..0368529 100644
--- a/README.md
+++ b/README.md
@@ -35,6 +35,7 @@ https://bioinf.shenwei.me/kmcp
- Tutorials
- [Taxonomic profiling](https://bioinf.shenwei.me/kmcp/tutorial/profiling)
- [Detecting specific pathogens](https://bioinf.shenwei.me/kmcp/tutorial/detecting-pathogens)
+ - [Detecting contaminated sequences](https://bioinf.shenwei.me/kmcp/tutorial/detecting-contaminated-seqs)
- [Sequence and genome searching](https://bioinf.shenwei.me/kmcp/tutorial/searching)
- [Usage](https://bioinf.shenwei.me/kmcp/usage)
- [Benchmarks](https://bioinf.shenwei.me/kmcp/benchmark)
diff --git a/docs/database.md b/docs/database.md
index 79e75ef..fb41fcf 100644
--- a/docs/database.md
+++ b/docs/database.md
@@ -139,7 +139,7 @@ Mapping file:
no rank 31
isolate 26
-Masking prophage regions and removing plasmid sequences (optional):
+Masking prophage regions and removing plasmid sequences with [genomad](https://github.com/apcamargo/genomad) (optional):
conda activate genomad
diff --git a/docs/download.md b/docs/download.md
index 99069c9..0d77a89 100644
--- a/docs/download.md
+++ b/docs/download.md
@@ -17,26 +17,37 @@ ARM architecture is supported, but `kmcp search` would be slower.
## Current Version
-### [v0.9.2](https://github.com/shenwei356/kmcp/releases/tag/v0.9.2) - 2023-05-16 [![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/kmcp/v0.9.2/total.svg)](https://github.com/shenwei356/kmcp/releases/tag/v0.9.2)
+### [v0.9.3](https://github.com/shenwei356/kmcp/releases/tag/v0.9.3) - 2023-07-16 [![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/kmcp/v0.9.3/total.svg)](https://github.com/shenwei356/kmcp/releases/tag/v0.9.3)
+
+- `kmcp compute/split-genomes`:
+ - fix a bug in chunk computation when splitting circular genomes (`--circular`).
+- `kmcp search/merge`:
+ - append simple stats to the search result as comment lines, including the number of input and matched queries. e.g.,
+
+ # input queries: 930254
+ # matched queries: 923820
+ # matched percentage: 99.3084%
-- `kmcp profile/cos2simi/filter/index-info/merge-regions/query-fpr`:
- - **rename/unify the long flag `--out-prefix` to `--out-file`**.
- `kmcp profile`:
- - fix the number of reads belonging to references in the profile when no matches are found, which should be 0 instead of 1.
-- new command:
- - `kmcp utils index-density`: plotting the element density of bloom filters for an index file.
- An audience was concerned about it, but the results showed the elements (1s) are uniformly distributed in all BFs.
+ - fix the metaphlan output format. [#34](https://github.com/shenwei356/kmcp/issues/34)
+ - report the number of input and matched queries in the log, which helps indicate whether the reference genomes cover all microorganisms in the sample.
+
+ 20:00:55.295 [INFO] 99.3084% (923820/930254) reads matched
+ 20:00:55.295 [INFO] 100.0000% (923820/923820) matched reads belong to the 2 references in the profile
+- new tutorials:
+ - [Detecting specific pathogens](https://bioinf.shenwei.me/kmcp/tutorial/detecting-pathogens)
+ - [Detecting contaminated sequences](https://bioinf.shenwei.me/kmcp/tutorial/detecting-contaminated-seqs)
### Links
OS |Arch |File, 中国镜像 |Download Count
:------|:---------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-Linux |**64-bit**|[**kmcp_linux_amd64.tar.gz**](https://github.com/shenwei356/kmcp/releases/download/v0.9.2/kmcp_linux_amd64.tar.gz), [中国镜像](http://app.shenwei.me/data/kmcp/kmcp_linux_amd64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/kmcp/latest/kmcp_linux_amd64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/kmcp/releases/download/v0.9.2/kmcp_linux_amd64.tar.gz)
-Linux |arm64 |[**kmcp_linux_arm64.tar.gz**](https://github.com/shenwei356/kmcp/releases/download/v0.9.2/kmcp_linux_arm64.tar.gz), [中国镜像](http://app.shenwei.me/data/kmcp/kmcp_linux_arm64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/kmcp/latest/kmcp_linux_arm64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/kmcp/releases/download/v0.9.2/kmcp_linux_arm64.tar.gz)
-macOS |**64-bit**|[**kmcp_darwin_amd64.tar.gz**](https://github.com/shenwei356/kmcp/releases/download/v0.9.2/kmcp_darwin_amd64.tar.gz), [中国镜像](http://app.shenwei.me/data/kmcp/kmcp_darwin_amd64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/kmcp/latest/kmcp_darwin_amd64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/kmcp/releases/download/v0.9.2/kmcp_darwin_amd64.tar.gz)
-macOS |arm64 |[**kmcp_darwin_arm64.tar.gz**](https://github.com/shenwei356/kmcp/releases/download/v0.9.2/kmcp_darwin_arm64.tar.gz), [中国镜像](http://app.shenwei.me/data/kmcp/kmcp_darwin_arm64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/kmcp/latest/kmcp_darwin_arm64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/kmcp/releases/download/v0.9.2/kmcp_darwin_arm64.tar.gz)
-Windows|**64-bit**|[**kmcp_windows_amd64.exe.tar.gz**](https://github.com/shenwei356/kmcp/releases/download/v0.9.2/kmcp_windows_amd64.exe.tar.gz), [中国镜像](http://app.shenwei.me/data/kmcp/kmcp_windows_amd64.exe.tar.gz)|[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/kmcp/latest/kmcp_windows_amd64.exe.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/kmcp/releases/download/v0.9.2/kmcp_windows_amd64.exe.tar.gz)
+Linux |**64-bit**|[**kmcp_linux_amd64.tar.gz**](https://github.com/shenwei356/kmcp/releases/download/v0.9.3/kmcp_linux_amd64.tar.gz), [中国镜像](http://app.shenwei.me/data/kmcp/kmcp_linux_amd64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/kmcp/latest/kmcp_linux_amd64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/kmcp/releases/download/v0.9.3/kmcp_linux_amd64.tar.gz)
+Linux |arm64 |[**kmcp_linux_arm64.tar.gz**](https://github.com/shenwei356/kmcp/releases/download/v0.9.3/kmcp_linux_arm64.tar.gz), [中国镜像](http://app.shenwei.me/data/kmcp/kmcp_linux_arm64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/kmcp/latest/kmcp_linux_arm64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/kmcp/releases/download/v0.9.3/kmcp_linux_arm64.tar.gz)
+macOS |**64-bit**|[**kmcp_darwin_amd64.tar.gz**](https://github.com/shenwei356/kmcp/releases/download/v0.9.3/kmcp_darwin_amd64.tar.gz), [中国镜像](http://app.shenwei.me/data/kmcp/kmcp_darwin_amd64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/kmcp/latest/kmcp_darwin_amd64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/kmcp/releases/download/v0.9.3/kmcp_darwin_amd64.tar.gz)
+macOS |arm64 |[**kmcp_darwin_arm64.tar.gz**](https://github.com/shenwei356/kmcp/releases/download/v0.9.3/kmcp_darwin_arm64.tar.gz), [中国镜像](http://app.shenwei.me/data/kmcp/kmcp_darwin_arm64.tar.gz) |[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/kmcp/latest/kmcp_darwin_arm64.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/kmcp/releases/download/v0.9.3/kmcp_darwin_arm64.tar.gz)
+Windows|**64-bit**|[**kmcp_windows_amd64.exe.tar.gz**](https://github.com/shenwei356/kmcp/releases/download/v0.9.3/kmcp_windows_amd64.exe.tar.gz), [中国镜像](http://app.shenwei.me/data/kmcp/kmcp_windows_amd64.exe.tar.gz)|[![Github Releases (by Asset)](https://img.shields.io/github/downloads/shenwei356/kmcp/latest/kmcp_windows_amd64.exe.tar.gz.svg?maxAge=3600)](https://github.com/shenwei356/kmcp/releases/download/v0.9.3/kmcp_windows_amd64.exe.tar.gz)
*Notes:*
@@ -137,6 +148,16 @@ fish:
## Release History
+### [v0.9.2](https://github.com/shenwei356/kmcp/releases/tag/v0.9.2) - 2023-05-16 [![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/kmcp/v0.9.2/total.svg)](https://github.com/shenwei356/kmcp/releases/tag/v0.9.2)
+
+- `kmcp profile/cos2simi/filter/index-info/merge-regions/query-fpr`:
+ - **rename/unify the long flag `--out-prefix` to `--out-file`**.
+- `kmcp profile`:
+ - fix the number of reads belonging to references in the profile when no matches are found, which should be 0 instead of 1.
+- new command:
+ - `kmcp utils index-density`: plotting the element density of bloom filters for an index file.
+ An audience was concerned about it, but the results showed the elements (1s) are uniformly distributed in all BFs.
+
### [v0.9.1](https://github.com/shenwei356/kmcp/releases/tag/v0.9.1) - 2022-12-26 [![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/kmcp/v0.9.1/total.svg)](https://github.com/shenwei356/kmcp/releases/tag/v0.9.1)
- `kmcp search`
diff --git a/docs/tutorial/detect-contaminated-seqs/index.md b/docs/tutorial/detecting-contaminated-seqs/index.md
similarity index 87%
rename from docs/tutorial/detect-contaminated-seqs/index.md
rename to docs/tutorial/detecting-contaminated-seqs/index.md
index 96cce40..549c9f1 100644
--- a/docs/tutorial/detect-contaminated-seqs/index.md
+++ b/docs/tutorial/detecting-contaminated-seqs/index.md
@@ -3,7 +3,7 @@
## tools
- kmcp: https://github.com/shenwei356/kmcp
-- seqkit: >= v2.5.0, https://github.com/shenwei356/seqkit/issues/390#issuecomment-1633495130
+- seqkit: >= v2.5.0, which provides the new command `seqkit merge-slides`.
- taxonkit: https://github.com/shenwei356/taxonkit
- csvtk: https://github.com/shenwei356/csvtk
@@ -18,7 +18,7 @@
## hardware
- RAM >= 64GB
-- #CPUs >= 32 preferred.
+- CPUs >= 32 preferred.
## steps
@@ -32,12 +32,12 @@ and performing metagenomoic profiling with them.
# search against GTDB databases
# !!! if the KMCP databases are in a network-attached storage disk (NAS),
- # !!! please add the flag "-w" to kmcp
+ # !!! please add the flag "-w" to "kmcp search"
seqkit sliding -g -s 50 -W 200 $input \
- | kmcp search -d ~/ws/data/kmcp2023/gtdb.part_1.kmcp/ -o $input.kmcp@gtdb.part_1.tsv.gz
+ | kmcp search -w -d ~/ws/data/kmcp2023/gtdb.part_1.kmcp/ -o $input.kmcp@gtdb.part_1.tsv.gz
seqkit sliding -g -s 50 -W 200 $input \
- | kmcp search -d ~/ws/data/kmcp2023/gtdb.part_2.kmcp/ -o $input.kmcp@gtdb.part_2.tsv.gz
+ | kmcp search -w -d ~/ws/data/kmcp2023/gtdb.part_2.kmcp/ -o $input.kmcp@gtdb.part_2.tsv.gz
# merge seach results
kmcp merge -o $input.kmcp.tsv.gz $input.kmcp@gtdb.part_*.tsv.gz
@@ -74,7 +74,7 @@ Checking contaminated regions
| sed 1d | head -n 1 | sed "s/;/\n/g") \
-o $input.kmcp.tsv.gz.binning.filtered.tsv
- # merge regions
+ # merge regions (requires seqkit >= v2.5.0)
seqkit merge-slides $input.kmcp.tsv.gz.binning.filtered.tsv --quiet \
-o $input.kmcp.tsv.gz.cont.tsv
@@ -90,14 +90,16 @@ Checking contaminated regions
csvtk join -Ht $input.kmcp.tsv.gz.cont.tsv <(seqkit fx2tab -ni -l $input) \
| awk '{print $0"\t"($3-$2)"\t"($3-$2)/$4}' \
| csvtk join -Ht - $input.kmcp.tsv.gz.binning.filtered.tsv.taxa \
- | csvtk add-header -Ht -n chr,begin,end,contig_len,len,frac,taxa \
- | csvtk sort -t -k frac:nr \
+ | csvtk add-header -Ht -n chr,begin,end,contig_len,len,proportion,taxa \
+ | csvtk sort -t -k proportion:nr \
| tee $input.kmcp.tsv.gz.cont.details.tsv \
| csvtk pretty -t
- chr begin end contig_len len frac taxa
+ chr begin end contig_len len proportion taxa
------------------------ ------ ------ ---------- ---- ----------- --------------------------------------------------------
SAMN02360712.contig00044 0 1151 1151 1151 1 177416(Francisella tularensis subsp. tularensis SCHU S4)
SAMN02360712.contig00012 163600 163900 164357 300 0.00182529 1028746(Christiangramia aestuarii)
SAMN02360712.contig00008 64850 65150 279605 300 0.00107294 1028746(Christiangramia aestuarii)
SAMN02360712.contig00002 362200 362500 622965 300 0.000481568 1028746(Christiangramia aestuarii)
+
+The whole contig `SAMN02360712.contig00044` (proportion: 1) is assigned to a different species than the other regions, so it is most likely a contaminated sequence.
diff --git a/docs/tutorial/index.md b/docs/tutorial/index.md
index ac8c370..6b51fc8 100644
--- a/docs/tutorial/index.md
+++ b/docs/tutorial/index.md
@@ -2,5 +2,5 @@
- [Taxonomic profiling](profiling)
- [Detecting specific pathogens](detecting-pathogens)
-- [Detecting contaminated sequences](detect-contaminated-seqs)
+- [Detecting contaminated sequences](detecting-contaminated-seqs)
- [Sequence and genome searching](searching)