Major bugfixes:
- Fix a major bug in sort that previously broke the sorting order. This bug was introduced in recent versions of pairtools #230
- Fix a major bug in dedup that caused pair duplication and broken sorting order in non-Cython backends
New features:
- stats: calculate the distance of P(s) divergence between pairs of different directionalities #222
- dedup: allow column names in all backends, and allow sorting by arbitrary columns #162
New behavior and default settings:
- dedup: turn mark-dups on by default #211
- parse: change the default --walks-policy to 5unique
- parse: pair types are now always in upper case. Previously, letters in pair types were converted to lowercase if the corresponding side contained chimeric alignments.
Minor bugfixes:
- dedup: allow inputs with quotes #194
- dedup: allow empty input pairs file #201
- stats: minor bugfixes #200
Documentation:
- a new notebook with the statistics of distances between PCR duplicates #233
- clean up phase walkthrough #218
- a new chapter on building workflows with pairtools #219 #226 #231
- a major cleanup
Code updates:
- add pairsio.py to read (and, in the future, write) .pairs files #195
- minor refactoring of parse #223
New Contributors:
- @hkariti made their first contribution in #194
- pairtools dedup: update default chunksize to 10,000 to prevent memory overflow on datasets with high duplication rate
- pairtools select: regex update (string substitutions failed when the column name was a substring of another)
- Warnings capture in dedup: pairs lines are always split after rstrip newline
- Important fixes of splitting schema
- Dedup comment removed (failed when the read qualities contained "#")
- Remove dbist build out of wheel
- pairtools scaling: fixed an issue with scaling maximum range value #150 (comment)
- Fixed issue with pysam dependencies on pip and conda
- pytest test engine instead of nose
- Small fixes in the docs and scaling
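
To illustrate the pairtools select regex fix listed above: plain string substitution of column names goes wrong when one name is a substring of another. The sketch below is not the pairtools implementation. The function names, the `COLS[...]` replacement form, and the column names are hypothetical and only show the failure mode and a word-boundary fix.

```python
import re

def naive_substitute(condition, columns):
    """Replace column names with indexed lookups using plain str.replace."""
    for i, name in enumerate(columns):
        condition = condition.replace(name, f"COLS[{i}]")
    return condition

def safe_substitute(condition, columns):
    """Replace only whole-word occurrences of each column name."""
    for i, name in enumerate(columns):
        # \b anchors the match so "mapq" does not match inside "mapq1"
        pattern = r"\b" + re.escape(name) + r"\b"
        condition = re.sub(pattern, f"COLS[{i}]", condition)
    return condition

# Hypothetical column names where one is a substring of the other
columns = ["mapq", "mapq1"]
condition = "mapq1 >= 30"

print(naive_substitute(condition, columns))  # COLS[0]1 >= 30  (corrupted)
print(safe_substitute(condition, columns))   # COLS[1] >= 30
```

Escaping each name with `re.escape` keeps the sketch safe for column names that contain regex metacharacters.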
This is a major release of pairtools, the first since April 2019!
- sphinx docs update with incorporated walkthroughs
- parse2 module with CLI for parsing complex walks
- scaling and header modules with CLI
pairtools dedup
- finalize detection of optical duplicates #106 and #59, also related to #54
- chunked dedup by @Phlya
- improvement of dedup to include reporting of the parent readID by @Phlya and @agalitsyna
pairtools stats/scaling
- split dedup stats and regular stats
- output chromosome size to the stats output #83
- pairtools stats: YAML output? #111 and #79
- pairtools scaling tool which takes into account chromosome sizes: #81, #56?
pairtools parse
- parse complex walks engine and tools: #109
- stdin and stdout reporting defaults: #48
- flipping issue: #91
pairtools phase
- make phase work with both pip and github versions of bwa: #114
pairtools restrict
- Handle empty pairs with "!" chromosomes: #76
- Problem with restriction sites header/first rfrag: #73
- Suggestions by @golobor: #16
pairtools merge
Headers maintenance
- allow adding a header to a headerless file #119, or broader addition of the header module, draft: #121
Code maintenance
- transfer pairlib into sandbox of pairtools lib
- separate cli and lib
- Remove OrderedDict: #113
- Clean up deprecation warnings, e.g. #71
- Fix input errors without explanations, e.g. #61
Docs improvements
- pairtools walkthrough
- phasing walkthrough
- parse docs update
Tests proposals
Enhancements
- add summaries: #105
- support of bwa mem2, which is 2-3 times faster than usual bwa mem: #118
- I/O single utility instead of repetitive code in each module
- sample: a new tool to select a random subset of pairs
- parse: add --readid-transform to edit readID
- parse: add experimental --walks-policy all (note: it will be moved to a separate tool in future!)
- all tools: use bgzip if pbgzip not available
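
The pbgzip-to-bgzip fallback listed above can be sketched as follows. This is a hedged illustration rather than the pairtools code: `pick_compressor` is a hypothetical helper, and the lookup function is injectable so the logic can be tried without either program installed.

```python
import shutil

def pick_compressor(preferred=("pbgzip", "bgzip"), which=shutil.which):
    """Return the first program in `preferred` found on PATH, or None."""
    for prog in preferred:
        if which(prog) is not None:
            return prog
    return None

# Simulate a machine where only bgzip is installed (hypothetical path).
fake_path = {"bgzip": "/usr/bin/bgzip"}
print(pick_compressor(which=fake_path.get))  # bgzip
```

Calling `pick_compressor()` with no arguments checks the real PATH through `shutil.which`.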
Internal changes:
- parse: move most code to a separate _parse module
- _headerops: add extract_chromosomes(header)
- all tools: drop py3.5 support
- switch from travis CI to github actions
- parse: tag pairs with missing FASTQ/SAM on one side as corrupt, pair type "XX"
- sort: enable lz4c compression of sorted chunks by default
- automatically convert mapq1 and mapq2 to int in select
- add the flip tool
- Bugfix: include _dedup.pyx in the Python package
- First release.
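
The automatic conversion of mapq1 and mapq2 to int in select, noted above, matters because fields read from a text file are strings, and string comparison is lexicographic. A minimal sketch, assuming a hypothetical `cast_fields` helper and a dict-based record rather than the internal representation pairtools uses:

```python
INT_COLUMNS = ("mapq1", "mapq2")  # columns to cast; an assumption for this sketch

def cast_fields(record, int_columns=INT_COLUMNS):
    """Return a copy of `record` with the listed columns converted to int."""
    out = dict(record)
    for name in int_columns:
        if name in out:
            out[name] = int(out[name])
    return out

# Hypothetical record as it would look after splitting a text line
raw = {"chrom1": "chr1", "pos1": "1000", "mapq1": "9", "mapq2": "60"}
rec = cast_fields(raw)

# As strings, "9" >= "30" is True (lexicographic); as ints, 9 >= 30 is False.
print(raw["mapq1"] >= "30", rec["mapq1"] >= 30)  # True False
```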