Read/write invariance breaks with integer tags in BAM/CRAM #314

Open
athos opened this issue Jul 5, 2024 · 0 comments

athos commented Jul 5, 2024

When alignments are read from a BAM or CRAM file and written unchanged to another BAM/CRAM file, the values of integer tags may change.

Repro

$ samtools view int_tag_overflow.bam
r1      4       *       0       0       *       *       0       0       ATGC    ####    XA:i:4294967295

(require '[cljam.io.sam :as sam])

(with-open [r (sam/reader "int_tag_overflow.bam")]
  (doall (sam/read-alignments r)))
;=>
({:qname "r1",
  :flag 4,
  :rname "*",
  ...
  :seq "ATGC",
  :qual "####",
  :options ({:XA {:type "i", :value 4294967295}})})

(with-open [r (sam/reader "int_tag_overflow.bam")
            w (sam/writer "int_tag_overflow.rewrite.bam")]
  (sam/write-header w (sam/read-header r))
  (sam/write-refs w (sam/read-refs r))
  (sam/write-alignments w (sam/read-alignments r) (sam/read-header r)))

(with-open [r (sam/reader "int_tag_overflow.rewrite.bam")]
  (doall (sam/read-alignments r)))
;=>
({:qname "r1",
  :flag 4,
  :rname "*",
  ...
  :seq "ATGC",
  :qual "####",
  :options ({:XA {:type "i", :value -1}})})  ;; <- this value has changed from the original one
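
The change from 4294967295 to -1 is what one would expect if the low 32 bits of the value were stored and then read back as a signed 32-bit integer. A minimal REPL sketch of that arithmetic, using only clojure.core (it does not call into cljam, and the var names are illustrative):

```clojure
;; 4294967295 is 2^32 - 1, i.e. all 32 bits set (0xFFFFFFFF).
(def original 4294967295)

;; Truncating to a signed 32-bit int reinterprets that bit pattern as -1.
(def as-int32 (unchecked-int original))

;; Masking the signed value back to 32 bits recovers the unsigned reading.
(def as-uint32 (bit-and (long as-int32) 0xFFFFFFFF))
```

Here `as-int32` evaluates to -1 and `as-uint32` to 4294967295, matching the two values seen in the repro.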

Cause

  • The SAM format defines only one integer tag type, i (a signed arbitrary-precision integer), while the BAM/CRAM formats give the i tag type different semantics (a signed 32-bit integer) and add other integer types (c/C/s/S/I)
  • cljam's BAM/CRAM reader interprets every integer tag value as the i tag type, regardless of the type stored in the file
  • cljam's BAM/CRAM writer doesn't check whether each integer tag value fits the specified tag type. It writes the value as the i tag type even when it can't be represented as a signed 32-bit integer.
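
One possible direction for the writer side is to pick the BAM integer type from the value itself rather than always emitting i. The sketch below is only an illustration of that idea: `bam-int-type` is a hypothetical helper, not an existing cljam function, and the ranges are those of the BAM integer types c/C/s/S/i/I (int8, uint8, int16, uint16, int32, uint32).

```clojure
(defn bam-int-type
  "Hypothetical helper (not part of cljam): returns the narrowest BAM
  integer type code that can hold n, or throws if none can."
  [n]
  (cond
    (<= -128 n 127)               \c   ; int8
    (<= 0 n 255)                  \C   ; uint8
    (<= -32768 n 32767)           \s   ; int16
    (<= 0 n 65535)                \S   ; uint16
    (<= -2147483648 n 2147483647) \i   ; int32
    (<= 0 n 4294967295)           \I   ; uint32
    :else (throw (ex-info "Integer tag value out of range for BAM"
                          {:value n}))))
```

With this helper, the value from the repro, 4294967295, maps to I rather than i, so it would be written as an unsigned 32-bit integer instead of wrapping. Values outside every range raise an error instead of being silently truncated.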

athos added the bug label Jul 5, 2024