---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# ptmismatch <img src='man/figures/logo.png' align="right" height="139" />

<!-- badges: start -->
<!-- badges: end -->

The goal of ptmismatch (primer-template mismatch) is to find and summarize primer binding sites within
a given set of sequences to estimate priming efficiency during a
polymerase chain reaction (PCR).

## Installation

You can install the development version of ptmismatch from [GitHub](https://github.com/) with:

``` r
# install.packages("devtools")
devtools::install_github("medvir/ptmismatch")
```

## Example

ptmismatch provides the function `summarize_matches()` to find and list all
primer-template matches.
Load the package and open the documentation as follows:

```{r example-help}
library(ptmismatch)

?summarize_matches()
```

A primer (or pattern) sequence and a template (or subject) sequence are required. The primer
sequence can be provided as a string. [IUPAC ambiguity codes](https://www.bioinformatics.org/sms/iupac.html)
are supported (e.g. "R" matches "A" and "G"). "I" is not supported; replace it with "N" instead.
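
To make the meaning of the ambiguity codes concrete, here is a minimal base R sketch of how a primer position can be checked against a template base. It is not how ptmismatch works internally: the lookup table `iupac` and the helper `primer_matches()` are hypothetical names introduced only for this illustration. The sketch assumes both sequences have the same length (no indels) and that the template contains only `A`, `C`, `G` and `T`.

``` r
# Illustrative lookup table of IUPAC nucleotide codes (not part of ptmismatch).
iupac <- c(
  A = "A", C = "C", G = "G", T = "T",
  R = "AG", Y = "CT", S = "CG", W = "AT", K = "GT", M = "AC",
  B = "CGT", D = "AGT", H = "ACT", V = "ACG", N = "ACGT"
)

# Hypothetical helper: for each position, is the template base allowed
# by the (possibly ambiguous) primer code at that position?
primer_matches <- function(primer, site) {
  p <- strsplit(primer, "")[[1]]
  s <- strsplit(site, "")[[1]]
  stopifnot(
    length(p) == length(s),            # ungapped comparison only
    all(p %in% names(iupac)),          # primer: valid IUPAC codes
    all(s %in% c("A", "C", "G", "T"))  # template: unambiguous bases only
  )
  mapply(function(code, base) grepl(base, iupac[[code]], fixed = TRUE),
         p, s, USE.NAMES = FALSE)
}

EV_fwd <- "GCTGCGYTGGCGGCC"

# "Y" at position 7 accepts "T" (and "C"), so every position matches.
all(primer_matches(EV_fwd, "GCTGCGTTGGCGGCC"))

# "A" is not covered by "Y", so position 7 is reported as a mismatch.
which(!primer_matches(EV_fwd, "GCTGCGATGGCGGCC"))
```

The helper returns one logical value per primer position, so `which(!...)` gives the mismatching positions and `sum(!...)` their count.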

The template sequence(s) can be provided in the form of a FASTA or FASTQ file.
Ns in the template count as mismatches, but all other ambiguity codes do not.

```{r example-pattern-subject}
EV_fwd <- "GCTGCGYTGGCGGCC"

EV_sequences_fasta <- system.file("extdata", "Enterovirus_12059.fasta",
                                  package = "ptmismatch")
```

Search for the pattern:

```{r example-summary, message=FALSE, warning=FALSE}
matches_summary <-
  summarize_matches(pattern = EV_fwd, subject_filepath = EV_sequences_fasta,
                    subject_format = "fasta", max.mismatch = 1, with.indels = TRUE)

matches_summary
```

The matches found can then be visualised, for example as a sequence logo plot
using the ggseqlogo package:

```{r example-summary-logoplot, warning=FALSE, fig.asp=0.3, out.width="80%"}
ggseqlogo::ggseqlogo(matches_summary$matched_aligned, method = "bits",
                     seq_type = "dna")
```

In this example, the plot shows that the `Y` at position 7 of the primer is
justified by the occurrence of both `T` and `C` at that position.

## Limitations

The columns `mismatches_index` and `mismatches_n` are only included if `with.indels = FALSE`.