README.Rmd

---
output: github_document
---

<!-- README.md is generated from README.rmd. Please edit that file -->

```{r, echo=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/"
)
library(plt)
```

# plt

Persistence landscapes are a vectorization of persistence data/diagrams that have useful statistical properties including linearity and an inner product.[^pl]
This is an R package interface to a C++ library to efficienctly compute and calculate with persistence landscapes.[^plt]

[^pl]: Bubenik P (2015) "Statistical Topological Data Analysis using Persistence Landscapes". _Journal of Machine Learning Research_ **16**(3):77--102. <https://jmlr.csail.mit.edu/papers/v16/bubenik15a.html>
[^plt]: Bubenik P & Dłotko P (2017) "A persistence landscapes toolbox for topological statistics". _Journal of Symbolic Computation_ **78**(1):91--114. <https://www.sciencedirect.com/science/article/pii/S0747717116300104>

## Installation

Until the package is on CRAN, use **pak** to install the package from the GitHub repository as follows:

```{r, eval=FALSE}
install.packages("pak")
pak::pkg_install("corybrunson/plt")
```

Alternatively---and especially if you want to contribute---you can clone or download the code repository and, from within the directory, install the package from source:

```{r, eval=FALSE}
devtools::install()
```

You should now be able to load the package normally from an R session:

```{r, eval=FALSE}
library(plt)
```

## Quickstart Guide

The **plt** package supports various operations involving persistence landscapes:

* Compute persistence landscapes from persistence data
* Plot persistence landscapes
* Perform Hilbert space operations (scaling, addition, inner product) plus some additional queries and transformations (extremal values, absolute value, integration) on persistence landscapes
* Conduct hypothesis tests on samples of persistence landscapes
* Vectorize persistence landscapes for other purposes including machine learning

Examples and tests in **plt** rely on other packages to simulate data and to compute persistence diagrams from data:

* **tdaunif** provides functions to sample uniformly from various immersed manifolds.
* **ripserr** and **TDA** provide functions to compute persistence data from point clouds and distance matrices.

**plt** introduces the 'Rcpp_PersistenceLandscape' S4 class, which is exposed using **Rcpp** from the underlying 'PersistenceLandscape' C++ class.
Instances of this class can be created using `new()` but the recommended way is to use `pl_new()`.
This function accepts either a single matrix of persistence data or a specially formatted list with the class `'persistence_diagram"`.
The `$pairs` entry of the list is itself a list, of a 2-column matrix of persistence pairs for each homological degree from 0 (`$pairs[[1]]`) to the maximum degree calculated.
The generic converter `as_persistence()` includes methods for outputs from `ripserr::vietoris_rips()` and from `TDA::*Diag()`; it operates under the hood of `pl_new()`, but we invoke it explicitly here for illustration.

### Calculation

To begin an illustration, we noisily sample 60 points from a figure eight and compute the persistence diagram of the point cloud:

```{r}
set.seed(513611L)
pc <- tdaunif::sample_lemniscate_gerono(60, sd = .1)
plot(pc, asp = 1, pch = 16L)
pd <- ripserr::vietoris_rips(pc, dim = 1, threshold = 2, p = 2)
print(pd)
```

We the convert the persistence data to the preferred persistence diagram format and inspect some of its features:

```{r}
pd <- as_persistence(pd)
print(pd)
print(head(pd$pairs[[1]]))
print(head(pd$pairs[[2]]))
```

This allows us to compute a persistence landscape---in this case, for the 1-dimensional features.
Here we compute the landscape exactly, which can be cost-prohibitive for larger persistence data, and print its summary:

```{r}
pl1 <- pl_new(pd, degree = 1, exact = TRUE)
print(pl1)
summary(pl1)
```

Some advanced concepts like the magnitude of a landscape will be explained below.

### Class

The object `pl1` is not an array, but rather an object that encapsulates both the data that encode a landscape and several basic operations that can be performed on it.
This allows us to work with persistence landscapes without worrying about pre-processing their representations.
At any point, the underlying encoding of the landscape can be extracted using `$getInternal()`, which in the case of an exactly calculated landscape returns a list of 2-column matrices, each matrix containing coordinates that define one level of the landscape as a piecewise linear function:

```{r}
print(length(pl1$getInternal()))
print(pl1$getInternal())
```

An alternative, approximate construction computes the value of each level of the landscape at each point on a 1-dimensional grid, ranging from `xmin` to `xmax` at increments of `xby`.
A landscape constructed using a discrete approximation is stored as a 3-dimensional array of dimensions (levels, values, 2), with one level per feature (some of which may be trivial) and one value per grid point, stored as $x,y$ pairs along the third dimension.

```{r}
b_ran <- pl_support(pl1)
pl1d <- pl_new(pd, degree = 1,
               xmin = b_ran[[1L]], xmax = b_ran[[2L]], xby = 0.05)
print(dim(pl1d$getInternal()))
print(pl1d$getInternal())
```

Exactly computed landscapes can be converted to discrete landscape objects, but the other direction is not well-defined.
Below, we view a portion of the discretized exact landscape, using the default bounds and resolution given to `pl1`:

```{r}
# default conversion to discrete uses `xby = 0.001`
pl1_ <- pl1$discretize()
print(dim(pl1_$getInternal()))
# print first 12 x-coordinates
pl1_$getInternal()[, seq(230L, 270L), , drop = FALSE]
```

We can also specify the bounds and the resolution of the discretization:

```{r}
pl1 <- pl_delimit(pl1, xmin = 0, xmax = 1, xby = 0.1)
pl1_ <- pl_discretize(pl1)
pl1_$getInternal()
```

### Visualization

**plt** provides a `plot()` method for the 'Rcpp_PersistenceLandscape' class.
It uses **grDevices** to build color palettes, and as such its default palette is viridis; but the user may supply the name of a recognized palette or a sequence of colors between which to interpolate:

```{r}
n_levs <- max(pl_num_levels(pl1), pl_num_levels(pl1d))
par(mfrow = c(2L, 1L), mar = c(2, 2, 0, 2))
plot(pl1, palette = "terrain", n_levels = n_levs, asp = 1)
plot(pl1d, palette = "terrain", n_levels = n_levs, asp = 1)
par(mfrow = c(1L, 1L), mar = c(5.1, 4.1, 4.1, 2.1))
```

### Hilbert Space Operations

To illustrate these features, we first generate a companion data set:

```{r}
# a new landscape and its discretization
set.seed(772888L)
pc2 <- tdaunif::sample_circle(60, sd = .1) / 2
pd2 <- ripserr::vietoris_rips(pc2, dim = 1, threshold = 2, p = 2)
pl2 <- pl_new(pd2, degree = 1, exact = TRUE)
pl2 <- pl_delimit(pl2, xmin = 0, xmax = 2, xby = 0.1)
pl2_ <- pl_discretize(pl2)
```

Several infix operators have been taught to work in the natural way with landscapes, so that users can explore vector space operations and inner products more conveniently:

```{r}
par(mfcol = c(3L, 2L), mar = c(2, 2, 0, 2))
# vector space operations on exact landscapes
plot(pl1 * 2)
plot(pl2)
plot(pl1 * 2 + pl2)
# vector space operations on discrete landscapes
plot(pl1_)
plot(-pl2_)
plot(2 * pl1_ - pl2_)
par(mfrow = c(1L, 1L), mar = c(5.1, 4.1, 4.1, 2.1))
# inner products of exact and discrete landscapes
pl1 %*% pl2
pl1_ %*% pl2_
pl1 %*% pl2_
```

(Note that the landscapes are automatically delimited to a compatible domain.)

### Calculus

The `summary()` method above reported the magnitude and the integral of the persistence landscape `pl1`.
The magnitude is the inner product of `pl1` with itself: `pl1 %*% pl1 = `r pl1 %*% pl1``.
Meanwhile, the integral is the (signed) area under the curve, itself also a linear operator:

```{r}
pl_integrate(pl1)
pl_integrate(pl2)
# 1-integral obeys linearity
pl_integrate(pl1) * 2 - pl_integrate(pl2)
pl_integrate(pl1 * 2 - pl2)
```

The distance between two landscapes is defined in terms of the integral of their absolute difference for finite norms and the maximum pointwise distance for the infinite norm.
Note that, because `pl_integrate()` defaults to the 1-norm and `pl_distance()` defaults to the 2-norm, we must be careful when comparing their results:

```{r}
# using the 1-norm
pl_integrate(pl_abs(pl1 * 2 - pl2), p = 1)
pl_distance(pl1 * 2, pl2, p = 1)

# using the 2-norm
pl_integrate(pl_abs(pl1 * 2 - pl2), p = 2) ^ (1/2)
pl_distance(pl1 * 2, pl2, p = 2)

# using the infinity norm
pl_vmax(pl_abs(pl1 * 2 - pl2))
pl_distance(pl1 * 2, pl2, p = Inf)
```

The norm of a persistence landscape is then defined as its distance from the null landscape that is constant at zero.

```{r}
# null landscape
pd0 <- data.frame(start = double(0L), end = double(0L))
pl0 <- pl_new(pd0, degree = 1, exact = TRUE)
pl_distance(pl1, pl0)
pl_norm(pl1)
```

### Statistical Analysis

Finally, **plt** implements the two hypothesis tests described in the original paper.
To illustrate, we first generate lists of landscapes for samples from two spaces:

```{r}
# samples of landscapes from lemniscates
pl1s <- replicate(6, {
  pc <- tdaunif::sample_lemniscate_gerono(60, sd = .1)
  pd <- ripserr::vietoris_rips(pc, dim = 1, threshold = 2, p = 2)
  pl_new(pd, degree = 1, xby = .01)
})
# samples of landscapes from circles
pl2s <- replicate(8, {
  pc <- tdaunif::sample_circle(60, sd = .1) / 2
  pd <- ripserr::vietoris_rips(pc, dim = 1, threshold = 2, p = 2)
  pl_new(pd, degree = 1, xby = .01)
})
```

An inspection of the mean landscapes makes clear that they are distinct:

```{r}
# average landscape from each sample
par(mfcol = c(2L, 1L), mar = c(2, 2, 0, 2))
plot(pl_mean(pl1s))
plot(pl_mean(pl2s))
par(mfrow = c(1L, 1L), mar = c(5.1, 4.1, 4.1, 2.1))
```

However, the two hypothesis tests use different procedures and rely on different test statistics, so one may be more effective than another.
For convenience, both methods return objects of class `'htest'` for convenient printing:

```{r}
# z-test of difference in integrals of first level
pl_z_test(pl1s, pl2s)
# permutation test of pairwise distances between landscapes
pl_perm_test(pl1s, pl2s)
```

## Acknowledgments

### Precursors

The C++ library is adapted from [Paweł Dłotko's Persistence Landscape Toolbox](https://www2.math.upenn.edu/~dlotko/persistenceLandscape.html).
It was originally adapted and ported to R in [Jose Bouza's **tda-tools** package](https://github.com/jjbouza/tda-tools).

### Resources

Development of this package benefitted from the use of equipment and the support of colleagues at the [University of Florida](https://www.ufl.edu/), especially [Peter Bubenik's research group](https://people.clas.ufl.edu/peterbubenik/researchgroup/) and the [Laboratory for Systems Medicine](https://systemsmedicine.pulmonary.medicine.ufl.edu/).

### Contribute

Bug reports, unit tests, documentation, use cases, feature suggestions, and other contributions are welcome. See the [CONTRIBUTING](https://github.com/corybrunson/plt/blob/main/CONTRIBUTING.md) file for guidance, and please respect the [Code of Conduct](https://github.com/corybrunson/plt/blob/main/CODE_OF_CONDUCT.md).

### Cite

If you use **plt** in published work, please include a citation following `citation("plt")`.