
\documentclass[10pt,leqno,final]{amsart}

\input{ms_header}

\title[Exact Phylodynamic Likelihood]{Exact Phylodynamic Likelihood\\ via Structured Markov Genealogy Processes}
\author[King]{Aaron~A.~King}
\address{
  A.~A.~King,
  Department of Ecology \& Evolutionary~Biology,
  Center for the Study of Complex~Systems, and
  Department of Mathematics,
  University of Michigan,
  Ann~Arbor, MI~48109~USA
  \emph{and} Santa~Fe~Institute,
  1399 Hyde~Park~Road,
  Santa~Fe, NM~87501~USA
}
\email{kingaa@umich.edu}
\urladdr{\href{https://kinglab.eeb.lsa.umich.edu/}{https://kinglab.eeb.lsa.umich.edu/}}
\author[Lin]{Qianying~Lin}
\address{
  Q.-Y.~Lin,
  Theoretical Biology and Biophysics,
  Los~Alamos National Laboratory,
  Los~Alamos, NM~87545~USA
}
\author[Ionides]{Edward~L.~Ionides}
\address{
  E.~L.~Ionides,
  Department of Statistics,
  University of Michigan,
  Ann~Arbor, MI~48109~USA
}
\date{\today}

\hypersetup{pdftitle={Exact Phylodynamic Likelihood via Structured Markov Genealogy Processes}}
\hypersetup{pdfauthor={A.A. King, Q.-Y. Lin, E.L. Ionides}}
\hypersetup{urlcolor=blue,citecolor=blue,linkcolor=blue,filecolor=blue}

<<prefix,include=FALSE,cache=FALSE,purl=FALSE>>=
prefix <- "sgp"
source("setup.R")
@
<<packages,include=FALSE,cache=FALSE>>=
library(tidyverse)
library(ggtree)
library(pomp)
library(cowplot)
library(viridis)
library(phylopomp)
stopifnot(getRversion() >= "4.4")
stopifnot(packageVersion("pomp")>="6.1")
stopifnot(packageVersion("phylopomp")>="0.14.8")
theme_set(theme_bw(base_family="serif"))
options(
  width=150,
  keep.source=TRUE,
  encoding="UTF-8",
  stringsAsFactors=FALSE,
  dplyr.summarise.inform=FALSE,
  pomp_archive_dir="results"
)
set.seed(1159254136)
@

\begin{document}

\begin{abstract}
  We consider genealogies arising from a Markov population process in which individuals are categorized into a discrete collection of compartments, with the requirement that individuals within the same compartment are statistically exchangeable.
  When equipped with a sampling process, each such population process induces a time-evolving tree-valued process defined as the genealogy of all sampled individuals.
  We provide a construction of this genealogy process and derive exact expressions for the likelihood of an observed genealogy in terms of filter equations.
  These filter equations can be numerically solved using standard Monte Carlo integration methods.
  Thus, we obtain statistically efficient likelihood-based inference for essentially arbitrary compartment models based on an observed genealogy of individuals sampled from the population.
\end{abstract}

\maketitle

\section{Introduction}

When the genome of an infectious agent accumulates mutations on timescales similar to those of transmission and infection progression, the resulting pattern of differences among genomes contains information on the history of the pathogen's passage through individual hosts and the host population.
As \citet{Grenfell2004} observed, one can extract this information to gain insight into the structure and dynamics of the host-pathogen system.
In particular, one can formalize mathematical models of transmission, estimate their parameters, and compare their ability to explain data, following standard statistical paradigms.
This is known as \emph{phylodynamic} inference.
\citet{Alizon2024} gives a good review of the history of the subject.

The most common approach to phylodynamic inference rests upon a mathematical linkage between the tree-like \emph{genealogy} or \emph{phylogeny} that expresses the relationships of shared ancestry among sampled genomes and a model of the dynamics of the transmission system.
Various linkages are possible, but because likelihood-based inference is maximally efficient (\ie loses the least information), it is desirable to be able to compute the likelihood function for models of interest.
This is simply the probability density of a given genealogy conditional on a given model, viewed as a function of the parameters of that model.
%% Include other data, $Y$, in the expression?
In particular, if $S$ is a set of genome sequences, $\Phi$ a genealogical tree relating these sequences, $E$ a model of sequence evolution, and $D$ a dynamic transmission model, then the likelihood is
\begin{equation*}
  \lik(D,E) = f(S|D,E) = \int{f(S|\Phi,E)\,f(\Phi|D)\,\dd{\Phi}},
\end{equation*}
where the integral is taken over all possible genealogies and we somewhat loosely use the symbol $f$ for the various distinct probability densities, the nature of each of which is clear from its arguments.
In this expression, $f(S|\Phi,E)$ is typically the \citet{Felsenstein2004} phylogenetic likelihood.
The function $f(\Phi|D)$, which links the phylogeny to the dynamic model, may be termed the \emph{phylodynamic likelihood}.
In the Bayesian context, this same function is sometimes referred to as a \emph{tree prior} \citep{Moeller2018,Volz2018}.
The computation of this function has remained out of reach, except in several special cases.
This paper presents theory that enables its computation for a very broad range of dynamic models.

%% Phylodynamic inference rests upon mathematical linkage between mathematical models of disease transmission at the population level and genome-sequence data collected from individual infected hosts.
%% The tightest possible such linkage is effected by a computable likelihood function.
%% The tree-like \emph{genealogy} or \emph{phylogeny} that expresses the relationships of shared ancestry among the sequences.
%% In particular, if one can determine the likelihood of any given genealogy given a model and also the likelihood of the data given a genealogy, then the likelihood of the data given the model is obtained by summing over the space of genealogies.

Existing approaches to the phylodynamic likelihood have been based on one of two mathematical idealizations.
The first is the \citet{Kingman1982a} coalescent, by which the likelihood of a given genealogy is computed using a reverse-time argument.
This computation provides the exact likelihood for a genealogy resulting from a particular, constant population-size, dynamic model \citep[the Moran model, \eg][]{Moran1958,Kingman1982b,Moehle2000}.
Extensions of this approach develop approximate likelihoods for the case when the population size varies as a function of time \citep{Griffiths1994,Drummond2005} or according to an SIR process \citep{Volz2009a,Rasmussen2011}, as long as the population size is large and the sample-fraction remains negligible.
The second idealization is the linear birth-death process, for which exact expressions for the likelihood are available \citep{Stadler2010}.
Linearity in this context amounts to the assumption that distinct lineages do not interact:
it is the resulting self-similarity of genealogies that renders the likelihood analytically tractable.
Extensions of this approach develop approximations via linearization of nonlinear processes or restriction to scenarios in which population growth is nearly linear \citep[\eg][]{MacPherson2021}.
Although the tractability of these approaches makes them attractive, concern naturally arises as to the validity of the approximations in specific cases, the biases introduced by them, and the amount of information in data left unutilized by these approximate methods.
For this reason, there is interest in improved phylodynamic inference techniques.

%% Factorization of problem into two subproblems.
%% Problems with this approach \citep{Smith2017}.

What would an ideal phylodynamic inference method look like?
First, it would afford exact computation of the phylodynamic likelihood, so that comparisons among parameterizations and models could be made on a sound basis.
Second, because nonlinearity, nonstationarity, noise, and measurement error are prominent and ubiquitous in epidemiology, it would accommodate nonlinear, time-inhomogeneous, stochastic transmission models.
Third, because many of the most scientifically important uncertainties concern heterogeneities in transmission rates and the susceptibility, behavior, age, and location of hosts, it would accommodate host populations structured by these factors.
While some structuring factors (\eg age, spatial location) are most naturally expressed in terms of continuous variables, discretely structured models have repeatedly proved their value in epidemiology.
In particular, compartmental models are extremely flexible and have often been used as approximations when continuous structure leads to uncomfortably high model dimension.
Finally, because there is typically uncertainty not only in the parameters, but also in the structure, of a host-pathogen system, an improved phylodynamic inference methodology would place minimal restrictions on the form of the models that it can accommodate.
This paper demonstrates how these desiderata can be achieved---at least for models with discrete structure---including arbitrary nonlinear compartmental models.

Of course, practical considerations play an important role as well.
In practice, data availability typically places strong limits on the degree of model complexity that can be supported by data.
In addition, computational expense typically grows with model complexity and this can also limit the utility of otherwise attractive models and inference methods.
Nevertheless, in the present paper we confine ourselves to theoretical considerations.
The results we present could form the basis for a variety of distinct algorithms the relative value of which will depend on the questions asked, models proposed, and data available, and in any case remains to be seen.
Moreover, although the theory we present is valid for models with even a countably infinite number of compartments, lack of data and computational resources will in practice mean that the models that can effectively be employed are often much simpler than desired.

To connect a model at the level of a population with genealogies based on samples taken from individual hosts, it is necessary to make assumptions about the individuals in the population.
The simplest such assumption is that the individuals that are identical with respect to the population dynamics are indeed statistically identical.
That is, that they are \emph{exchangeable}.
In a compartmental model, this is tantamount to the assumption that the residence times of the individuals within each compartment are identically distributed, though not independent.
Although exchangeability is indeed an additional assumption, it is so natural that it is frequently unrecognized as such, and one often reads statements to the effect that exchangeability of individuals is a consequence of the Markovian assumption.
Nonetheless, since it adds minimal additional structure, it is the natural assumption, and the one we will make in this paper.

In the following, we take as our starting point a transmission model in the form of a discretely structured, Markov process.
We show how such a process uniquely induces each of several stochastic processes in the space of genealogies.
We go on to derive expressions for the exact likelihoods of these genealogies.

Code sufficient for the reproduction of all the results presented in this paper is freely available for download at \url{https://github.com/kingaa/structured-genealogy-process-paper}.
An archival version of this code will be stored on Zenodo upon publication of a peer-reviewed version of this paper.
The open-source \pkg{R} package \pkg{phylopomp} (\url{https://github.com/kingaa/phylopomp}) implements the simulation and likelihood-computation algorithms employed here.

%% In particular, while the constructions of \citet{Etheridge2019} can in principle accommodate.
%% For structured but deterministic models, approach of \citet{Volz2012,Rasmussen2014a}, based on approximation as birth-death processes:
%% approximation no longer needed.


%% Because there is commonly broad uncertainty regarding the structure of the transmission process, for example due to heterogeneities in transmission rates and susceptibilities, complex behavior patterns, etc., methods that have the plug-and-play property are particularly useful.
%% Such techniques for inference based on time series are well understood and widely used.
%% When the data lie in some Euclidean space, they can commonly be modeled as draws from some error distribution conditional on the latent state of the transmission process.
%% However, when the data are genealogies (also called phylogenies) representing the relationships of shared ancestry among genomic samples, the data-space is non-Euclidean and the appropriate models connecting the data to the latent state are non-trivial.

%% Modeling of the sampling process.

%% Extension of previous results \citep{King2022}.
%% Broader class of state-spaces.
%% Accommodating discrete structure.

%% In an \emph{unstructured} Markov population process, every lineage is exactly like every other.
%% \citet{King2022} showed how every such process induces an unstructured Markov genealogy process.
%% Here, our aim is to expand the theory considerably by allowing our population of lineages to have discrete structure.

%% Classes of Markov processes.
%% Utility and flexibility of Markov assumptions.

%% Population process induces Markov history and genealogy processes.
%% Using these, we derive equations for the likelihood of a genealogy conditional on the history.
%% We then integrate out the history to obtain nonlinear filter equations, the solution of which yields the likelihood.
%% These readily lend themselves to a family of Sequential Monte Carlo algorithms for computing the likelihood.
%% We demonstrate with several examples.

\section{Mathematical preliminaries}

\subsection{Notation}

Throughout the paper, we will adopt the convention that a bold-face symbol (\eg $\Xr$) denotes a random element.
We will be concerned with a variety of stochastic processes, in both discrete and continuous time.
In both cases, we will use a subscript to indicate the time parameter: \eg $\Xr_t$ or $\Gr_k$, where $t$ takes values in the non-negative reals $\Rp$ and $k$ in the non-negative integers $\Zp$.
In the case of continuous-time processes, we will assume that sample paths are \cadlag, \ie right-continuous with left limits.
We will frequently need to refer to the left-limit of such a process.
Accordingly, if $\Phir_t$ is a \cadlag\ random process, we define
\begin{equation*}
  \Phirt_t\colonequals
  \begin{cases}
    \displaystyle\lim_{s\,\uparrow\,{t}}\;\Phir_{s}, & t>0,\\[2ex]
    \Phir_0, & t=0.\\
  \end{cases}
\end{equation*}
Note that $\Phirt_t$ is thus left-continuous with right limits.

If $\Phir_t$, $t\in\Rp$ is a pure jump process, knowledge of its sample path is equivalent to knowledge of the number, $\Kr_t$, of jumps it has taken as of time $t$, the jump times $\Trh_k$, and the embedded chain $\Phirh^{}_k\coloneq{\Phir_{\Trh^{}_k}}$, $k=0,\dots,\Kr^{}_t$.
In particular, if we adopt the convention that $\Trh_0=0$ and $\Trh_{\Kr_t+1}=t$, then
$\Phir_t=\Phirh_k$ for $t\in\halfopen{\Trh_k,\Trh_{k+1}}$, $k=0,\dots,\Kr_t$.
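
As a hypothetical illustration of this bookkeeping (the numbers are invented for the purpose), suppose that $t=2$ and that the path has jumped exactly twice, at times $0.5$ and $1.2$.
Then $\Kr_2=2$ and, with the convention above, $\Trh_0=0$, $\Trh_1=0.5$, $\Trh_2=1.2$, and $\Trh_3=2$, so that
\begin{equation*}
  \Phir_s=
  \begin{cases}
    \Phirh_0, & 0\le{s}<0.5,\\
    \Phirh_1, & 0.5\le{s}<1.2,\\
    \Phirh_2, & 1.2\le{s}\le{2}.
  \end{cases}
\end{equation*}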

\subsection{Population process}

We are motivated by the desire for exact phylodynamic inference methods for as wide a class of epidemiological models as possible.
In particular, we would like to be able to formulate and parameterize an arbitrary compartmental model and to quantify its ability to explain data using likelihood.
\Cref{fig:example_models} depicts a few such models in order to give a sense of the kinds of complexities that can arise.
Of course, with the ability to entertain models with countably many compartments, much greater complexity is possible.
In particular, one can model not only complex infection progression, but also strain structure, behavioral structure, age structure, and spatial structure using compartmental models.
As is well known, one can discretize continuous structure-variables and employ the linear chain trick to accommodate non-exponential residence times.
While the utility of these approximations will vary, a very wide range of model assumptions lie within the scope of the theory presented here.

\begin{figure}
  \input{figs/example_models}
  \caption{
    Examples of discretely-structured population models.
    Demes are shaded.
    Compartments containing infectious hosts are outlined in green.
    Curved green lines connect transmission rates with the compartments whose occupancies control their modulation;
    each such connection gives rise to a nonlinearity in the model.
    \textbf{(A)} An SEIRS model.
    Susceptible individuals ($\lab{S}$), once infected, enter a transient incubation phase ($\lab{E}$) before they become infectious ($\lab{I}$).
    Upon recovery ($\lab{R}$), individuals experience immunity from reinfection.
    If this immunity wanes, they re-enter the susceptible compartment.
    Pathogen lineages are to be found in hosts within the $\lab{E}$ and $\lab{I}$ compartments only.
    Accordingly, there are two demes: $\Demes=\Set{\lab{E},\lab{I}}$.
    If there is exactly one lineage per host, then the occupancy, $n(\Xr_t)=(n_{\lab{E}}(\Xr_t),n_{\lab{I}}(\Xr_t))$, is the integer 2-vector giving the numbers of hosts in the respective compartments.
    See \cref{sec:demes} for definition and discussion of demes and deme occupancy.
    \textbf{(B)} In this four-deme model, two distinct pathogen strains compete for susceptibles.
    \textbf{(C)} A three-deme model according to which, after an incubation period, hosts may develop asymptomatic infection ($\lab{I_A}$).
    If they do not recover, symptomatically infected hosts ($\lab{I_S}$) can progress to hospitalization ($\lab{H}$) and death ($\lab{D}$).
    \textbf{(D)} A three-deme model with heterogeneity in transmission behavior.
    Contagious individuals move randomly between low-transmission ($\lab{I_L}$) and high-transmission ($\lab{I_H}$) behaviors.
    \label{fig:example_models}
  }
\end{figure}

We will assume that our population process is a time-inhomogeneous Markov jump process, $\Xr_t$, $t\in\Rp$, taking values in some space $\Xspace$.
In earlier work \citep{King2022}, we limited ourselves to the case $\Xspace=\mathbb{Z}^d$, but here we assume only that $\Xspace$ is a complete metric measure space with a countable dense subset.
The population process is completely specified by its initial-state density, $p_0$, and its transition rates $\alpha$.
In particular, we suppose that
\begin{equation}
  \label{eq:ic}
  \Prob{\Xr_0\in\mathcal{E}}=\int_{\mathcal{E}}{p_0(x)\,\dd{x}}
\end{equation}
for all measurable sets $\mathcal{E}\subseteq\Xspace$.
For any $t\in\Rp$, $x,x'\in\Xspace$, we think of the quantity $\alpha(t,x,x')$ as the instantaneous hazard of a jump from $x$ to $x'$.
More precisely, the transition rates have the following properties:
\begin{equation*}
  \begin{gathered}
    \alpha(t,x,x')\ge{0}, \qquad \int_{\Xspace}{\alpha(t,x,x')\,\dd{x'}}<\infty,\\
  \end{gathered}
\end{equation*}
for all $t\in\Rp$ and $x,x'\in\Xspace$;
moreover, as a function of time, $\alpha$ is continuous almost everywhere.
Henceforth, we understand that integrals are taken over all of $\Xspace$ unless otherwise specified.
Let $\Kr_t$ be the number of jumps that $\Xr$ has taken by time $t$.
We assume that $\Kr_t$ is a simple counting process so that
\begin{equation*}
  \begin{gathered}
    \CondProb{\Kr_{t+\Delta}=n+1}{\Xr_{t}=x, \Kr_{t}=n}=\Delta\,\int{\alpha(t,x,x')\,\dd{x'}}+o(\Delta),\\
    \CondProb{\Kr_{t+\Delta}>n+1}{\Xr_{t}=x, \Kr_{t}=n}=o(\Delta),\\
    \CondProb{\Xr_{t+\Delta}\in\mathcal{E}}{\Xr_{t}=x, \Kr_{t+\Delta}-\Kr_{t}=1}=\frac{\int_{\mathcal{E}}{\alpha(t,x,x')\,\dd{x'}}}{\int{\alpha(t,x,x')\,\dd{x'}}}+o(\Delta).
  \end{gathered}
\end{equation*}
We further assume that $\alpha(t,x,x')$ is \cadlag\ as a function of time for all $x,x'\in{\Xspace}$ and that the number of jumps that occur in a finite time-interval is finite, \ie $\Prob{\Kr_t<\infty}=1$ for all $t$.

\subsection{Kolmogorov forward equation}

The above may be compactly summarized by stating that if $v(t,x)$ satisfies the Kolmogorov forward equation (KFE),
\begin{equation}
  \label{eq:kfe}
  \frac{\partial{v}}{\partial{t}}(t,x)
  =\int\!{v(t,x')\,\alpha(t,x',x)\,\dd{x'}}
  -\int\!{v(t,x)\,\alpha(t,x,x')\,\dd{x'}},
\end{equation}
and if, moreover, $v(0,x) = p_0(x)$,
then $\int_{\mathcal{E}}\!{v(t,x)\,\dd{x}}=\Prob{\Xr_t\in\mathcal{E}}$ for every measurable $\mathcal{E}\subseteq{\Xspace}$.
\Cref{eq:kfe} is sometimes called the \emph{master equation} for $\Xr_t$.
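
To make \cref{eq:kfe} concrete, consider the following sketch, in which the rates and symbols are assumptions made purely for illustration.
Take a time-homogeneous linear birth-death process on $\Xspace=\Zp$, equipped with counting measure, with per-capita birth rate $\lambda$ and per-capita death rate $\mu$ (symbols local to this example).
Then $\alpha(t,x,x+1)=\lambda\,x$, $\alpha(t,x,x-1)=\mu\,x$, and $\alpha(t,x,x')=0$ otherwise.
The integrals in \cref{eq:kfe} become sums, and the equation reads
\begin{equation*}
  \frac{\partial{v}}{\partial{t}}(t,x)
  =\lambda\,(x-1)\,v(t,x-1)+\mu\,(x+1)\,v(t,x+1)-(\lambda+\mu)\,x\,v(t,x),
\end{equation*}
with the understanding that $v(t,-1)=0$.
The first two terms collect the probability flowing into state $x$ from its neighbors;
the last term is the probability flowing out.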

\subsection{Inclusion of jumps at deterministic times}

For modeling purposes, it is sometimes desirable to insist that certain events occur at known times.
For example, if samples are collected at specific times in such a way that the timing itself conveys no information about the process, one might wish to condition on the sampling time.
We can expand the class of population models to allow for this as follows.
Suppose that $S=\Set{s_1,s_2,\dots}\subset\Rp$ is a sequence of event times.
Let us postulate that, at each of these times, an event occurs at which $\Xr_t$ jumps according to a given probability kernel $\pi$.
In particular, for any state $x\in\Xspace$ and measurable $\mathcal{E}\subset\Xspace$,
$\pi(s_i,x,\mathcal{E})$ is the probability that the jump at time $s_i$ is to $\mathcal{E}$, conditional on the state just before the jump being $x$.
With this notation, the KFE for the process becomes
\begin{align}
  \label{eq:kfe-reg}
  \frac{\partial{v}}{\partial{t}}(t,x)
  &=\int\!{v(t,x')\,\alpha(t,x',x)\,\dd{x'}}
  -\int\!{v(t,x)\,\alpha(t,x,x')\,\dd{x'}},
  &t\notin{S},\\
  \label{eq:kfe-sing}
  v(t,x)\,\dd{x}
  &=\int\!{\leftlim{v}(t,x')\,\pi(t,x',\dd{x})\,\dd{x'}},
  &t\in{S}.
\end{align}
Note that \cref{eq:kfe-reg} is identical to \cref{eq:kfe};
we call this the \emph{regular part} of the KFE.
We refer to \cref{eq:kfe-sing} as the \emph{singular part} of the KFE.

As a matter of notation, one can represent \cref{eq:kfe-reg,eq:kfe-sing} as a single equation in the form of \cref{eq:kfe}.
In particular, if in \cref{eq:kfe} we make the substitution
\begin{equation*}
  \alpha(t,x,x')\mapsto\alpha(t,x,x')+\sum_{s\in{S}}{\delta(t,s)\,\frac{\dd{\pi}}{\dd{x'}}(t,x,x')},
\end{equation*}
we obtain an equation which we can view as shorthand for \cref{eq:kfe-reg,eq:kfe-sing}.
Here, $\delta(t,s)$ is a Dirac delta function and $\dd{\pi}/\dd{x'}$ denotes the density (\ie Radon-Nikodym derivative) of $\pi$ with respect to the measure on $\Xspace$.
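
The following heuristic sketch indicates why this shorthand reproduces \cref{eq:kfe-sing}.
It assumes that the points of $S$ are isolated and adopts the convention that a factor multiplying the delta function at $s$ is evaluated at its left limit.
Since $\pi(s,x,\cdot)$ is a probability kernel, its density integrates to one.
Making the substitution in \cref{eq:kfe}, integrating over a short interval $(s-\epsilon,s+\epsilon)$ that contains no other element of $S$, and letting $\epsilon\downarrow{0}$, the terms involving $\alpha$ vanish and one is left with
\begin{equation*}
  v(s,x)-\leftlim{v}(s,x)
  =\int\!{\leftlim{v}(s,x')\,\frac{\dd{\pi}}{\dd{x}}(s,x',x)\,\dd{x'}}-\leftlim{v}(s,x),
\end{equation*}
which, after cancellation of $\leftlim{v}(s,x)$, is \cref{eq:kfe-sing} written in terms of densities.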
%% To be even more parsimonious, we can refer to \cref{eq:kfe-reg,eq:kfe-sing} as the $(\alpha,\pi,S)$ KFE.

\subsection{Jump marks}

%% Another perspective on the Markov processes is to be had from its Markov state transition diagram (\cref{fig:markov_state}).

\begin{figure}
  \input{figs/markov_diagram}
\end{figure}

It will be useful to divide the jumps of the population process $\Xr_t$ into distinct categories, which differ with respect to the changes they induce in a genealogy.
For this purpose, we let $\Jumps$ be a countable set of jump \emph{marks} such that
\begin{equation*}
  \alpha(t,x,x')=\sum_{u\in\Jumps}{\alpha_u(t,x,x')}.
\end{equation*}
\Cref{fig:markov_state} shows an example for which $\Jumps$ has five elements.
In the following, sums over $u$ are to be taken over the whole of $\Jumps$ unless otherwise indicated.
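
To illustrate such a decomposition, consider an SEIRS model like that of \cref{fig:example_models}A, with state $x=(S,E,I,R)$ and $N=S+E+I+R$.
The marks, rate forms, and parameter symbols below are assumptions chosen for illustration, not a specification taken from the figures.
One natural choice has five marks, with
\begin{equation*}
  \begin{aligned}
    \alpha_{\lab{trans}}(t,x,x')&=\beta\,S\,I/N, &&x'=(S-1,E+1,I,R),\\
    \alpha_{\lab{prog}}(t,x,x')&=\sigma\,E, &&x'=(S,E-1,I+1,R),\\
    \alpha_{\lab{recov}}(t,x,x')&=\gamma\,I, &&x'=(S,E,I-1,R+1),\\
    \alpha_{\lab{wane}}(t,x,x')&=\omega\,R, &&x'=(S+1,E,I,R-1),\\
    \alpha_{\lab{sample}}(t,x,x')&=\psi\,I, &&x'=x,
  \end{aligned}
\end{equation*}
each $\alpha_u(t,x,x')$ being zero for all other $x'$.
Marks are useful precisely because jumps of different marks act differently on a genealogy:
in this sketch, for example, a transmission event would create a new lineage, whereas a progression event would merely move an existing lineage from one deme to another.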

Let us define the \emph{jump mark} process, $\Ur_t$, to be the mark of the latest jump as of time $t$.
As usual, we take the sample paths of $\Ur_t$ to be \cadlag.
Observe that, though $\Xr_t$ and $(\Xr_t,\Ur_t)$ are Markov processes, $\Ur_t$ is not.

\subsection{Demes and deme occupancy}
\label{sec:demes}

Our first goal in this paper is to show how a given population process induces a unique stochastic process on the space of genealogies.
At each time, this genealogy will represent the relationships of shared ancestry among a population of lineages extant at that time.
To accommodate the structure of the population, this population of lineages will itself be subdivided into discrete categories.
In particular, we suppose that there is a countable set of subpopulations, within each of which individual lineages are exchangeable.
We call these subpopulations \emph{demes}, and use the symbol $\Demes$ to denote an index set for them.
\Cref{fig:example_models} illustrates this concept in the context of several compartmental models.

We define the \emph{deme occupancy} function $n:\Demes\times\Xspace\to\Zp$ so that
for $i\in\Demes$, $x\in\Xspace$, $n_i(x)$ is the number of lineages in deme $i$ when the population is in state $x$.

\subsection{Examples}

The class of population models to which the theory presented here applies is very broad indeed.
In particular, it encompasses the entire class of compartmental models with time-dependent flow rates.
Here, to give a sense of this breadth, we briefly describe a few models of interest.
\Cref{sec:app-examples} works out the theory for each of these examples.


\paragraph{SIRS model}

\citet{King2022} worked out formulas for the exact likelihood of a genealogy induced by an SIRS model.
The theory developed in this paper applies, but since there is only one deme in this model, this is a simple case.

\paragraph{SEIRS model}

A simple, yet interesting, model with more than one deme is the SEIRS model (\cref{fig:example_models}A).
The state space is $\Zp^4$, with the state $x=(S,E,I,R)$ defined by the numbers of hosts in each of the four compartments.
It has two demes: $\Demes=\Set{\lab{E},\lab{I}}$.
The deme occupancy function in this case is $n(x)=(E,I)$.
Note that the terms associated with sampling cancel each other in the KFE, since, in this model, sampling has no effect on the state.
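
To spell out this cancellation in a sketch, suppose (as an assumption for illustration) that sampling carries the mark $\lab{sample}$ and occurs at rate $\psi\,I$ without changing the state.
With respect to counting measure on $\Zp^4$, this means $\alpha_{\lab{sample}}(t,x,x')=\psi\,I$ if $x'=x$ and zero otherwise.
The contribution of this mark to the right-hand side of \cref{eq:kfe} is then
\begin{equation*}
  \sum_{x'}{v(t,x')\,\alpha_{\lab{sample}}(t,x',x)}-\sum_{x'}{v(t,x)\,\alpha_{\lab{sample}}(t,x,x')}
  =\psi\,I\,v(t,x)-\psi\,I\,v(t,x)=0.
\end{equation*}
Thus sampling leaves the distribution of $\Xr_t$ untouched, even though each sampling event is recorded in the genealogy.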

\paragraph{Two-strain competition model}

A simple model for the competition of two strains for susceptible hosts is depicted in \cref{fig:example_models}B.
In this model, the state vector consists of seven numbers: $x=(S,E_1,E_2,I_1,I_2,R_1,R_2)$.
There are four demes ($\Demes=\Set{\lab{E}_1,\lab{E}_2,\lab{I}_1,\lab{I}_2}$) and the occupancy function is $n(x)=(E_1,E_2,I_1,I_2)$.

\paragraph{Superspreading model}

\Cref{fig:example_models}D depicts a model of superspreading.
There are three demes ($\Demes=\Set{\lab{E},\lab{I_L},\lab{I_H}}$).

\paragraph{Linear birth-death model}

The linear birth-death process, a mainstay of existing phylodynamic methods, is a special case of the theory presented here.
For this process, we have $\Xspace=\Zp$ and there is a single deme.
Here, $\Xr_t$ represents the size of the population and $n(\Xr_t)=\Xr_t$.

\paragraph{Moran model and the Kingman coalescent}

The \citet{Kingman1982a} coalescent is another workhorse in existing phylodynamic approaches.
It is the ancestral process for the Moran model, in which a fixed population of $n$ lineages experiences events at times distributed according to a rate-$\mu$ Poisson process.
At each such event, an individual lineage selected uniformly at random dies and is replaced by the offspring of a second randomly selected lineage.
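
To indicate how the coalescent arises from this description, here is a short sketch under one added assumption: that the parent is chosen uniformly from the $n-1$ lineages other than the one that dies.
Each event then involves an ordered pair (replaced lineage, parent) drawn uniformly from the $n\,(n-1)$ possibilities.
Looking backward in time, an event merges two of $k\le{n}$ ancestral lineages exactly when both members of this pair are among the $k$, so that the total coalescence rate is
\begin{equation*}
  \mu\,\frac{k\,(k-1)}{n\,(n-1)}=\binom{k}{2}\,\frac{2\,\mu}{n\,(n-1)}.
\end{equation*}
That is, each of the $\binom{k}{2}$ pairs of ancestral lineages coalesces at the constant rate $2\mu/(n\,(n-1))$, as in the Kingman coalescent.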

\subsection{History}

Consider the Markov process $(\Xr_t,\Ur_t)$.
We define its \emph{history process}, $\Hr_t$, to be the restriction of the random function $s\mapsto(\Xr_s,\Ur_s)$ to the interval $[0,t]$.
Note that $\Hr_t$ is itself trivially a Markov process, since it contains its own history.

Alternatively, one can think of $\Hr_t$ as consisting of the sequence
$\left(\left(\Trh_k,\Xrh_k,\Urh_k\right)\right)_{k=0}^{\Kr_t}$.
In particular, conditional on $\Hr_t$, both $\Xr_t$ and $\Ur_t$ are deterministic, as are $\Kr_t$, the embedded chains, $\Xrh_k$, $\Urh_k$, and the point process of event times $\Trh_k$.
The probability measure on the space of histories can be expressed in terms of these:
\begin{mathsize}{9pt}{10pt}
  \begin{equation}
    \label{eq:Hdens}
    \Prob{\dd{\H_t}}
    =p_{0}(\Xh_{0})\,\dd{\Xh_0}\,
    \prod_{k=1}^{K_{t}}{\alpha_{\Uh_k}\!\!\left(\Th_k,\Xh_{k-1},\Xh_{k}\right)\,\dd{\Xh_k}\,\dd{\Th_k}}
    \,\exp{\left(-\sum_{k=0}^{K_t}{\int_{\Th_{k}}^{\Th_{k+1}}{\sum_{u}{\int{\alpha_{u}(t',\Xh_{k},x')\,\dd{x'}}}}\,\dd{t'}}\right)},
  \end{equation}
\end{mathsize}%
where again, by convention, $\Th^{}_0=0$ and $\Th^{}_{K^{}_t+1}=t$.
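
As a sketch of how to read \cref{eq:Hdens}, consider the special case, assumed here only for illustration, in which the rates do not depend on time.
Write $\Lambda(x)\coloneq\sum_{u}{\int{\alpha_u(t,x,x')\,\dd{x'}}}$ for the total jump rate out of state $x$;
the symbol $\Lambda$ is introduced for this example only.
The inner time integrals can then be evaluated, and the exponential factor becomes
\begin{equation*}
  \exp{\left(-\sum_{k=0}^{K_t}{\Lambda(\Xh_k)\,\left(\Th_{k+1}-\Th_{k}\right)}\right)}
  =\prod_{k=0}^{K_t}{e^{-\Lambda(\Xh_k)\,(\Th_{k+1}-\Th_{k})}},
\end{equation*}
\ie the product of the probabilities that no jump occurs within each inter-event interval.
The remaining factors in \cref{eq:Hdens} account for the initial state and for the rate of each jump that does occur.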

If $\H$ is such a history, we define $\time(\H)$ to be the right endpoint of its domain and use the notation $\event{\H}\coloneq\Set{\Th^{}_1,\dots,\Th^{}_{\K_t}}\subset{[0,\time(\H)]}$ to denote the set of its jump times.

\subsection{Genealogies}
\label{sec:genealogy}

\begin{figure}
  \begin{center}
    <<geneal,fig.dim=c(4,2),out.width="50%">>=
    simulate(
      "SEIR",
      Beta=3,sigma=0.5,gamma=0.2,psi=0.3,omega=0.5,
      S0=15,E0=1,I0=2,R0=0,
      time=10
    ) |>
      freeze(seed=382490723) -> x

    pal <- c("#00274CFF","#FFCB05FF")

    x |> plot(points=TRUE,prune=FALSE,obscure=FALSE,palette=pal)+
      geom_vline(xintercept=10,linewidth=0.2,color="black")
    @
  \end{center}
  \caption{
    A genealogy, $G$, specifies the relationships of shared ancestry (via its tree-structure) and deme occupancy histories (via the coloring of its branches) of a set of lineages extant at some time $\time(G)$, as well as some samples gathered at earlier times.
    Here, $\time(G)=10$ and there are two demes, $\Demes=\Set{\func{blue},\func{yellow}}$.
    Tip nodes, denoting extant lineages, are shown as black dots;
    sample nodes are shown as blue dots;
    internal nodes are indicated in green.
    Note that internal nodes occur not only at branch-points, but also inline (\ie along branches).
    Wherever a lineage moves from one deme (color) to another, an internal node occurs;
    the converse does not necessarily hold.
    \label{fig:geneal}
  }
\end{figure}

A \emph{genealogy}, $G$, encapsulates the relationships of shared ancestry among a set of lineages that are extant at some time $\time(G)\in\Rp$ and perhaps a set of samples collected at earlier times (\cref{fig:geneal}).
A genealogy has a tree- or forest-like structure, with four distinct kinds of nodes:
\begin{inparaenum}[(i)]
\item \emph{tip nodes}, which represent labeled extant lineages;
\item \emph{internal nodes}, which represent events at which lineages diverged and/or moved from one deme to another;
\item \emph{sample nodes}, which represent labeled samples; and
\item \emph{root nodes}, at the base of each tree.
\end{inparaenum}
Each node $a$ is associated with a specific time, $\time(a)$.
In particular, if $a$ is a tip node in $G$, then $\time(a)=\time(G)$;
if $a$ is a sample node, then $\time(a)\le{\time(G)}$ is the time at which the sample was taken.
Moreover, if node $a$ is ancestral to node $a'$, then $\time(a)\le{\time(a')}$ and $\time(a')-\time(a)$ is the distance between $a$ and $a'$ along the genealogy.
Without loss of generality we assume that $\time(a)=0$ for all root nodes $a$.
We let $\event{G}$ denote the set of all internal and sample node-times of the genealogy $G$;
we refer to these as \emph{genealogical event times}.

Importantly, a genealogy informs us not only about the shared ancestry of any pair of lineages, but also about where in the set of demes any given lineage was at all times.
Accordingly, we can visualize a genealogy as a tree, the nodes and edges of which are painted with a distinct color for each deme (\cref{fig:geneal}).
Note that a genealogy will in general have \emph{branch-point nodes}, \ie internal nodes with more than one descendant, but may also have internal nodes with only one descendant.
We refer to such nodes as \emph{inline nodes}.
These occur whenever the color changes along a branch, but can also occur without a color-change.

Formally, we define a genealogy, $G$, to be a triple, $(T,Z,Y)$, where $T=\time(G)\in\Rp$ is the \emph{genealogy time}, $Z$ specifies the genealogy's \emph{tree structure}, and $Y$ gives the \emph{coloring}.
In particular, let $\leaves$ be a countable set of labels and let $\part(\leaves)$ be the set of all finite collections of finite, mutually disjoint subsets of $\leaves$.
That is, an element $z\in{\part(\leaves)}$ is a partition of the finite set $\bigcup{z}\subseteq\leaves$.
Partition \emph{fineness} defines a partial order on $\part(\leaves)$.
Specifically, for $z,z'\in{\part(\leaves)}$, we say $z\preceq{z'}$ if and only if for every $b'\in{z'}$ there is $b\in{z}$ such that $b\supseteq{b'}$.
The tree structure of $G$ is defined by a \cadlag\ map $Z:[0,T]\to\part(\leaves)$ that is monotone in the sense that $t_1\le{t_2}$ implies $Z_{t_1}\preceq{Z_{t_2}}$.
An element $b\in{Z_t}$ is a set of labels;
it represents the branch of the tree that bears the corresponding lineages.
We use the notation $\event{Z}$ to denote the set of times at which $Z$ is discontinuous.
Note that $\event{Z}$ includes the times of all tip, sample, and branch-point nodes, but excludes inline and root nodes.
Therefore, $\event{Z}\subseteq{\event{G}}$.
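
As a small illustration, with labels and times chosen arbitrarily and with tip and sample times ignored, suppose $\leaves\supseteq\{a,b,c\}$, $T>2$, and
\begin{equation*}
  Z_t=
  \begin{cases}
    \{\{a,b,c\}\}, & 0\le{t}<1,\\
    \{\{a,b\},\{c\}\}, & 1\le{t}<2,\\
    \{\{a\},\{b\},\{c\}\}, & 2\le{t}\le{T}.
  \end{cases}
\end{equation*}
Every block present at a later time is contained in some block present at each earlier time, so that $Z$ is monotone in the sense just defined.
It is right-continuous, with discontinuities at $t=1$ and $t=2$, which are the times of the two branch-point nodes in this sketch.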

The third element of $G$ specifies the coloring of branches and locations of tip, sample, and internal nodes (including inline nodes).
Mathematically, if $G=(T,Z,Y)$, then $Y$ is a \cadlag\ function that maps each point on the genealogy to a deme and a non-negative integer.
In particular, if $t\in[0,T]$ and $a$ is the label of any tip or sample node,
$Y_t(a)=(Y_t^{\lab{d}}(a),Y_t^{\lab{m}}(a))\in\Demes\times\Zp$, where $Y_t^{\lab{d}}(a)$ is the deme in which the lineage of $a$ is located at time $t$ and $Y_t^{\lab{m}}(a)$ is the number of internal or sample nodes encountered along the lineage of $a$ in going from time $0$ to time $t$.
In particular, $Y_t^{\lab{m}}(a)$ is a simple counting process, with $Y_0^{\lab{m}}(a)=0$ for all $a$.
Since $a,a'\in{b}\in{Z_t}$ implies $Y_t(a)=Y_t(a')$, one can equally well think of $Y_t$ as a map $Z_t\to\Demes\times\Zp$.
Given a tree $Z$, we let $\func{Y}(Z)$ denote the set of colorings $Y$ that are compatible with $Z$.
We moreover define $\func{Y}_t(Z)\coloneq\CondSet{Y_t}{Y\in{\func{Y}(Z)}}$.
Formally speaking, $\func{Y}(Z)$ is a fiber bundle over $Z$, each $\func{Y}_t(Z)$ being a fiber.

It will sometimes be convenient to make use of notation whereby a genealogy $G=(\time(G),G^{\lab{Z}},G^{\lab{Y}})$.

\subsection{Binomial ratio}
\label{sec:binomial_ratio}

For $n,r,\ell,s\in{\Zp^\Demes}$, define the \emph{binomial ratio}
\begin{equation*}
  \BinRatio{n}{\ell}{r}{s}\colonequals
  \begin{cases}
    \frac{\displaystyle\prod_{i\in\Demes}{\binom{n_i-\ell_i}{r_i-s_i}}}%
         {\displaystyle\prod_{i\in\Demes}{\binom{n_i}{r_i}}},
         & \text{if}\ \forall i\ n_i\ge{\Set{\ell_i,r_i}}\ge{s_i}\ge{0},\\[7ex]
         0, & \text{otherwise}.
  \end{cases}
\end{equation*}
Observe that $\BinRatio{n}{\ell}{r}{s}\in{[0,1]}$.
Moreover, in consequence of the Chu--Vandermonde identity, we have
\begin{equation*}
  \sum_{s\in\Zp^{\Demes}}\BinRatio{n}{\ell}{r}{s}\binom{\ell}{s}=1,
\end{equation*}
whenever $n_i\ge{\Set{\ell_i,r_i}}\ge{0}$ for all $i$, where $\binom{\ell}{s}\coloneq\prod_{i\in\Demes}\binom{\ell_i}{s_i}$.
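
As a quick check of this identity, consider a single deme with the illustrative values $n=5$, $\ell=2$, and $r=3$ (chosen here only for the sake of example).
The binomial ratio is nonzero only for $s\in\{0,1,2\}$, and
\begin{equation*}
  \sum_{s=0}^{2}\frac{\binom{3}{3-s}}{\binom{5}{3}}\,\binom{2}{s}
  =\frac{1\cdot{1}+3\cdot{2}+3\cdot{1}}{10}=1.
\end{equation*}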

\section{The induced genealogy process}

\subsection{Event types}
\label{sec:event_types}

We now show how a given population process naturally induces a process in the space of genealogies.
Specifically, at each jump in the population process, a corresponding change occurs in the genealogy, according to whether lineages branch, die, move between demes, or are sampled.
For this purpose, there are five distinct \emph{pure types} of events:
\begin{compactenum}[(a)]
\item \emph{Birth-type events} result in the branching of one or more new lineages, each from some existing lineage.
  Examples of birth-type events include transmission events, speciations, and actual births.
  Importantly, we assume that all new lineages arising from a birth event share the same parent and that at most one birth event occurs at a time, almost surely.
\item \emph{Death-type events} result in the extinction of one or more lineages.
  Examples include recovery from infection, death of a host, and species extinctions.
  We allow for the possibility that multiple lineages die simultaneously.
\item \emph{Migration-type events} result in the movement of a lineage from one deme to another.
  Spatial movements, changes in host age or behavior, and progression of an infection can all be represented as migration-type events.
  We permit multiple lineages to move simultaneously.
\item \emph{Sample-type events} result in the collection of a sample from a lineage.
  We allow for the possibility that multiple samples are collected simultaneously, though we require that, in this case, each extant lineage is sampled at most once.
\item \emph{Neutral-type events} result in no change to any of the lineages.
\end{compactenum}
\Cref{fig:markov_state} depicts an example with jumps of all five pure types.
It is not necessary that an event be of a pure type;
\emph{compound events} partake of more than one type.
For example, a sample/death-type event, in which a lineage is simultaneously sampled and removed, has been employed \citep{Leventhal2014}, as have birth/death events in which one lineage reproduces at the same moment that another dies (\eg the \citet{Moran1958} process).
The theory presented here places few restrictions on the complexity of the events that can occur by combining events of the various pure types.

%% Because different kinds of events may differ not only in the number of offspring they engender, but also in the number of parent lineages, and the distribution of offspring among parents and demes, there is implicitly a deterministic indicator function $Q_u$, for $u\in\Jumps$, (described below) that captures these properties.

\subsection{Genealogy process}

We now show how a given population process induces a stochastic process, $\Gr_t$, on the space of genealogies.
In the case of unstructured population processes (\ie those having a single deme), \citet{King2022} gave a related construction that is equivalent to the one presented here.

\begin{figure}
  \input{figs/event_types}
  \caption{
    Event types differ by their effects on the genealogy.
    This can be seen by examining the local structure of the genealogy in the neighborhood of a jump.
    \textbf{(A)} A birth-type jump results in the branching of one or more child lineages from the parent.
    There can be only one parent, though the demes of the child lineages may differ from that of their parent.
    Here, a parent of the blue deme sires one child lineage in each of the blue and yellow demes.
    The \emph{production} of an event is an integer vector, with one entry for each deme.
    The production of this event is therefore $r=(r_{\lab{blue}},r_{\lab{yellow}})=(2,1)$.
    The \emph{deme occupancy} of an event is the number of lineages in each deme just to the right of the event.
    The deme occupancy at this event is therefore $n=(n_{\lab{blue}},n_{\lab{yellow}})=(3,5)$.
    \textbf{(B)} A death-type event causes the extinction of a lineage.
    Since internal nodes without children are recursively removed, the affected branch is dropped.
    The production of this event is $r=(0,0)$ and the deme occupancy is $n=(3,4)$.
    \textbf{(C)} A migration-type event results in the movement of one or more lineages from one deme to another.
    Here, one lineage moves from the yellow to the blue deme.
    The production of this event is $r=(1,0)$, \ie the production is 1 for the blue deme and 0 for the yellow.
    The deme occupancy is $n=(6,2)$.
    \textbf{(D)} In a sample-type event, one or more sample nodes (blue circles) are inserted.
    Here, there are two samples, one in each of the blue and yellow demes.
    Accordingly, $r=(1,1)$ and $n=(2,6)$.
    \textbf{(E)} A neutral-type event has no effect on the genealogy and zero production in all demes: $r=(0,0)$, $n=(5,3)$.
    \textbf{(F)} The theory presented here allows for compound events.
    As an example, here a birth/death-type event occurs, wherein one yellow lineage is extinguished and a blue lineage simultaneously sires a blue child.
    For this event, we have $r=(2,0)$ and $n=(6,2)$.
    \textbf{(G)} Here, a compound sample/death-type event with $r=(0,0)$ and $n=(2,5)$ occurs.
    A blue lineage is sampled and simultaneously extinguished.
    Note that recursive removal does not occur, since sample nodes are never removed.
    \textbf{(H)} A compound birth/migration-type event with $r=(4,0)$ and $n=(6,2)$.
    \label{fig:event_types}
  }
\end{figure}

At each jump in the population process, a change is made to the genealogy, according to the mark, $u$, of the jump (\cref{fig:event_types}).
In particular:
\begin{compactenum}[(a)]
\item
  If $u$ is of birth-type (\cref{fig:event_types}A), it results in the creation of one new internal node, call it $b$.
  A tip node, $a$, of the appropriate deme is chosen with uniform probability from among those present and $b$ is inserted so that its ancestor is that of $a$, while $a$ takes $b$ as its ancestor.
  One new tip node, of the appropriate deme, is created for each of the children, all of which take $b$ as their immediate ancestor.
\item
  If $u$ is of death-type (\cref{fig:event_types}B), one or more tip nodes of the appropriate demes are selected with uniform probability from among those present.
  These are deleted.
  Next, internal nodes without children are recursively removed.
  Sample nodes are never removed.
\item
  At a migration-type event (\cref{fig:event_types}C), the appropriate number of migrating lineages are selected at random with uniform probability, from among those present in the appropriate demes.
  For each selected lineage, one new internal node is inserted between the selected tip node and its ancestor.
  The color of the descendant branch changes accordingly.
\item
  At a sample-type event (\cref{fig:event_types}D), the appropriate number of sampled lineages are selected at random from among the tip nodes, with uniform probability according to deme.
  One new sample node is introduced for each selected lineage:
  each is inserted between a selected tip node and its ancestor.
\item
  At a neutral-type event (\cref{fig:event_types}E), no change is made to the genealogy.
\item
  Finally, events of compound type (\eg \cref{fig:event_types}F--H) are accommodated by combining the foregoing rules.
\end{compactenum}
In each of these events, the new node or nodes that are introduced have node-times equal to the time of the jump.

\subsubsection{Emergent lineages and production}
\label{sec:production}

The lineages which descend from an inserted node are said to \emph{emerge} from the event.
Thus, after a birth-type event, the emerging lineages include all the new offspring as well as the parent.
Likewise, at pure migration- or sample-type events, each migrating or sampled lineage emerges from the event.
At pure death-type events, no lineages emerge.
In general, at an event of mark $u$, there are $r^u_i$ emergent lineages in deme $i$.
We require that $r^u_i$ be a constant, for each $u$ and $i$.
Thus there is a function $r:\Jumps\times\Demes\to\Zp$, such that $r^u_i$ lineages of deme $i$ emerge from each event of mark $u$.
Since, in applications, one is free to expand the set of jump-marks $\Jumps$ as needed, this is not a restriction on the models that the theory can accommodate.
We say $r^u\coloneq{(r^u_i)_{i\in\Demes}}$ is the \emph{production} of an event of mark $u$.
Note that the lineages that die as a result of an event do not count in the production but that a parent lineage that survives the event does count.
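
To make this concrete, consider a hypothetical two-deme model with demes $\lab{E}$ and $\lab{I}$; the four marks listed below are assumptions made for illustration only.
Writing $r^u=(r^u_{\lab{E}},r^u_{\lab{I}})$, the rules above would give
\begin{equation*}
  \begin{aligned}
    &\text{transmission (parent in $\lab{I}$, one child in $\lab{E}$):} & r^u&=(1,1),\\
    &\text{progression (one lineage migrates from $\lab{E}$ to $\lab{I}$):} & r^u&=(0,1),\\
    &\text{recovery (death of one $\lab{I}$ lineage):} & r^u&=(0,0),\\
    &\text{sampling (one $\lab{I}$ lineage sampled):} & r^u&=(0,1).
  \end{aligned}
\end{equation*}
In the first case, the surviving parent accounts for the entry in $\lab{I}$ and the child for the entry in $\lab{E}$.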

\subsubsection{Conditional independence and exchangeability}

Application of these rules at each jump of $\Xr_t$ constructs a chain of genealogies $\Grh_k$.
In particular, at each jump-time $\Trh_k$, the genealogy $\Grh_{k-1}$ is modified according to the jump-mark $\Urh_k$ to yield $\Grh_k$.
We view $\Grh_k$ as the embedded chain of the continuous-time genealogy process $\Gr_t$.
It is very important to note that, conditional on $(\Xrh_k,\Urh_k)$, the numbers of parents and of offspring in each deme are determined, while the random choice of which lineages die, migrate, are sampled, or sire offspring is independent of the choices made at all other times and of $(\Xrh_j,\Urh_j)$ for all $j\ne{k}$.
Moreover, by assumption, the lineages within each deme are exchangeable:
any lineage within a deme is as likely as any other lineage in that deme to be selected as a parent or for death, sampling, or migration.
Finally, note that $\Gr_t$ does not have the Markov property, though $(\Xr_t,\Ur_t,\Gr_t)$ and $(\Xr_t,\Gr_t)$ do.
Observe in passing that, if instead of dropping tip nodes at death events we were to retain them as we do samples, the resulting genealogy---which we might call the ``complete'' genealogy---would have the Markov property.

\subsection{Pruned and obscured genealogies}

\begin{figure}
  <<upo,fig.dim=c(4,6),out.width="50%">>=
  simulate(
    "SEIR",
    Beta=1,sigma=0.5,gamma=0.1,psi=0.4,omega=0.1,
    S0=10,E0=1,I0=1,R0=0,
    time=10
  ) |>
    freeze(seed=522390503) -> x

  pal <- c("#00274CFF","#FFCB05FF")

  plot_grid(
    A=x |>
      plot(
        points=TRUE,prune=FALSE,obscure=FALSE,
        ladderize=FALSE,palette=pal
      )+
      geom_vline(xintercept=10,linewidth=0.2,color="black"),
    B=x |>
      plot(
        points=TRUE,prune=TRUE,obscure=FALSE,
        ladderize=FALSE,palette=pal
      )+
      geom_vline(xintercept=10,linewidth=0.2,color="black"),
    C=x |>
      plot(
        points=TRUE,prune=TRUE,obscure=TRUE,
        ladderize=FALSE,palette="#B3B3B3FF"
      )+
      geom_vline(xintercept=10,linewidth=0.2,color="black"),
    ncol=1,
    align="hv",axis="tblr",
    labels="AUTO"
  )
  @
  \caption{
    Unpruned, pruned, and obscured genealogies from a single realization of the genealogy process induced by the SEIRS model depicted in \cref{fig:example_models,fig:markov_state}.
    \textbf{(A)} A realization of the unpruned genealogy process $\Gr_t$ is shown at $t=10$.
    Tip nodes, corresponding to lineages alive at time $t=10$, are indicated with black points.
    Blue points represent samples;
    green points, internal nodes.
    Branches are colored according to the deme in which the corresponding lineage resided at that point in time:
    blue denotes $\lab{E}$ and yellow, $\lab{I}$.
    \textbf{(B)} The genealogy is \emph{pruned} by deleting all tip nodes and then recursively pruning away childless internal nodes.
    Sample nodes are never removed.
    \textbf{(C)} A pruned genealogy is \emph{obscured} by effacing all deme information from lineage histories:
    the colors are erased, as are all inline nodes.
    See the text (\cref{sec:genealogy,sec:pruning,sec:obscuration}) for more detail.
    \label{fig:upo}
  }
\end{figure}

The process just described yields a genealogy that relates all extant members of the population, and all samples.
Moreover, it details each lineage's complete history of movement through the various demes.
However, the data we ultimately wish to analyze will be based only on samples.
Nor, in general, will the histories of deme occupancy be observable.
A generative model must account for this loss of information.
We therefore now describe how genealogies are \emph{pruned} to yield sample-only genealogies and then \emph{obscured} via the erasure of color from their branches (\cref{fig:upo}).

\subsubsection{Pruned genealogy}
\label{sec:pruning}

Given a genealogy $G$, one obtains the \emph{pruned genealogy}, $P=\prune(G)$, by first dropping every tip node and then recursively dropping every childless internal node (\cref{fig:upo}A--B).
In a pruned genealogy only internal and sample nodes remain, and sample nodes are found at all of the leaves and possibly some of the interior nodes of the genealogy.
Observe that a pruned genealogy is a colored genealogy:
it retains information about where among the demes each of its lineages was through time (\cref{fig:upo}B).
Note also that a pruned genealogy $P$ is characterized by its time, $\time(P)$, and the functions $P^{\lab{Z}}$ and $P^{\lab{Y}}$, just as an unpruned genealogy is.
Finally, observe that, since it contains within itself all of its past history, the pruned genealogy process $\Pr_t=\prune(\Gr_t)$ is Markov, even though the unpruned genealogy process, $\Gr_t$, is not.

\subsubsection{Lineage count and saturation}
\label{sec:ells}

In the following, we will find that we need to count the deme-specific numbers of lineages present in a given pruned genealogy at a given time.
Accordingly, suppose $P=(T,Z,Y)$ is a pruned genealogy and suppose $t\in[0,T]$.
Let $\ell_i$ denote the number of lineages in deme $i$ at time $t$ and $\ell\coloneq(\ell_i)_{i\in\Demes}\in\Zp^{\Demes}$.
Clearly, $\ell$ depends only on $Y_t$.
Therefore, we can define $\ell$ as a function such that, whenever $P=(T,Z,Y)$ is a pruned genealogy, $\ell(Y_t)$ is the vector of deme-specific lineage counts at time $t$.
We refer to $\ell$ as the \emph{lineage-count} function (cf.~\cref{fig:ells}).

We will also have occasion to refer to the deme-specific number of lineages emerging from a given event.
In particular, given a node time $t$ in a pruned genealogy $P=(T,Z,Y)$, the number $s_i$ of lineages of deme $i$ emerging from all nodes with time $t$ is well defined and we can write $s\coloneq\left(s_i\right)_{i\in\Demes}$.
Like the lineage-count function, $s$ depends only on the local structure of $P$.
However, $s$ depends not only on $Y_t$, but also on $\leftlim{Y}_t$.
Thus, we can define the \emph{saturation} function such that, whenever $P=(T,Z,Y)$ is a pruned genealogy, $s(\leftlim{Y}_t,Y_t)$ is the integer vector of deme-specific numbers of emerging lineages at time $t$.
\Cref{fig:ells} illustrates.

\begin{figure}
  \input{figs/ells}
  \caption{
    \textbf{Lineage count and saturation.}
    Each panel shows the neighborhood of a single event in the unpruned genealogy (top row) and the corresponding pruned genealogy (bottom row).
    Pruning consists of the removal of all branches that are not ancestral to some sample.
    In the bottom row of panels, pruned branches are indicated using broken lines.
    \textbf{(A)} A birth-type event with production $r=(r_{\lab{blue}},r_{\lab{yellow}})=(1,1)$ occurs.
    \textbf{(B)} Suppose that pruning results in the removal of the dashed lineages.
    Then the lineage count at this event-time is $\ell=(\ell_{\lab{blue}},\ell_{\lab{yellow}})=(2,2)$.
    The saturation is $s=(0,1)$ since only a single, yellow lineage emerges from the event.
    \textbf{(C)} A migration-type event with production $r=(0,1)$ occurs.
    \textbf{(D)} After pruning, $\ell=(2,2)$ and $s=(0,1)$.
    \textbf{(E)} A sample-type event occurs in which two blue lineages are sampled (production $r=(2,0)$).
    \textbf{(F)} After pruning, $\ell=(2,2)$ and $s=(1,0)$.
    Observe that in panels B and D, the local structures of the pruned genealogies are identical, though they arise from events of different type.
    \label{fig:ells}
  }
\end{figure}

\subsubsection{Compatibility}
\label{sec:compatibility}

Suppose $P$ is a pruned genealogy, with $\time(P)=T$ and $t\in\event{P}$.
The local structure of $P$ at $t$ is, in general, compatible with only a subset of the possible jumps $\Jumps$.
For example, if the event in $P$ at $t$ is a branch node or a sample node, then it is compatible only with birth-type or sample-type jumps, respectively.
Similarly, if the node in $P$ at time $t$ is one at which a lineage moves from deme $i$ to deme $i'$, then the mark $u$ of the corresponding jump must be either of $i\to{i'}$ migration type or of a birth type with parent in $i$ and $r^u_{i'}>0$.
To succinctly accommodate all possibilities, let us introduce the indicator function $Q$ such that $Q=1$ if the local genealogy structure---which is captured by the values of $P^{\lab{Y}}$ just before and after $t$---is compatible with an event of type $u$ and $Q=0$ otherwise.
That is, $Q_u(y,y')=1$ if and only if
there is a feasible genealogy, $\G=(\T,\Z,\Y)$, and history, $\H$,
and a $t\in{[0,\T]}$ such that,
given $\Gr_\T=\G$ and $\Hr_\T=\H$,
we have $\U_t=u$, $\Yt_t=y$, and $\Y_t=y'$.
We refer to $Q$ as the \emph{compatibility indicator}.

\subsubsection{Obscured genealogy}
\label{sec:obscuration}

The \emph{obscured genealogy} is obtained by discarding all information about demes and events not visible from the topology of the tree alone (\cref{fig:upo}B--C).
In particular, if $P=(T,Z,Y)$ is a pruned genealogy, we write $\obs(P)=(T,Z)$ to denote the obscured genealogy.

\section{Results}

\subsection{Likelihood for pruned genealogies}

Our first result will be an expression for the likelihood of a given pruned genealogy given the history of the population process.

\begin{thm}\label{thm:pruned_lik}
  Suppose $\P=(\T,\Z,\Y)$ is a given pruned genealogy.
  Define
  \begin{equation}\label{eq:phidef}
    \phi^{}_u(x,y,y')\coloneq\BinRatio{n(x)}{\ell(y')}{r^{u}}{s(y,y')}\,Q_{u}(y,y'),
  \end{equation}
  where $n$ is the deme occupancy (\cref{sec:demes}), $r^u$ is the production (\cref{sec:production}), $\ell$ and $s$ are the lineage-count and saturation functions, respectively (\cref{sec:ells}), $Q$ is the compatibility indicator (\cref{sec:compatibility}), and the binomial ratio is as defined in \cref{sec:binomial_ratio}.
  Then
  \begin{equation*}
    \CondProb{\Pr_\T=\P}{\Hr_\T=\H}=\Indicator{\event{\H}\supseteq{\event{\P}}}\,\prod_{t\in\event{\H}}{\phi^{}_{\U_t}\!(\X_t,\Yt_t,\Y_t)}.
  \end{equation*}
\end{thm}
\begin{proof}
  If $\event{\H}\nsupseteq\event{\P}$, then $\H$ and $\P$ are incompatible and $\CondProb{\Pr_\T=\P}{\Hr_\T=\H}=0$.
  Similarly, if any event of $\H$ is incompatible with the local structure of $\P$ in the sense of \cref{sec:compatibility}, then $\CondProb{\Pr_\T=\P}{\Hr_\T=\H}=0$.
  Let us therefore suppose that neither of these conditions holds.
  Conditional on $\Hr_\T=\H$, at each time $t\in\event{\H}$, a jump of mark $\U_t$ occurred, with a production of $r^{\U_t}=(r_i)_{i\in\Demes}$, resulting in a deme-occupancy of $n(\X_t)=(n_i)_{i\in\Demes}$.
  In $\P$, at time $t$, there are $\ell_i=\ell_i(\Y_t)$ lineages in deme $i$, of which $s_i=s_i(\Yt_t,\Y_t)$ are emergent.
  By assumption, at each genealogical event, lineages within a deme are exchangeable:
  each has an identical probability of being involved.
  This exchangeability implies that each lineage present in a deme at time $t$ was equally likely to have been one of the emergent lineages.
  In particular, at time $t$, the probability that $s_i$ of the $\ell_i$ deme-$i$ lineages were among the $r_i$ of $n_i$ lineages emergent in the unpruned genealogy process is the same as the probability that, upon drawing $\ell_i$ balls without replacement from an urn containing $r_i$ red balls and $n_i-r_i$ black balls, exactly $s_i$ of the drawn balls are red, namely
  \begin{equation*}
    \frac{\binom{n_i-\ell_i}{r_i-s_i}\,\binom{\ell_i}{s_i}}{\binom{n_i}{r_i}}.
  \end{equation*}
  Because our lineages are labeled, each of the $\tbinom{\ell_i}{s_i}$ equally probable sets of $s_i$ lineages is distinct;
  just one of these is the one present in $\P$.
  Moreover, since, again conditional on $\Hr_\T=\H$, the identities of the lineages involved in a genealogical event are random and independent of the identities selected at all other events, we have established that
  \begin{equation*}
    \CondProb{\Pr_\T=\P}{\Hr_\T=\H}=\prod_{t\in\event{\H}}{\BinRatio{n(\X_t)}{\ell(\Y_t)}{r^{\U_t}}{s(\Yt_t,\Y_t)}}.
  \end{equation*}
  Returning to the possibility that $\H$ is incompatible with $\P$, since $\CondProb{\Pr_\T=\P}{\Hr_\T=\H}=0$ if either $Q^{}_{\U_t}=0$ for any $t\in\event{\H}$ or $\event{\P}\nsubseteq\event{\H}$, we obtain the result.
\end{proof}
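
As a numerical illustration of the counting argument in the proof, take a single deme with the assumed values $n_i=5$, $r_i=2$, $\ell_i=3$, and $s_i=1$ (chosen only for the sake of example).
The urn probability can be computed in either of two equivalent ways:
\begin{equation*}
  \frac{\binom{r_i}{s_i}\,\binom{n_i-r_i}{\ell_i-s_i}}{\binom{n_i}{\ell_i}}
  =\frac{\binom{2}{1}\,\binom{3}{2}}{\binom{5}{3}}
  =\frac{6}{10}
  =\frac{\binom{2}{1}\,\binom{3}{1}}{\binom{5}{2}}
  =\frac{\binom{n_i-\ell_i}{r_i-s_i}\,\binom{\ell_i}{s_i}}{\binom{n_i}{r_i}}.
\end{equation*}
Dividing by the $\tbinom{3}{1}=3$ equally probable labeled sets leaves $\tbinom{2}{1}/\tbinom{5}{2}=1/5$ as the contribution of this deme to the binomial ratio.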

Next, we show how the likelihood of a pruned genealogy, unconditional on the history, can be computed.
For this, we use the filter equation technology developed in \cref{sec:filter_eqns}.
In particular, the following theorem follows immediately from \cref{lemma:sing-filt}.

\begin{thm}
  \label{thm:pruned_uncond}
  Suppose that $\P=(\T,\Z,\Y)$ is a given pruned genealogy.
  Suppose that $w=w(t,x)$ satisfies the initial condition $w(0,x)=p_0(x)$ and the filter equation
  \begin{equation}
    \begin{aligned}
      \frac{\partial{w}}{\partial{t}}(t,x)=
      &\sum_u{\int{w(t,x')\,\alpha_u(t,x',x)\,\phi^{}_u(x,\Yt_t,\Y_t)\,\dd{x'}}}
      -\sum_u{\int{w(t,x)\,\alpha_u(t,x,x')\,\dd{x'}}},
      &t\notin{\event{\P}},\\
      w(t,x)=&\sum_u{\int{\wt(t,x')\,\alpha_u(t,x',x)\,\phi^{}_u(x,\Yt_t,\Y_t)\,\dd{x'}}},
      &t\in{\event{\P}},
    \end{aligned}
  \end{equation}
  where $\phi$ is defined in \cref{eq:phidef}.
  Then the likelihood of $\P$ is
  \begin{equation*}
    \lik(\P)=\int{w(\T,x)\,\dd{x}}.
  \end{equation*}
\end{thm}

\subsection{Likelihood for obscured genealogies}

Our next result concerns the likelihood of a given obscured genealogy conditional on the history.

\begin{thm}\label{thm:obsc_lik}
  Suppose that $(\T,\Z)$ is a given obscured genealogy.
  Let $q$ and $\pi$ be probability kernels, such that
  for all $x\in\Xspace$ and $y\in\func{Y}_0(\Z)$,
  \begin{equation*}
    \begin{gathered}
      q(x,y)\ge{0},\qquad
      \sum_{y\in\func{Y}_0(\Z)}{q(x,y)}=1,
    \end{gathered}
  \end{equation*}
  and, for all $u\in\Jumps$, $t\in\Rp$, $x,x'\in\Xspace$, $y,y'\in\func{Y}_t(\Z)$,
  \begin{equation*}
    \begin{gathered}
      \pi_u(t,x,x',y,y')\ge{0},\qquad
      \sum_{y'\in\func{Y}_t(\Z)}{\pi_u(t,x,x',y,y')}=1.
    \end{gathered}
  \end{equation*}
  Suppose moreover that $\pi_u(t,x,x',y,y')>0$ whenever $\alpha_u(t,x,x')\,Q_u(y,y')>0$
  and that $q(x,y)>0$ whenever $\CondProb{\Pr_0^{\lab{Y}}=y}{\Xr_0=x}>0$.
  Then there is a stochastic jump process $\yr_t$ with sample paths in $\func{Y}(\Z)$ such that $(\Xr_t,\Ur_t,\yr_t)$ is Markov and
  \begin{equation*}
    \CondProb{\Pr^{\lab{Z}}_\T=\Z}{\Hr_\T=\H}=
    \Indicator{\event{\H}\supseteq\event{\Z}}\,\Expect{\frac{1}{q(\X_0,\yr_0)}\,\prod_{t\in\event{\H}}{\frac{\phi^{}_{\U_t}(\X_t,\yrt_t,\yr_t)}{\pi_{\U_t}(t,\Xt_t,\X_t,\yrt_t,\yr_t)}}},
  \end{equation*}
  where $\phi$ is defined in \cref{eq:phidef} and
  the expectation is taken over the sample paths of $\yr_t$.
\end{thm}
\begin{proof}
  First, observe that, since $\obs$ is a deterministic operator,
  \begin{equation}\label{eq:IS1}
    \CondProb{\Pr^{\lab{Z}}_\T=\Z}{\Hr_\T=\H}=\CondExpect{\Indicator{\Pr^{\lab{Z}}_\T=\Z}}{\Hr_\T=\H}.
  \end{equation}
  Our strategy will be to evaluate \cref{eq:IS1} using importance sampling:
  we will propose pruned genealogies compatible with $\Z$ as sample paths from a stochastic process driven by $\Xr_t$ and
  evaluate the expectation in \cref{eq:IS1} by summing over these paths.
  Conditional on $\Hr_\T=\H$, the initial distribution $q$ and probability kernel $\pi$ generate a Markov chain, $\yrh_k$, such that
  \begin{equation*}
    \begin{gathered}
      \CondProb{\yrh_0}{\Hr_\T=\H}=q(\X_0,\yrh_0),
      \qquad
      \CondProb{\yrh_k}{\yrh_{k-1},\Hr_\T=\H}=\pi_{\Uh_k}(\Th_k,\Xh_{k-1},\Xh_{k},\yrh_{k-1},\yrh_k).
    \end{gathered}
  \end{equation*}
  The required process $\yr_t$ is the unique \cadlag\ process with event times $\Th_k$ and $\yrh_k$ as its embedded chain.
  This construction of $\yr_t$ obviously guarantees that $\event{\H}\supseteq\event{\yr}\supseteq\event{\Z}$ and that $(\Xr_t,\Ur_t,\yr_t)$ is Markov.

  Now, for $\y\in\func{Y}(\Z)$, let us define $C(\y)=(\T,\Z,\y)$.
  Then, by construction, $\obs(C(\y))=(\T,\Z)$ and,
  conversely, for every pruned genealogy $\P$ satisfying $\time(\P)=\T$ and $\P^{\lab{Z}}=\Z$, $C(\P^{\lab{Y}})=\P$.
  Moreover, the conditions on the kernels $q$ and $\pi$ guarantee that, if $\CondProb{\Pr_\T=\P}{\Hr_\T=\H}>0$ and $\P^{\lab{Z}}=\Z$, then $\CondProb{\yr=\P^{\lab{Y}}}{\Hr_\T=\H}>0$.
  We therefore have that
  \begin{equation*}
    \CondProb{\Pr^{\lab{Z}}_\T=\Z}{\Hr_\T=\H}=
    \Expect{\frac{\CondProb{\Pr_\T=C(\yr)}{\Hr_\T=\H}}{\pi(\yr\vert\H)}},
  \end{equation*}
  the expectation being taken with respect to the random process $\yr$.
  Here, by definition,
  \begin{equation*}
    \pi(\yr\vert\H)=q(\X_0,\yr_0)\,\prod_{t\in\event{\H}}{\pi_{\U_{t}}(t,\Xt_{t},\X_{t},\yrt_{t},\yr_{t})}.
  \end{equation*}
  The result then follows from \cref{thm:pruned_lik}.
\end{proof}

Note that, since $\func{Y}_t(\Z)$ is finite, it is permissible, for example, to choose $q$ and $\pi$ to be uniform.
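
For instance, under the uniform choice, $q(x,y)=1/\lvert\func{Y}_0(\Z)\rvert$ and $\pi_u(t,x,x',y,y')=1/\lvert\func{Y}_t(\Z)\rvert$, so that the expression in \cref{thm:obsc_lik} reduces to
\begin{equation*}
  \CondProb{\Pr^{\lab{Z}}_\T=\Z}{\Hr_\T=\H}=
  \Indicator{\event{\H}\supseteq\event{\Z}}\,\Expect{\lvert\func{Y}_0(\Z)\rvert\,\prod_{t\in\event{\H}}{\lvert\func{Y}_t(\Z)\rvert\,\phi^{}_{\U_t}(\X_t,\yrt_t,\yr_t)}}.
\end{equation*}
Because $\phi$ contains the factor $Q$, proposed paths that are incompatible with the jumps of $\H$ contribute zero to this expectation, so that a uniform proposal, while valid, may be inefficient in practice.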

The final result shows how to compute the likelihood of an obscured genealogy.
It is an immediate consequence of \cref{thm:obsc_lik,lemma:sing-filt}.

\begin{thm}
  \label{thm:obsc_uncond}
  Let $V=(\T,\Z)$ be a given obscured genealogy.
  Then there are probability kernels $q$ and $\pi$ as in \cref{thm:obsc_lik} such that if
  \begin{equation*}
    \begin{gathered}
      \beta_u(t,x,x',y,y')=\alpha_u(t,x,x')\,\pi_u(t,x,x',y,y'),\qquad
      \varPsi_u(t,x,x',y,y')=\frac{\phi^{}_u(x',y,y')}{\pi_u(t,x,x',y,y')},
    \end{gathered}
  \end{equation*}
  and if $w=w(t,x,y)$ satisfies
  the initial condition $w(0,x,y)=p_0(x)\,\Indicator{q(x,y)>0}$
  and the filter equation
  \begin{equation*}
    \begin{aligned}
      &\frac{\partial{w}}{\partial{t}}=
      \sum_{uy'}{\int{w(t,x',y')\,\beta_u(t,x',x,y',y)\,\varPsi_u(t,x',x,y',y)\,\dd{x'}}}
      -\sum_{uy'}{\int{w(t,x,y)\,\beta_u(t,x,x',y,y')\,\dd{x'}}},
      &t\notin{\event{\Z}},\\
      &w(t,x,y)=\sum_{uy'}{\int{\wt(t,x',y')\,\beta_u(t,x',x,y',y)\,\varPsi_u(t,x',x,y',y)\,\dd{x'}}},
      &t\in{\event{\Z}},
    \end{aligned}
  \end{equation*}
  then the likelihood of $V$ is
  \begin{equation*}
    \lik(V)=\sum_y{\int{w(\T,x,y)\,\dd{x}}}.
  \end{equation*}
\end{thm}

\Cref{lemma:monte_carlo} shows how this can be computed via Sequential Monte Carlo.

\section{Discussion}

The theory presented here represents a strict generalization of the existing coalescent and birth-death process approaches to phylodynamic inference.
In \cref{sec:app-examples}, we demonstrate that both of the latter processes are special cases of the genealogical processes constructed here.
Importantly, because the theory allows computation of the likelihood via strictly forward-in-time computations, it permits consideration of models for which time-reversal arguments are not available.
Moreover, inasmuch as the formulae of \cref{thm:obsc_uncond} can be efficiently computed via sequential Monte Carlo, explicit expressions for transition probabilities are not needed:
it is sufficient to be able to simulate from the population process.
This feature of the algorithms---known as the \emph{plug-and-play property} \citep{He2010}---further expands the class of population models that can be confronted with data.

In particular, the theory gives us the freedom to choose models with many demes.
For deterministic population models, \citet{Volz2012} and \citet{Rasmussen2014a} showed how one could accommodate discrete population structure.
Their procedures involve solving a large number of differential equations backward in time, relying on the time-reversibility of deterministic dynamics.
In general, this time-reversibility is not a property of stochastic processes.

Some existing methods put rather severe limits on the form of the sampling model and, as \citet{Volz2014a} pointed out, misspecification of the sampling model can lead to large inferential biases.
With the theory presented here, essentially arbitrary specification of the sampling model is possible.
In particular, one can posit sampling at a rate which is an arbitrary function of time and state and include discrete sampling events as well.
It is also possible to condition on the existence of samples.

If Sequential Monte Carlo algorithms are used to compute the likelihoods of \cref{thm:obsc_uncond}, then it is straightforward to simultaneously assimilate information from both time-series and genealogical data.
One can therefore supplement traditional incidence, disease, or mortality time series with genealogical data in an inferential exercise.

A limitation of the theory is that the population models are assumed to be pure jump processes, which allows consideration of demographic stochasticity and environmental stochasticity modeled by jumps involving multiple individuals \citep{Breto2011}, but disallows stochastic processes with a diffusive component.
It should be possible to incorporate the full range of Markovian environmental stochasticity via extension of this theory to population models containing both diffusion and jump components.

The price of the theory's flexibility is primarily computational.
When Sequential Monte Carlo is used to evaluate the likelihood in \cref{thm:obsc_uncond}, the computational effort scales linearly with the number of samples.
In its most straightforward implementation---using an event-driven algorithm \citep[\eg][]{Gillespie1977a}---it scales nonlinearly with population size in general.
However, stochastic simulation schemes are available that scale independently of population size \citep{Higham2008}.
On the other hand, the importance sampling underlying \cref{thm:obsc_uncond} will in general require effort that is exponential in the number of demes.
For models with many demes, therefore, approaches for ameliorating or circumventing this curse of dimensionality may be necessary.
Critically, the substantial freedom one has in the choice of the importance-sampling distribution $\pi$ can be exploited for this purpose.
In particular, since it is permissible to ``borrow information'' from the future by means of the importance sampling, there is hope for highly efficient algorithmic computation.

\section*{Acknowledgments}

This work was supported by grants from
the U.S. National Institutes of Health (Grants \#1R01AI143852 to AAK, \#1U54GM111274 to AAK and ELI)
and a grant from the Interface program, jointly operated by the U.S. National Science Foundation and the National Institutes of Health (Grant \#1761603 to ELI and AAK).
QL acknowledges the support of the Michigan Institute for Data Science.

\bibliographystyle{preprint}
\bibliography{phylopomp}

\appendix
\setcounter{equation}{0}
\setcounter{figure}{0}
\setcounter{table}{0}
%%\setcounter{tcbcounter}{0}

\titleformat{\section}[hang]{\large\bfseries}{Appendix \periodafter\thesection}{2ex}{\periodafter}{}
\renewcommand{\theequation}{\thesection\arabic{equation}}
\renewcommand{\thefigure}{\thesection\arabic{figure}}
\renewcommand{\thetable}{\thesection\arabic{table}}
%%\renewcommand{\thetcbcounter}{\thesection\arabic{tcbcounter}}

\section{Filter equations}
\label{sec:filter_eqns}

The likelihoods that appear in \cref{thm:pruned_uncond,thm:obsc_uncond} are integrals over large sets of histories.
As such, explicit expressions for them are not available, and we require mathematical tools to allow us to manipulate these quantities and devise algorithms for their numerical solution.
The \emph{filter equations} we introduce here are suitable for these purposes, and we devote this appendix to exposing their essential properties.
This extremely convenient formalism has, to our knowledge, not been thoroughly exploited, though we note its resemblance to the constructions of \citet{Ogata1978}, \citet{Puri1986}, \citet{Kliemann1990}, and \citet{Giesecke2018}.

\begin{defn}
  Let $\Xr_t$ be a continuous-time Markov process with KFE
  \begin{mathsize}{9pt}{10pt}
    \begin{equation}
      \label{eq:kfe2}
      \frac{\partial{u}}{\partial{t}}(t,x)
      =\int{u(t,x')\,\beta(t,x',x)\,\dd{x'}}
      -\int{u(t,x)\,\beta(t,x,x')\,\dd{x'}}.
    \end{equation}
  \end{mathsize}%
  Suppose that $B:\Rp\times\Xspace^2\to\Rp$ and $\lambda:\Rp\times\Xspace\to\mathbb{R}$ are given measurable functions.
  Let $S\subset\Rp$ be countable and locally finite (\ie $S\cap{[0,t]}$ is finite for all $t>0$).
  Then the system of equations
  \begin{mathsize}{9pt}{11pt}
    \begin{align}
      \frac{\partial{w}}{\partial{t}}(t,x)&=
      \int{w(t,x')\,\beta(t,x',x)\,B(t,x',x)\,\dd{x'}}
      -\int{w(t,x)\,\beta(t,x,x')\,\dd{x'}}
      -\lambda(t,x)\,w(t,x),
      \qquad
      &t\notin{S},
      \label{eq:filter-eq-defn-reg}\\
      w(t,x)&=\int{\wt(t,x')\,\beta(t,x',x)\,B(t,x',x)\,\dd{x'}},\qquad
      &t\in{S},
      \label{eq:filter-eq-defn-sing}
    \end{align}
  \end{mathsize}%
  is called the \emph{filter equation} \emph{generated by} $\beta$, with \emph{boost} $B$, \emph{decay} $\lambda$, and \emph{observation times} $S$.
  The process $\Xr_t$ is said to be the \emph{driver} of the filter equation.
  \Cref{eq:filter-eq-defn-reg} is the \emph{regular part} of the filter equation;
  \Cref{eq:filter-eq-defn-sing} is known as the \emph{singular part}.
\end{defn}

\begin{remark}
  Trivially, a Kolmogorov forward equation is itself a filter equation with boost $1$, decay $0$, and $S=\emptyset$.
\end{remark}

The following results show how filter equations allow one to integrate over random histories.
First, \cref{lemma:reg-filt} shows how one integrates over the full space of histories using a regular filter equation.
\Cref{lemma:sing-filt} builds on this when the set of histories is restricted.

\begin{lemma}
  \label{lemma:reg-filt}
  Suppose that $B:\Rp\times\Xspace^2\to\Rp$ is measurable.
  Let $\Vr_t$ be an $\Rp$-valued random process satisfying
  \begin{equation*}
    \CondExpect{\Vr_t}{\Hr_t=\H_t}=\prod_{\mathclap{e\;\in\;\event{\H_t}}}{B(e,\Xt_e,\X_e)}.
  \end{equation*}
  Let the family of measures $\lambda_t$ on $\Xspace$ be defined by
  \begin{equation*}
    \lambda_t(\mathcal{E})=\Expect{\Vr_t\cdot\Indicator{\Xr_t\in{\mathcal{E}}}},
  \end{equation*}
  for measurable $\mathcal{E}$, and let $w(t,x)$ be the density of $\lambda_t$,
  \ie $\lambda_t(\dd{x})=w(t,x)\,\dd{x}$.
  In particular, $\Expect{\Vr_t}=\lambda_t(\Xspace)=\int{w(t,x)\,\dd{x}}$.
  Then $w$ satisfies the initial condition $w(0,x)=p_0(x)$ and the regular filter equation,
  \begin{equation}
    \label{eq:reg-filter-eq2}
    \frac{\partial{w}}{\partial{t}}(t,x)=\int{w(t,x')\,\alpha(t,x',x)\,B(t,x',x)\,\dd{x'}}-\int{w(t,x)\,\alpha(t,x,x')\,\dd{x'}}.
  \end{equation}
\end{lemma}
\begin{proof}
  Since $\Prob{\Vr_0=1}=1$, $\lambda_0(\mathcal{E})=\Prob{\Xr_0\in\mathcal{E}}$, which implies that $w(0,x)=p_0(x)$.
  For $t>0$ and $\Delta>0$ sufficiently small, the expectation can be broken into three terms, according to whether $\H_t$ has zero, one, or more than one event in $\halfclosed{t-\Delta,t}$.
  Accordingly, as $\Delta\downarrow{0}$,
  \begin{equation*}
    \begin{aligned}
      w(t,x)=&\left(1-\Delta\,\int{\alpha(t-\Delta,x,x')\,\dd{x'}}\right)\,w(t-\Delta,x)\\
      &\qquad+\Delta\,\int{\alpha(t-\Delta,x',x)\,B(t-\Delta,x',x)\,w(t-\Delta,x')\,\dd{x'}}+o(\Delta).
    \end{aligned}
  \end{equation*}
  In the limit, we obtain \cref{eq:reg-filter-eq2}, the regular filter equation generated by $\alpha$, with boost $B$ and zero decay.
  %%  \AAK{[This depends on $\alpha$ and $B$ being continuous in their first arguments. We have assumed they are continuous almost everywhere.]}
\end{proof}
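
In more detail, the limiting step in the proof of \cref{lemma:reg-filt} can be sketched as follows, under the assumption that $\alpha$ and $B$ are continuous in their first arguments at $t$.
Rearranging the last display of the proof and dividing by $\Delta$ gives
\begin{equation*}
  \begin{aligned}
    \frac{w(t,x)-w(t-\Delta,x)}{\Delta}=&\int{\alpha(t-\Delta,x',x)\,B(t-\Delta,x',x)\,w(t-\Delta,x')\,\dd{x'}}\\
    &\qquad-w(t-\Delta,x)\,\int{\alpha(t-\Delta,x,x')\,\dd{x'}}+\frac{o(\Delta)}{\Delta}.
  \end{aligned}
\end{equation*}
As $\Delta\downarrow{0}$, the left-hand side tends to the (left) time-derivative of $w$ at $(t,x)$, the remainder $o(\Delta)/\Delta$ vanishes, and the first two terms on the right-hand side tend to the two integrals on the right-hand side of \cref{eq:reg-filter-eq2}.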

When events are known to have occurred at particular times, it is of interest to integrate over those histories that include an event at each of these times.
This leads to singular filter equations, as the next lemma shows.
Before we state the lemma, some terminology is needed.
Let $\mathbb{S}$ be the space of increasing, locally finite sequences in $\Rp$, with the topology induced by the Skorokhod metric and Lebesgue measure.
For $t\in\Rp$ and $s\in\mathbb{S}$, let $s^{}_t\coloneq{s\cap{[0,t]}}$.
Thus if $s\in\mathbb{S}$ and $s^{}_t=(\hat{s}_1,\dots,\hat{s}_K)$, then the infinitesimal element of Lebesgue measure at $s^{}_t$ is $\dd{s^{}_t}=\prod_{n=1}^{K}\dd{\hat{s}_n}$.

\begin{lemma}
  \label{lemma:sing-filt}
  Suppose that $B:\Rp\times\Xspace^2\to\Rp$ is measurable and
  $\Vr_t$ is an $\Rp$-valued random process satisfying
  \begin{equation*}
    \CondExpect{\Vr_t}{\Hr_t=\H_t}=\prod_{\mathclap{e\;\in\;\event{\H_{t}}}}{B(e,\Xt_e,\X_e)}.
  \end{equation*}
  Let $\lambda_t$ be a family of measures on $\Xspace\times\mathbb{S}$ defined by
  \begin{equation*}
    \lambda_t(\mathcal{E},\mathcal{S})=\Expect{\Vr_t\cdot\Indicator{\Xr_t\in{\mathcal{E}}}\cdot\Indicator{\exists{s\in\mathcal{S}}\:\text{s.t.}\:\event{\Hr_t}\supseteq{s^{}_t}}},
  \end{equation*}
  whenever $\mathcal{E}\subseteq\Xspace$ and $\mathcal{S}\subseteq\mathbb{S}$ are measurable.
  Let $w(t,x,s)$ be the density of this measure, \ie
  \begin{equation*}
    \lambda_t(\dd{x}\,\dd{s})=w(t,x,s)\,\dd{x}\,\dd{s^{}_t}.
  \end{equation*}
  Then $w$ satisfies
  \begin{align}
    \label{eq:two-part-filter-reg}
    \frac{\partial{w}}{\partial{t}}(t,x,s)&=
    \int{w(t,x',s)\,\alpha(t,x',x)\,B(t,x',x)\,\dd{x'}}
    -\int{w(t,x,s)\,\alpha(t,x,x')\,\dd{x'}},\qquad
    &t\notin{s},\\
    \label{eq:two-part-filter-sing}
    w(t,x,s)&=\int{\wt(t,x',s)\,\alpha(t,x',x)\,B(t,x',x)\,\dd{x'}},\qquad
    &t\in{s}.
  \end{align}
\end{lemma}
\begin{proof}
  The proof proceeds by induction on the cardinality of $s_t$.
  The base case, for which $s_t=\emptyset$, follows immediately from \cref{lemma:reg-filt}.
  Assuming that it holds for $|s^{}_t|<K$, one has only to verify \cref{eq:two-part-filter-sing}.
  This can be accomplished by integrating \cref{eq:Hdens} directly.
\end{proof}

\begin{remark}
  In the same way that \cref{eq:kfe-reg,eq:kfe-sing} can be represented as a single equation by means of Dirac delta notation, \cref{eq:two-part-filter-reg,eq:two-part-filter-sing} can be collapsed into a more compact form if $\beta$ is allowed to have atoms at a countable set of time-points and the boost $B$ is adjusted appropriately.
\end{remark}

Filter equations afford a convenient means of computing expectations and likelihoods for pure jump processes.
This is facilitated by the following lemma, the statement of which uses a one-sided Dirac delta function.
Specifically, let $\delta(v,v')$ be the right-sided Dirac delta function satisfying $\delta(v,v')=0$ for $v\ne{v'}$ and
\begin{equation*}
  \int_a^b{f(v)\,\delta(v,v')\,\dd{v}}=f(v')\,\Indicator{v'\in\halfopen{a,b}},
\end{equation*}
whenever $f$ is \cadlag\ and $-\infty\le{a}<{b}\le{\infty}$.

\begin{lemma}
  \label{lemma:monte_carlo}
  \Cref{eq:filter-eq-defn-reg,eq:filter-eq-defn-sing} are satisfied by $w(t,x)=\int_0^{\infty}{v\,u(t,x,v)\,\dd{v}}$, where $u(t,x,v)$ satisfies the KFE
  \begin{mathsize}{9pt}{10pt}
    \begin{equation}
      \begin{aligned}
        \label[pluralequation]{eq:filterlemma}
        \frac{\partial{u}}{\partial{t}}(t,x,v)=
        &\frac{\partial}{\partial{v}}\left[\lambda(t,x)\,v\,u(t,x,v)\right]
        +\int_0^{\infty}\int{u(t,x',v')\,\beta(t,x',x)\,\delta\!\left(v,B(t,x',x)\,v'\right)\,\dd{x'}\,\dd{v'}}\\
        &\quad-\int_0^{\infty}\int{u(t,x,v)\,\beta(t,x,x')\,\delta\!\left(v',B(t,x,x')\,v\right)\,\dd{x'}\,\dd{v'}},
        &t\notin{S},\\
        u(t,x,v)\,\dd{x}=
        &\int_0^{\infty}\int{\ut(t,x',v')\,\pi(t,x',\dd{x})\,\delta\!\left(v,A(t,x')\,B(t,x',x)\,v'\right)\,\dd{x'}\,\dd{v'}},&t\in{S}.
      \end{aligned}
    \end{equation}
  \end{mathsize}%
  Here, $A(t,x)\coloneq{\int{\beta(t,x,x')\,\dd{x'}}}$
  and $\pi(t,x,\dd{x'})\coloneq\beta(t,x,x')\,\dd{x'}/A(t,x)$.
\end{lemma}
\begin{proof}
  For each $t\notin{S}$, we have
  \begin{mathsize}{9pt}{10pt}
    \begin{equation*}
      \begin{aligned}
        \frac{\partial{w}}{\partial{t}}(t,x)
        =&\int_0^{\infty}{v\,\frac{\partial{u}}{\partial{t}}(t,x,v)\,\dd{v}}\\
        =&\int_0^{\infty}\int\int_0^{\infty}{v\,u(t,x',v')\,\beta(t,x',x)\,\delta\!\left(v,B(t,x',x)\,v'\right)\,\dd{v}\,\dd{x'}\,\dd{v'}}\\
        &\qquad-\int_0^{\infty}\int\int_0^{\infty}{v\,u(t,x,v)\,\beta(t,x,x')\,\delta\!\left(v',B(t,x,x')\,v\right)\,\dd{v}\,\dd{x'}\,\dd{v'}}\\
        &\qquad+\int_0^{\infty}{v\,\tfrac{\partial}{\partial{v}}\left[\lambda(t,x)\,v\,u(t,x,v)\right]\,\dd{v}}.\\
      \end{aligned}
    \end{equation*}
  \end{mathsize}%
  Here, the non-explosivity assumption guarantees that we can differentiate under the integral sign and exchange the order of integration.
  Moreover, it ensures that $u\to{0}$ as $v\to{\infty}$.
  Hence, by evaluating the first integral with respect to $v$, the second with respect to $v'$, and the third by parts, we obtain
  \begin{mathsize}{9pt}{10pt}
    \begin{equation*}
      \begin{aligned}
        \frac{\partial{w}}{\partial{t}}(t,x)
        =&\int{v'\,u(t,x',v')\,\beta(t,x',x)\,B(t,x',x)\,\dd{v'}\,\dd{x'}}
        -\int{v\,u(t,x,v)\,\beta(t,x,x')\,\dd{v}\,\dd{x'}}\\
        &\qquad-\lambda(t,x)\,\int{v\,u(t,x,v)\,\dd{v}},
      \end{aligned}
    \end{equation*}
  \end{mathsize}%
  which is simplified to obtain \cref{eq:filter-eq-defn-reg}.
  Similarly, at each $t\in{S}$, we have
  \begin{mathsize}{9pt}{10pt}
    \begin{equation*}
      \begin{aligned}
        w(t,x)\,\dd{x}
        =&\int_0^{\infty}\int\int_0^{\infty}{v\,\ut(t,x',v')\,\pi(t,x',\dd{x})\,\delta\!\left(v,A(t,x')\,B(t,x',x)\,v'\right)\,\dd{x'}\,\dd{v'}\,\dd{v}}\\
        =&\int_0^{\infty}\int{v'\,\ut(t,x',v')\,\pi(t,x',\dd{x})\,A(t,x')\,B(t,x',x)\,\dd{x'}\,\dd{v'}}\\
        =&\int{\wt(t,x')\,\beta(t,x',x)\,B(t,x',x)\,\dd{x'}}\,\dd{x},
      \end{aligned}
    \end{equation*}
  \end{mathsize}%
  which is equivalent to \cref{eq:filter-eq-defn-sing}.
\end{proof}

\begin{remark}
  \Cref{eq:filterlemma} are recognizable as the KFE of a certain process $(\Xr_t,\Vr_t)$.
  In particular, the driver $\Xr_t$ has KFE \cref{eq:kfe2}.
  $\Vr_t$ is \emph{directed} by $\Xr_t$ in the sense that $\Vr$ has jumps wherever $\Xr$ does:
  when $\Xr$ jumps at time $t$ from $x$ to $x'$, $\Vr$ jumps by the multiplicative factor $B(t,x,x')\ge{0}$.
  Between jumps, $\Vr_t$ decays deterministically and exponentially at rate $\lambda(t,x)$.
  At the known times in $S$, $\Xr$ jumps according to the probability kernel $\pi$, and $\Vr$ jumps by the factor $A(t,x)\,B(t,x,x')$.
  If we view $\Vr_t$ as a weight, then \cref{lemma:monte_carlo} tells us how the $\Vr_t$-weighted average of $\Xr_t$ evolves in time:
  this average is simply $\int{w(t,x)\,\dd{x}}$.
  Thus, \cref{lemma:monte_carlo} shows how to integrate \cref{eq:two-part-filter-reg,eq:two-part-filter-sing} in the Monte Carlo sense.
\end{remark}
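
To make the Monte Carlo interpretation concrete, the following sketch writes the weight out explicitly.
It assumes that $\Vr_0=1$ and that $w(0,x)$ is the density of $\Xr_0$, neither of which is stated in \cref{lemma:monte_carlo};
the symbols $J$ and $\Vr^{(j)}_t$ are introduced here for illustration only.
Along a simulated history in which $\Xr$ jumps at rate $\beta$ for $t\notin{S}$ and according to $\pi$ at each $t\in{S}$, the description in the preceding remark gives
\begin{equation*}
  \Vr_t=\exp\left(-\int_0^t{\lambda(s,\Xr_s)\,\dd{s}}\right)\;
  \prod_{e\le{t},\,e\notin{S}}{B(e,\Xt_e,\X_e)}\;
  \prod_{e\le{t},\,e\in{S}}{A(e,\Xt_e)\,B(e,\Xt_e,\X_e)},
\end{equation*}
where the products run over the jump times $e$ of the history and $\Xt_e$, $\X_e$ denote the states immediately before and after the jump at $e$.
If $\Vr^{(1)}_t,\dots,\Vr^{(J)}_t$ are the weights obtained from $J$ independent simulations, then
\begin{equation*}
  \frac{1}{J}\sum_{j=1}^{J}{\Vr^{(j)}_t}\approx\Expect{\Vr_t}=\int{w(t,x)\,\dd{x}},
\end{equation*}
so that the sample mean of the weights is an unbiased estimate of the integral of $w$.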

%% Filter equations allow us to pass easily between equivalent representations of a process.
%% For example, an equivalent way of representing $\Xr_t$ is in terms of its embedded chain and event times.
%% Let $\Xrh_k$ be the embedded chain of $\Xr_t$ and let $\Trh_k$ be the point process of its event times.
%% It is elementary that
%% \fontsize{10pt}{12pt}\selectfont
%% \begin{equation*}
%%   \begin{gathered}
%%     \Prob{\Xrh_k=\Xh_k\;\Big\vert\;\Xrh_{k-1}=\Xh_{k-1},\Trh_k=\Th_k}=\frac{\alpha(\Th_k,\Xh_{k-1},\Xh_k)}{\int{\alpha(\Th_k,\Xh_{k-1},x')\dd{x'}}},\\
%%     \Prob{\Trh_k>\Trh_{k-1}+t\;\Big\vert\;\Xrh_{k-1}=\Xh_{k-1},\Trh_{k-1}=\Th_{k-1}}=\exp{\left(-\int_{0}^{t}{\int{\alpha(\Th_{k-1}+s,\Xh_{k-1},x')\,\dd{x'}}\,\dd{s}}\right)}.
%%   \end{gathered}
%% \end{equation*}
%% \normalfont
%% Fixing $\nu>0$ and making the definitions,
%% \begin{equation}\label[pluralequation]{eq:nudefs}
%%   \begin{gathered}
%%     A(t,x)=\int{\alpha(t,x,x')\,\dd{x'}}, \qquad
%%     \pi(t,x,x')=\frac{\alpha(t,x,x')}{A(t,x)},\\
%%     B(t,x,x')=\frac{A(t,x)}{\nu}, \qquad
%%     \lambda(t,x)=A(t,x)-\nu,
%%   \end{gathered}
%% \end{equation}
%% we can rewrite the KFE as
%% \begin{equation}\label{eq:poissdriver}
%%   \frac{\partial{w}}{\partial{t}}(t,x)=
%%   \int{w(t,x')\,\nu\,\pi(t,x',x)\,B(t,x,x')\,\dd{x'}}
%%   -\int{w(t,x)\,\nu\,\pi(t,x,x')\,\dd{x'}}
%%   -\lambda(t,x)\,w(t,x).
%% \end{equation}
%% Here, $\nu$ is the intensity of a time-homogenous Poisson process.
%% Note that $\pi$ is the probability kernel of the embedded chain $\Xrh$ and $A(t,x)$ is the intensity of the $\Trh_k$ process.
%% We recognize this equation as the filter equation with boost $B$, decay $\lambda$, and driver generated by $\nu\,\pi(t,x,x')$.
%% It corresponds to the following procedure for simulating $\Xr_t$:
%% \begin{compactenum}[(a)]
%% \item Simulate jump times according to the rate-$\nu$ Poisson process.
%% \item Simulate the embedded chain $\Xrh_k$ using the kernel $\pi$.
%% \item Weight the realization by the product of the $B$ factors.
%%   Note that this makes the appropriate importance-sampling correction.
%% \end{compactenum}

\section{Examples}
\label{sec:app-examples}

\subsection{SIRS model} %% SIRS

\citet{King2022} worked out formulas for the exact likelihood of a genealogy induced by an SIRS model.
The theory developed in this paper applies, but since there is only one deme in this model, this is a simple case.
Its state vector is $x=(S,I,R)$ and its KFE is
\begin{mathsize}{9pt}{10pt}
  \begin{equation*}
    \begin{split}
      \frac{\partial{v}}{\partial{t}}(t,S,I,R)=
      &\frac{\beta\,(S+1)\,(I-1)}{N}\,v(t,S+1,I-1,R)
      -\frac{\beta\,S\,I}{N}\,v(t,S,I,R)\\
      &+\gamma\,(I+1)\,v(t,S,I+1,R-1)
      -\gamma\,I\,v(t,S,I,R)\\
      &+\omega\,(R+1)\,v(t,S-1,I,R+1)
      -\omega\,R\,v(t,S,I,R).
    \end{split}
  \end{equation*}
\end{mathsize}%
Here $N=S+I+R$ is the host population size.
Note that, though the theory allows for time-dependent event rates, this example is time-homogeneous.
This model has one deme and occupancy function $n(x)=I$.
There are four kinds of jumps:
transmission, recovery, waning of immunity, and sampling.
Accordingly, the marks are $\Jumps=\Set{\lab{Trans},\lab{Recov},\lab{Wane},\lab{Sample}}$.
\Cref{tab:sirs_model_elements} gives $\alpha_u$, $r_u$, and the event type for each of these marks.

\begin{table}[h!]
  \caption{
    \label{tab:sirs_model_elements}
    Elements of the SIRS model pertinent to the genealogy process.
    The table shows the rate ($\alpha_u$), jump ($x\mapsto{x'}$), production ($r_u$), and event type for each of the model's four marks ($u$).
  }
  %% \renewcommand{\arraystretch}{1.2}
  \begin{tabular}{ccccl}
    \hline\hline
    $u$ & $\alpha_u$ \bigstrut & $x\mapsto{x'}$ \bigstrut & $r_u$ \bigstrut & Event type \\
    \hline
    $\lab{Trans}$  & $\frac{\beta S I}{N}$ \bigstrut & $(S,I)\mapsto(S-1,I+1)$ \bigstrut & 2 & pure birth \\
    $\lab{Recov}$  & $\gamma I$ \bigstrut & $(I,R)\mapsto(I-1,R+1)$ \bigstrut & 0 & pure death \\
    $\lab{Wane}$   & $\omega R$ \bigstrut & $(S,R)\mapsto(S+1,R-1)$ \bigstrut & 0 & neutral \\
    $\lab{Sample}$ & $\psi I$ \bigstrut & $x\mapsto{x}$ \bigstrut & 1 & pure sample \\
    \hline\hline
  \end{tabular}
\end{table}

Given an obscured genealogy $Z$, let $\event{Z}=B\cup{S_0}\cup{S_1}$, where $B$ is the set of branch-times, and $S_0$, $S_1$ are the sets of sample-times with saturations $0$ and $1$, respectively.
Since there is only one deme, paintings of $Z$ can differ only in number and position of inline, internal nodes along branches.
Each of these can only correspond to $u=\lab{Trans}$ with $s=1$.
For $t\notin{\event{Z}}$, we can take the importance sampling distribution to be
\begin{equation*}
  \pi_u=\begin{cases}
  c,                & u=\lab{Trans}, s=0, t\notin{\event{Z}},\\
  \frac{1-c}{\ell}, & u=\lab{Trans}, s=1, t\notin{\event{Z}},\\
  0,                & \text{otherwise}.
  \end{cases}
\end{equation*}
Here $c$ is an arbitrary probability that does not affect the computation.
The relevant binomial ratios are
\begin{equation*}
  \BinRatio{n}{\ell}{r}{s}=
  \begin{cases}
    \frac{(I-\ell)\,(I-\ell-1)}{I\,(I-1)}, & u=\lab{Trans}, s=0, \\[1ex]
    \frac{2\,(I-\ell)}{I\,(I-1)}, & u=\lab{Trans}, s=1, \\[1ex]
    \frac{2}{I\,(I-1)}, & u=\lab{Trans}, s=2, \\[1ex]
    %% 1, & u=\lab{Recov}, I\ge{\ell}, \\[1ex]
    %% 0, & u=\lab{Recov}, I=\ell, \\[1ex]
    %% 1, & u=\lab{Wane}, \\[1ex]
    \frac{I-\ell}{I}, & u=\lab{Sample}, s=0, \\[1ex]
    \frac{1}{I}, & u=\lab{Sample}, s=1. \\[1ex]
  \end{cases}
\end{equation*}

This leads, for $t\notin{\event{Z}}$, to the following regular part of the filter equation:
\begin{mathsize}{9pt}{10pt}
  \begin{equation*}
    \begin{split}
      \frac{\partial{w}}{\partial{t}}=
      &\frac{\beta\,(S+1)\,(I-1)}{N}\,\left(1-\frac{\tbinom{\ell(t)}{2}}{\tbinom{I}{2}}\right)\,w(t,S+1,I-1,R)
      -\frac{\beta\,S\,I}{N}\,w(t,S,I,R)\\
      &+\gamma\,(I+1)\,w(t,S,I+1,R-1)
      -\gamma\,I\,w(t,S,I,R)\\
      &+\omega\,(R+1)\,w(t,S-1,I,R+1)
      -\omega\,R\,w(t,S,I,R)\\
      &-\psi\,I\,w(t,S,I,R).
    \end{split}
  \end{equation*}
\end{mathsize}%
Here, we have summed over the various paintings for $u=\lab{Trans}$, $s<2$.
Note the presence of the decay term proportional to $\psi$.
At event-times $t\in\event{Z}$, the singular part of the filter equation reads
\begin{mathsize}{9pt}{10pt}
  \begin{equation*}
    w(t,S,I,R)=
    \begin{cases}
      \wt(t,S+1,I-1,R)\,\frac{2\,\beta\,(S+1)}{N\,I}, & t\in{B}, \\[1ex]
      \wt(t,S,I,R)\,\psi\,(I-\ell(t)), & t\in{S_0}, \\[1ex]
      \wt(t,S,I,R)\,\psi, & t\in{S_1}. \\[1ex]
    \end{cases}
  \end{equation*}
\end{mathsize}%
Finally, note that $w(t,S,I,R)=0$ for all $I<\ell(t)$.
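
As a check on these coefficients, the following short calculation recovers them from the rates in \cref{tab:sirs_model_elements} and the binomial ratios listed above;
it is a sketch that writes $\ell=\ell(t)$ and uses no other ingredients.
For $u=\lab{Trans}$ at $t\notin{\event{Z}}$, there is one painting with $s=0$ and there are $\ell$ paintings with $s=1$, one for each lineage.
Summing the corresponding binomial ratios gives
\begin{equation*}
  \frac{\tbinom{I-\ell}{2}+\ell\,(I-\ell)}{\tbinom{I}{2}}
  =1-\frac{\tbinom{\ell}{2}}{\tbinom{I}{2}},
\end{equation*}
since $\tbinom{I}{2}=\tbinom{I-\ell}{2}+\ell\,(I-\ell)+\tbinom{\ell}{2}$.
Likewise, each coefficient in the singular part is the product of the event rate, evaluated at the pre-event state, and the binomial ratio, evaluated at the post-event state:
\begin{equation*}
  \frac{\beta\,(S+1)\,(I-1)}{N}\cdot\frac{2}{I\,(I-1)}=\frac{2\,\beta\,(S+1)}{N\,I},
  \qquad
  \psi\,I\cdot\frac{I-\ell}{I}=\psi\,(I-\ell),
  \qquad
  \psi\,I\cdot\frac{1}{I}=\psi.
\end{equation*}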

<<sirs3>>=
data.frame(
  Beta=4,gamma=2,psi=1,omega=1,
  S0=97,I0=3,R0=0,t0=0,time=40
) -> sirs_params

bake(
  file="sirs3a.rds",
  seed=328168304L,
  dependson=sirs_params,
  {
    library(phylopomp)
    sirs_params |>
      with(
        runSIRS(
          Beta=Beta,gamma=gamma,psi=psi,omega=omega,
          S0=S0,I0=I0,R0=R0,t0=t0,time=time
        )
      )
  }
) -> sirs_tree

sirs_params |>
  with(
    expand_grid(
      Beta=Beta,
      gamma=seq(1.7,2.3,by=0.02),
      psi=psi,
      omega=omega,
      S0=S0,I0=I0,R0=R0,
      t0=t0,
      rep=seq_len(8),
      Np=10000
    )
  ) -> params

bake(
  file="sirs3b.rds",
  seed=621400057L,
  dependson=list(sirs_tree,sirs_params,params),
  {
    library(iterators)
    library(doFuture)
    plan(multicore)

    foreach (
      p=iter(params,"row")
    ) %dofuture% {
      library(pomp)
      library(phylopomp)
      p |>
        with({
          sirs_tree |>
            sirs_pomp(
              Beta=Beta,gamma=gamma,psi=psi,omega=omega,
              S0=S0,I0=I0,R0=R0,t0=0
            ) |>
            pfilter(Np=Np)
        })
    } %seed% TRUE |>
      concat()
  }) -> pfs

left_join(
  pfs |> coef() |> melt() |> pivot_wider(),
  pfs |> logLik() |> melt() |> rename(logLik=value),
  by=c(".id"="name")
) -> params

params |>
  with(
    mcap(logLik,gamma)
  ) -> mcap
@

\begin{figure}
  \begin{center}
    <<sirs3_plot,dependson="sirs3",fig.dim=c(8,2.8),out.width="100%">>=
    plot_grid(
      A=sirs_tree |>
        plot(points=FALSE,palette=c("#000000"))+
        labs(x="time"),
      B=params |>
        ggplot(aes(x=gamma,y=logLik))+
        geom_point(alpha=0.4)+
        geom_line(data=mcap$fit,aes(x=parameter,y=smoothed),color="blue")+
        geom_vline(xintercept=sirs_params$gamma,color="red")+
        geom_vline(xintercept=mcap$ci,linetype=2)+
        geom_hline(
          yintercept=max(mcap$fit$smoothed)-c(0,mcap$delta),
          linetype=2
        )+
        labs(
          color=character(0),
          y="log likelihood",
          x=expression(gamma)
        )+
        lims(y=c(max(params$logLik)-12,NA))+
        theme_classic(),
      labels="AUTO",
      nrow=1,
      rel_widths=c(1,1)
    )
    @
  \end{center}
  \caption{
    \label{fig:sirs_example}
    Likelihood computation for the SIRS model by Sequential Monte Carlo.
    (A)~A~simulated genealogy for $\beta=\Sexpr{sirs_params$Beta}$, $\gamma=\Sexpr{sirs_params$gamma}$, $\omega=\Sexpr{sirs_params$omega}$, $\psi=\Sexpr{sirs_params$psi}$, $(S_0,I_0,R_0)=(\Sexpr{with(sirs_params,c(S0,I0,R0))})$.
    (B)~A~slice through the likelihood surface at the true parameters in the $\gamma$-direction.
    Each point is a distinct Monte Carlo estimate.
    The blue curve is a LOESS smooth;
    the dashed lines bound the Monte Carlo-adjusted 95\% confidence interval \citep{Ionides2017}.
  }
\end{figure}

\subsection{SEIRS model} %% SEIRS

A simple, yet interesting, model with more than one deme is the SEIRS model (\cref{fig:example_models}A).
The state space is $\Rp^4$, with the state $x=(S,E,I,R)$ defined by the numbers of hosts in each of the four compartments.
The KFE for the population process is
\begin{mathsize}{9pt}{10pt}
  \begin{equation*}
    \begin{split}
      \frac{\partial{v}}{\partial{t}}(t,S,E,I,R)=
      &\frac{\beta\,(S+1)\,I}{N}\,v(t,S+1,E-1,I,R)
      -\frac{\beta\,S\,I}{N}\,v(t,S,E,I,R)\\
      &+\sigma\,(E+1)\,v(t,S,E+1,I-1,R)
      -\sigma\,E\,v(t,S,E,I,R)\\
      &+\gamma\,(I+1)\,v(t,S,E,I+1,R-1)
      -\gamma\,I\,v(t,S,E,I,R)\\
      &+\omega\,(R+1)\,v(t,S-1,E,I,R+1)
      -\omega\,R\,v(t,S,E,I,R),
    \end{split}
  \end{equation*}
\end{mathsize}%
where $N=S+E+I+R$ is the total population size.
Note that the terms associated with sampling cancel each other in the KFE, since, in this model, sampling has no effect on the state.

This model has two demes: $\Demes=\Set{\lab{E},\lab{I}}$.
Its deme occupancy function is $n(x)=(E,I)$.
There are five kinds of jumps:
transmission, progression, recovery, waning of immunity, and sampling.
The corresponding marks are $\Jumps=\Set{\lab{Trans},\lab{Prog},\lab{Recov},\lab{Wane},\lab{Sample}}$.
\Cref{tab:seirs_model_elements} gives $\alpha_u$, $r_u$, and the event type for each of these marks.

\input{figs/seirs_events}

\begin{table}[h!]
  \caption{
    \label{tab:seirs_model_elements}
    Elements of the SEIRS model pertinent to the genealogy process.
    The table shows the rate ($\alpha_u$), jump ($x\mapsto{x'}$), production ($r_u$), and event type for each of the model's five marks ($u$).
  }
  %% \renewcommand{\arraystretch}{1.2}
  \begin{tabular}{ccccl}
    \hline\hline
    $u$ & $\alpha_u$ & $x\mapsto{x'}$ & $r_u$ & Event type \\
    \hline
    $\lab{Trans}$  & $\frac{\beta S I}{N}$ \bigstrut & $(S,E)\mapsto(S-1,E+1)$  \bigstrut & $(1,1)$ & pure birth \\
    $\lab{Prog}$   & $\sigma E$ \bigstrut & $(E,I)\mapsto(E-1,I+1)$ \bigstrut & $(0,1)$ & pure migration \\
    $\lab{Recov}$  & $\gamma I$ \bigstrut & $(I,R)\mapsto(I-1,R+1)$ \bigstrut & $(0,0)$ & pure death \\
    $\lab{Wane}$   & $\omega R$ \bigstrut & $(S,R)\mapsto(S+1,R-1)$ \bigstrut & $(0,0)$ & neutral \\
    $\lab{Sample}$ & $\psi I$ \bigstrut & $x\mapsto{x}$ \bigstrut & $(0,1)$ & pure sample \\
    \hline\hline
  \end{tabular}
\end{table}

\begin{table}
  \caption{
    \label{tab:seirs_integ}
    Elements of a scheme for numerically computing the likelihood under the SEIRS model.
    The regular portion of the filter equation holds in between genealogical events (\ie for $t\notin{{B}\cup{S_0}\cup{S_1}}$);
    the singular portion describes the effect of these events ($t\in{{B}\cup{S_0}\cup{S_1}}$).
    Each line corresponds to a potential event, but only those for which $Q=1$ appear in the equation.
    An event with mark $u$ and saturation $s$ has the boost given by the binomial ratio shown (third column).
    The $y\mapsto{y'}$ column depicts the proposed painting schematically, and $\pi$ is the probability of that proposal.
    Blue is used for the $\lab{E}$ deme and yellow for the $\lab{I}$ deme.
    For each line, the filter equation contains $m$ terms.
    An asterisk ($\ast$) stands for cases not explicitly mentioned.
  }
  %% \renewcommand{\arraystretch}{1.2}
  \begin{tabular}{c c c c c c c c | c}
    \hline\hline
    & $u$ & $s$ & $Q$ & $\BinRatio{n}{\ell}{r}{s}$ & $y\mapsto{y'}$ & $\pi$ & $m$ & Line\\[2ex]
    \hline
    \multirow[c]{10}{*}[-1ex]{\rotatebox{90}{$t\notin{{B}\cup{S_0}\cup{S_1}}$}}
    & \multirow{4}{*}{$\lab{Trans}$}
    & $(0,0)$ & 1 & $\left(\frac{E-\ell_E}{E}\right)\left(\frac{I-\ell_I}{I}\right)$ \bigstrut & \unbroken \bigstrut & $\frac{I-\ell_I}{I}$ & 1 & 1 \\
    \cline{3-8}
    && $(1,0)$ & 1 & $\frac{I-\ell_I}{E I}$ \bigstrut & \maizeblue \bigstrut & $\frac{1}{2I}$ & $\ell_I$ & 2 \\
    \cline{3-8}
    && $(0,1)$ & 1 & $\frac{E-\ell_E}{E I}$ \bigstrut & \maizemaize \bigstrut & $\frac{1}{2I}$ & $\ell_I$ & 3 \\
    \cline{3-8}
    && $(1,1)$ & 0 & & & & 0 & 4 \\
    \cline{2-8}
    &\multirow{2}{*}{$\lab{Prog}$}
    & $(0,0)$ & 1 & $\frac{I-\ell_I}{I}\,\Indicator{E\ge\ell_E}$ \bigstrut & \unbroken \bigstrut & $\frac{E-\ell_E}{E}\,\Indicator{E>\ell_E}$ \bigstrut & 1 & 5 \\
    \cline{3-8}
    && $(0,1)$ & 1 & $\frac{1}{I}\,\Indicator{E\ge\ell_E}$ \bigstrut & \bluemaize \bigstrut & $\frac{1}{E}\,\Indicator{E>\ell_E}$ \bigstrut & $\ell_E$ & 6 \\
    \cline{2-8}
    & $\lab{Recov}$ & $(0,0)$ & 1 & $\Indicator{I\ge\ell_I}$ \bigstrut & \unbroken \bigstrut & $\Indicator{I>\ell_I}$ \bigstrut & 1 & 7 \\
    \cline{2-8}
    & $\lab{Wane}$ & $(0,0)$ & 1 & 1 \bigstrut & \unbroken \bigstrut & 1 & 1 & 8 \\
    \cline{2-8}
    & $\lab{Sample}$ & $\ast$ & 0 &  &  & & 0 & 9 \\
    \cline{1-8}
    \multirow[c]{4}{*}[-3ex]{\rotatebox{90}{$t\in{B}$}}
    & \multirow{4}{*}{$\lab{Trans}$}
    & \multirow{4}{*}{$(1,1)$}
    & 1 & $\frac{1}{E I}$ \bigstrut & \branchup \bigstrut & $\frac{1}{2}$ \bigstrut & 1 & 10 \\
    \cline{4-8}
    &&& 1 & $\frac{1}{E I}$ \bigstrut & \branchdown \bigstrut & $\frac{1}{2}$ \bigstrut & 1 & 11 \\
    \cline{4-8}
    &&& 0 &  \bigstrut & \branchupnix \bigstrut & & 0 & 12 \\
    \cline{4-8}
    &&& 0 &  \bigstrut & \branchdownnix \bigstrut & & 0 & 13 \\
    \cline{2-8}
    & $\ast$ & $\ast$ & 0 &  \bigstrut &  & & 0 & 14 \\
    \cline{1-8}
    \multirow[c]{2}{*}[-2ex]{\rotatebox{90}{$t\in{S_0}$} \bigstrut}
    & \multirow{2}{*}{$\lab{Sample}$}
    & \multirow{2}{*}{$(0,0)$}
    & 1 & $\frac{I-\ell_I}{I}$ \bigstrut & \samplezero & 1 & 1 & 15 \\
    \cline{4-8}
    &&& 0 &  & \samplezeronix \bigstrut & & 0 & 16 \\
    \cline{2-8}
    & $\ast$ & $\ast$ & 0 &  &  & & 0 & 17 \\
    \cline{1-8}
    \multirow[c]{2}{*}[-2ex]{\rotatebox{90}{$t\in{S_1}$}}
    & \multirow{2}{*}{$\lab{Sample}$}
    & \multirow{2}{*}{$(0,1)$}
    & 1 & $\frac{1}{I}$ \bigstrut & \sampleone \bigstrut & 1 & 1 & 18 \\
    \cline{4-8}
    &&& 0 &  & \sampleonenix \bigstrut & & 0 & 19 \\
    \cline{2-8}
    & $\ast$ & $\ast$ & 0 &  &  & & 0 & 20 \\
    \hline\hline
  \end{tabular}
\end{table}

The filter equation corresponding to the scheme of \cref{tab:seirs_integ} is presented in \cref{box:SEIRS}.
Some numerical results are presented in \cref{fig:seirs_example}.
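
As a consistency check on the proposal probabilities in \cref{tab:seirs_integ}, one can verify that, for each of the marks $\lab{Trans}$ and $\lab{Prog}$ at $t\notin{{B}\cup{S_0}\cup{S_1}}$, the values of $\pi$ sum to one when each line is counted $m$ times.
The following sketch uses only the table entries:
\begin{align*}
  \lab{Trans}:&\quad\frac{I-\ell_I}{I}+\ell_I\cdot\frac{1}{2I}+\ell_I\cdot\frac{1}{2I}=1,\\
  \lab{Prog}:&\quad\frac{E-\ell_E}{E}+\ell_E\cdot\frac{1}{E}=1,\qquad\text{when}\ E>\ell_E.
\end{align*}
Similarly, at $t\in{B}$, the two admissible paintings (lines 10 and 11) are each proposed with probability $\frac{1}{2}$.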

\clearpage
\newpage

\begin{bluebox}[label={box:SEIRS}]{Filter equation for the SEIRS model}
  As previously, given an obscured genealogy, let $B$ be the set of its branch times, $S_0$ be the set of tip-sample times, and $S_1$ be the set of inline sample times.
  Then for $t\notin{{B}\cup{S_0}\cup{S_1}}$, the filter equation reads:
  \begin{mathsize}{9pt}{10pt}
    \begin{equation}
      \label{eq:seirs_filter_reg}
      \begin{split}
        \frac{\partial{w}}{\partial{t}}(t,S,&E,I,R,y)=
        \frac{\beta\,(S+1)\,I}{N}\,\left(1-\frac{\ell_E}{E}\right)\,\left(1-\frac{\ell_I}{I}\right)\,w(t,S+1,E-1,I,R,y)\\
        &+\sum_{k=1}^{\ell_I}\frac{\beta\,(S+1)\,I}{N}\,\,\frac{1}{E}\,\left(1-\frac{\ell_I}{I}\right)\,w(t,S+1,E-1,I,R,\leftlim{y}_2^k)\\
        &+\sum_{k=1}^{\ell_I}\frac{\beta\,(S+1)\,I}{N}\,\frac{1}{I}\,\left(1-\frac{\ell_E}{E}\right)\,w(t,S+1,E-1,I,R,\leftlim{y}_3^k)
        -\frac{\beta\,S\,I}{N}\,w(t,S,E,I,R,y)\\
        &+\sigma\,(E+1)\,\Indicator{E\ge{\ell_E}}\,\left(1-\frac{\ell_I}{I}\right)\,w(t,S,E+1,I-1,R,y)\\
        &+\sum_{k=1}^{\ell_E}\sigma\,(E+1)\,\Indicator{E\ge{\ell_E}}\,\frac{1}{I}\,w(t,S,E+1,I-1,R,\leftlim{y}_6^k)
        -\sigma\,E\,w(t,S,E,I,R,y)\\
        &+\gamma\,(I+1)\,\Indicator{I\ge\ell_I}\,w(t,S,E,I+1,R-1,y)
        -\gamma\,I\,w(t,S,E,I,R,y)\\
        &+\omega\,(R+1)\,w(t,S-1,E,I,R+1,y)
        -\omega\,R\,w(t,S,E,I,R,y)
        -\psi\,I\,w(t,S,E,I,R,y).
      \end{split}
    \end{equation}
  \end{mathsize}%
  Here, $\leftlim{y}_j^k$ refers to the coloring of the tree immediately preceding the proposal indicated on line $j$ of \cref{tab:seirs_integ}.
  The integer $k$ specifies the particular branch on which the change occurs.

  The singular portion of the filter equation has one component for each distinct type of genealogical event:
  \begin{mathsize}{9pt}{10pt}
    \begin{equation}
      \label{eq:seirs_filter_sing}
      w(t,S,E,I,R,y)=
      \begin{cases}
        \frac{\beta\,(S+1)\,I}{N}\,\frac{1}{E I}\,\wt(t,S+1,E-1,I,R,\leftlim{y}_{10})\\[2ex]
        \qquad+\frac{\beta\,(S+1)\,I}{N}\,\frac{1}{E I}\,\wt(t,S+1,E-1,I,R,\leftlim{y}_{11}), & t\in{B},\\[2ex]
        \psi\,\left(I-\ell_I\right)\,\wt(t,S,E,I,R,\leftlim{y}_{15}), & t\in{S_0},\\[2ex]
        \psi\,\wt(t,S,E,I,R,\leftlim{y}_{18}), & t\in{S_1}.
      \end{cases}
    \end{equation}
  \end{mathsize}%

  In addition to \cref{eq:seirs_filter_reg,eq:seirs_filter_sing}, the quantity $w$ should satisfy the condition $w(t,S,E,I,R,y)=0$ whenever $E<\ell_E$ or $I<\ell_I$.

  A variety of importance-sampling kernels are permissible under the terms of \cref{thm:obsc_uncond}.
  With a particular choice of importance-sampling kernel, the filter equation uniquely specifies a Sequential Monte Carlo algorithm for estimating the likelihood.
  The choices made in \cref{tab:seirs_integ} underlie the results displayed in \cref{fig:seirs_example}.
\end{bluebox}

<<seirs3>>=
seirs_params <- data.frame(
  Beta=3,sigma=1,gamma=0.5,psi=0.02,omega=0.08,
  S0=70,E0=1,I0=0,R0=50,
  time=400
)

bake(
  file="seirs3a.rds",
  dependson=seirs_params,
  seed=509673338,
  seirs_params |>
    with(
      runSEIR(
        Beta=Beta,sigma=sigma,gamma=gamma,psi=psi,omega=omega,
        S0=S0,E0=E0,I0=I0,R0=R0,
        time=time
      )
    )
)-> seirs_tree

seirs_params |>
  select(-sigma,-time) |>
  expand_grid(
    sigma=seq(0.4,2.4,length.out=25),
    rep=seq_len(8)
  ) |>
  mutate(N=S0+E0+I0+R0) |>
  collect() -> params

bake(
  file="seirs3b.rds",
  dependson=list(params,seirs_params,seirs_tree),
  seed=751601556,
  {
    seirs_params |>
      with(
        seirs_tree |>
          seirs_pomp(
            Beta=Beta,sigma=sigma,gamma=gamma,psi=psi,omega=omega,
            S0=S0,E0=E0,I0=I0,R0=R0
          )
      ) -> po

    library(iterators)
    library(doFuture)
    plan(multicore)
    foreach (
      p=iter(params,"row")
    ) %dofuture% {
      library(phylopomp)
      po |>
        pfilter(params=p,Np=1e4)
    } %seed% TRUE |>
      concat()
  }
) -> pfs

left_join(
  pfs |> coef() |> melt() |> pivot_wider(),
  pfs |> logLik() |> melt() |> rename(logLik=value),
  by=c(".id"="name")
) -> params

params |>
  with(
    mcap(logLik,sigma,span=0.5)
  ) -> mcap
@

\begin{figure}
  \begin{center}
    <<seirs3_plot,fig.dim=c(8,2.8),out.width="100%",dependson="seirs3">>=
    plot_grid(
      A=seirs_tree |>
        plot(points=FALSE,palette="#000000")+
        labs(x="time"),
      B=params |>
        ggplot()+
        geom_point(aes(x=sigma,y=logLik))+
        geom_line(data=mcap$fit,aes(x=parameter,y=smoothed),color="blue")+
        geom_vline(xintercept=seirs_params$sigma,color="red")+
        geom_vline(xintercept=mcap$ci,linetype=2)+
        geom_hline(
          yintercept=with(mcap,max(fit$smoothed)-c(0,delta)),
          linetype=2
        )+
        lims(y=c(max(params$logLik)-12,NA))+
        labs(
          color=character(0),
          y="log likelihood",
          x=expression(sigma)
        )+
        theme_classic(),
      labels="AUTO",
      nrow=1,
      rel_widths=c(3,4)
    )
    @
  \end{center}
  \caption{
    \label{fig:seirs_example}
    Likelihood computation for the SEIRS model by Sequential Monte Carlo, using the scheme of \cref{box:SEIRS}.
    (A)~Simulated genealogy for $\beta=\Sexpr{seirs_params$Beta}$,
    $\sigma=\Sexpr{seirs_params$sigma}$,
    $\gamma=\Sexpr{seirs_params$gamma}$,
    $\psi=\Sexpr{seirs_params$psi}$,
    $\omega=\Sexpr{seirs_params$omega}$,
    $(S_0,E_0,I_0,R_0)=(\Sexpr{with(seirs_params,c(S0,E0,I0,R0))})$.
    (B)~Likelihood slice in the $\sigma$-direction.
    Each point represents the estimate of an independent Sequential Monte Carlo computation.
    The blue curve shows a LOESS smooth and the red vertical line marks the value of $\sigma$ used to simulate the genealogy;
    the dashed vertical lines enclose the Monte Carlo-adjusted 95\% confidence interval \citep{Ionides2017}.
  }
\end{figure}

\subsection{Two-strain competition model}

A simple model for the competition of two strains for susceptible hosts is depicted in \cref{fig:example_models}B.
This example will be included in a forthcoming draft.

\subsection{Superspreading model}

\Cref{fig:example_models}D depicts a model of superspreading.
This example will be included in a forthcoming draft.


\subsection{Linear birth-death model} %% LBDP

In this model, the state variable is the size, $N_t$, of a population at time $t$.
All individuals face the same per-capita birth and death rates, which are $\lambda$ and $\mu$, respectively.
The KFE is
\begin{mathsize}{9pt}{10pt}
  \begin{equation*}
    \frac{\partial{v}}{\partial{t}}=
    \lambda\,(n-1)\,v(t,n-1)
    -\lambda\,n\,v(t,n)
    +\mu\,(n+1)\,v(t,n+1)
    -\mu\,n\,v(t,n).
  \end{equation*}
\end{mathsize}%
\citet{Stadler2010} considered the case where samples are taken through time at a uniform per-capita rate $\psi$.
Since there is only one deme, $w$ in the filter equation can be taken to be independent of $y$.
If $B$ is the set of branch-times, $S_0$ and $S_1$ are the sets of times of terminal and inline samples, respectively, and $\ell(t)$ is the number of lineages in the genealogy at time $t$, then the regular part of the filter equation is
\begin{mathsize}{9pt}{10pt}
  \begin{equation}
    \label{eq:filter-lbdp-reg}
    \begin{aligned}
      \frac{\partial{w}}{\partial{t}}(t,n)=
      &\lambda\,(n-1)\,\left(1-\frac{\tbinom{\ell(t)}{2}}{\tbinom{n}{2}}\right)\,w(t,n-1)
      -\lambda\,n\,w(t,n)
      \\
      &\qquad+\mu\,(n+1)\,w(t,n+1)-\mu\,n\,w(t,n)-\psi\,n\,w(t,n),
      &&n\ge\ell(t),\quad t\notin{B\cup{S_0}\cup{S_1}}
    \end{aligned}
  \end{equation}
\end{mathsize}%
and the singular part is
\begin{mathsize}{9pt}{10pt}
  \begin{equation}
    \label{eq:filter-lbdp-sing}
    \begin{gathered}
      w(t,n)=
      \frac{\lambda\,(n-1)}{\tbinom{n}{2}}\,\wt(t,n-1),
      \quad t\in{B},
      \\
      w(t,n)=
      \psi\,n\,\left(1-\frac{\ell(t)}{n}\right)\,\wt(t,n),
      \quad t\in{S_0},
      \qquad
      w(t,n)=
      \psi\,\wt(t,n),
      \quad t\in{S_1}.
    \end{gathered}
  \end{equation}
\end{mathsize}%
\Cref{eq:filter-lbdp-reg,eq:filter-lbdp-sing} are supplemented by the ancillary condition $w(t,n)=0$ for $n<\ell(t)$.

\begin{table}[h!]
  \caption{
    \label{tab:lbdp_model_elements}
    Elements of the linear birth-death-sampling model pertinent to the genealogy process.
  }
  \renewcommand{\arraystretch}{1.2}
  \begin{tabular}{cccl}
    \hline\hline
    $u$ & $\alpha_u$ & $r_u$ & Event type \\
    \hline
    $\lab{Birth}$  & $\lambda N$ & 2 & pure birth \\
    $\lab{Death}$  & $\mu N$ & 0 & pure death \\
    $\lab{Sample}$ & $\psi N$ & 1 & pure sample \\
    \hline\hline
  \end{tabular}
\end{table}

<<lbdp3>>=
lbdp.params <- data.frame(
  lambda=1.2,mu=0.8,psi=1,n0=5,
  time=10
)

bake(
  file="lbdp3a.rds",
  seed=915645370,
  dependson=lbdp.params,
  lbdp.params |>
    with(
      runLBDP(lambda=lambda,mu=mu,psi=psi,n0=n0,time=time)
    )
) -> lbdp_tree

bake(
  file="lbdp3b.rds",
  seed=712604404,
  dependson=list(lbdp_tree,lbdp.params),
  {
    lbdp.params |>
      with(
        expand_grid(
          rep=1:10,
          lambda=lambda,
          mu=seq(0.5,1.1,by=0.05),
          psi=psi,
          n0=n0,
          Np=1000*2^seq(0,7)
        )
      ) -> params

    library(iterators)
    library(doFuture)
    plan(multicore)

    foreach (
      p=iter(params,"row"),
      .combine=bind_rows,
      .options.future=list(seed=TRUE)
    ) %dofuture% {
      p |>
        with(
          lbdp_tree |>
            lbdp_exact(lambda=lambda,mu=mu,psi=psi,n0=n0)
        ) -> ll1
      p |>
        with(
          lbdp_tree |>
            lbdp_pomp(lambda=lambda,mu=mu,psi=psi,n0=n0) |>
            pfilter(Np=Np) |>
            logLik()
        ) -> ll2
      bind_cols(p,exact=ll1,pf=ll2)
    }-> params
  }
) -> params

params |>
  mutate(
    diff=pf-exact
  ) |>
  group_by(Np) |>
  summarize(
    rmse=sqrt(mean(diff*diff)),
    bias=abs(mean(diff)),
    prec=sqrt(rmse^2-bias^2)
  ) |>
  ungroup() -> stats
@

\begin{figure}
  \begin{center}
    <<lbdp3_plot,fig.dim=c(9,7),out.width="100%">>=
    pal <- c(viridis_pal(option="H",begin=0.1,end=0.8)(8),"#000000")
    names(pal) <- c("1k","2k","4k","8k","16k","32k","64k","128k","exact")

    plot_grid(
      ncol=1,
      rel_heights=c(3,2),
      AB=plot_grid(
        labels=c("A","B"),
        nrow=1,
        rel_widths=c(3,5),
        A=lbdp_tree |>
          plot(points=FALSE,palette="#000000")+
          labs(x="time"),
        B=params |>
          pivot_longer(c(exact,pf)) |>
          unite(name,name,Np) |>
          mutate(
            name=if_else(grepl("exact",name),"exact",name),
            name=gsub("pf_","",name),
            name=gsub("000","k",name),
            name=ordered(name,levels=names(pal))
          ) |>
          group_by(lambda,mu,psi,n0,name) |>
          reframe(
            type=c("logLik","logLik_se"),
            value=logmeanexp(value,se=TRUE)
          ) |>
          ungroup() |>
          pivot_wider(names_from=type) |>
          mutate(
            y=logLik,
            ymax=logLik+2*logLik_se,
            ymin=logLik-2*logLik_se
          ) |>
          filter(logLik>max(logLik)-16) |>
          ggplot(
            aes(
              x=mu,group=name,color=name,
              y=y,ymin=ymin,ymax=ymax
            )
          )+
          geom_errorbar(
            position="dodge"
          )+
          geom_vline(xintercept=lbdp.params$mu,color="red")+
          geom_hline(
            yintercept=max(params$exact)-
              c(0,0.5*qchisq(p=0.95,df=1)),
            linetype=2
          )+
          scale_color_manual(values=pal)+
          labs(
            color="effort",
            y="log likelihood",
            x=expression(mu)
          )
      ),
      CDE=plot_grid(
        labels=c("C","D","E"),
        nrow=1,
        rel_widths=c(24,24,17),
        C=stats |>
          ggplot(aes(x=Np,y=rmse))+
          geom_smooth(formula=y~x,method="lm")+
          geom_point()+
          scale_x_log10(labels=\(x)aakmisc::scinot(x,simplify=TRUE))+
          scale_y_log10()+
          coord_fixed(ratio=1)+
          labs(x="effort",y="RMSE"),
        D=stats |>
          ggplot(aes(x=Np,y=prec))+
          geom_smooth(formula=y~x,method="lm")+
          geom_point()+
          scale_x_log10(labels=\(x)aakmisc::scinot(x,simplify=TRUE))+
          scale_y_log10()+
          coord_fixed(ratio=1)+
          labs(x="effort",y="SD"),
        E=stats |>
          ggplot(aes(x=Np,y=bias))+
          geom_smooth(formula=y~x,method="lm")+
          geom_point()+
          scale_x_log10(labels=\(x)aakmisc::scinot(x,simplify=TRUE))+
          scale_y_log10(labels=\(x)aakmisc::scinot(x,simplify=TRUE))+
          coord_fixed(ratio=1)+
          labs(x="effort",y=expression(group("|",bias,"|")))
      )
    )
    @
  \end{center}
  \caption{
    \label{fig:lbdp_example}
    Likelihood computation for the constant-parameter, linear birth-death-sampling model, according to \cref{thm:obsc_uncond} via Sequential Monte Carlo.
    Panel (A) shows the genealogy, simulated for $\lambda=\Sexpr{lbdp.params$lambda}$, $\mu=\Sexpr{lbdp.params$mu}$, $\psi=\Sexpr{lbdp.params$psi}$, $N_0=\Sexpr{lbdp.params$n0}$.
    Panel (B) shows a likelihood slice through the true parameters in the $\mu$ direction; the red vertical line marks the true value of $\mu$.
    As computational effort (\ie number of particles) increases, the Monte Carlo estimates converge on the exact values, for which an explicit formula exists in this case.
    The dashed horizontal lines show the approximate maximized likelihood and the 95\% critical value (under the likelihood-ratio test).
    Panels (C--E) are log-log plots that show how the root-mean-square error (RMSE), imprecision (SD), and bias decrease with effort.
    Note that the bias and the SD are roughly inversely proportional to the effort and its square-root, respectively, as expected.
  }
\end{figure}
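
The three error summaries in panels (C--E) of \cref{fig:lbdp_example} are linked by a standard decomposition, sketched here to show how the plotted quantities relate.
The symbols $d_j$, $J$, and $\bar{d}$ are introduced only for this illustration.
At a given level of effort, let $d_1,\dots,d_J$ be the differences between the Monte Carlo log-likelihood estimates and the corresponding exact values, pooled over replicates and over the values of $\mu$ on the slice, and let $\bar{d}=\frac{1}{J}\sum_{j=1}^{J}d_j$.
Then
\begin{equation*}
  \underbrace{\frac{1}{J}\sum_{j=1}^{J}d_j^2}_{\mathrm{RMSE}^2}
  =\underbrace{\bar{d}^{\,2}\vphantom{\sum_{j=1}^{J}}}_{\mathrm{bias}^2}
  +\underbrace{\frac{1}{J}\sum_{j=1}^{J}\left(d_j-\bar{d}\right)^2}_{\mathrm{SD}^2},
\end{equation*}
so that the SD shown in panel (D) is $\sqrt{\mathrm{RMSE}^2-\mathrm{bias}^2}$, with the variance computed using the divisor $J$.
If the bias and the SD scale as the reciprocal of the effort and of its square root, respectively, then the SD term dominates the RMSE at large effort.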

\subsection{Moran model and the Kingman coalescent}

In the Moran model, events occur according to a rate-$\mu$ Poisson process.
At each event, a compound birth-death jump (cf.~\cref{fig:event_types}F) occurs so that the population size, $n$, remains constant.
If we let $X_t$ be the number of events that have occurred by time $t$, then $X_t$ is a simple counting process, which we can use to define the state of the population process.
Its KFE is then
\begin{mathsize}{9pt}{10pt}
  \begin{equation*}
    \begin{gathered}
      \frac{\partial{v}}{\partial{t}}=
      \mu\,v(t,x-1)
      -\mu\,v(t,x),
      \qquad
      v(0,x) =
      \begin{cases}
        1, & x=0,\\
        0, & x>0.
      \end{cases}
    \end{gathered}
  \end{equation*}
\end{mathsize}%

Since there is only a single deme and no rate depends on the state, we can take $w$ in the corresponding filter equation to be independent of both $x$ and $y$.

In the classical case \citep{Kingman1982a}, $m$ samples are taken simultaneously at a single time, $T$.
Then, if $B$ is the set of branch-times and $\ell(t)$ is the number of lineages in the genealogy at time $t$, the filter equation reads
\begin{mathsize}{9pt}{10pt}
  \begin{equation}
    \label[pluralequation]{eq:moran-filter}
    \begin{gathered}
      w(0) = 1,
      \qquad
      \frac{\partial{w}}{\partial{t}}=
      \mu\,w(t)\,\left(1-\frac{\tbinom{\ell(t)}{2}}{\tbinom{n}{2}}\right)
      -\mu\,w(t),\quad t\notin{B},
      \qquad
      w(t) = \frac{\mu}{\tbinom{n}{2}}\,\wt(t),\quad t\in{B}.
    \end{gathered}
  \end{equation}
\end{mathsize}%
Integrating \cref{eq:moran-filter} and taking logarithms yields
\begin{equation}
  \label{eq:kingman}
  \log{w(T)}=k\,\log{\frac{\mu}{\tbinom{n}{2}}}-\frac{\mu}{\tbinom{n}{2}}\,\sum_{i=m-k}^{m}\!{\tbinom{i}{2}\,s^{}_i},
\end{equation}
where $k=|B|$ is the number of branch-points in $[0,T]$ and the $s^{}_{i}\coloneq\int{\Indicator{\ell(t)=i}\,\dd{t}}$ are the durations of the \emph{coalescent intervals}, \ie intervals between successive branch-points.
We recognize \cref{eq:kingman} as the expression for the \citet{Kingman1982a} coalescent \citep[\eg][]{Wakeley2009}.
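
The integration step can be made explicit; the following is a sketch that uses only \cref{eq:moran-filter}.
Away from branch-points, the differential equation in \cref{eq:moran-filter} reduces to $\partial{w}/\partial{t}=-\mu\,\tbinom{\ell(t)}{2}\,w(t)/\tbinom{n}{2}$, so that $w$ decays exponentially at a rate that is constant for as long as $\ell(t)=i$.
Each of the $k$ branch-points multiplies $w$ by $\mu/\tbinom{n}{2}$.
Since $w(0)=1$, we obtain
\begin{equation*}
  w(T)=\left(\frac{\mu}{\tbinom{n}{2}}\right)^{k}\,\prod_{i=m-k}^{m}\exp\left(-\frac{\mu}{\tbinom{n}{2}}\,\tbinom{i}{2}\,s^{}_i\right),
\end{equation*}
the logarithm of which is \cref{eq:kingman}.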

More generally, suppose that, in addition, samples are taken according to a rate-$\nu$ Poisson process, and let $S=S_0\cup{S_1}$ be the set of sample-times in the genealogy, where $S_0$ and $S_1$ are the sets of times of terminal and inline samples, respectively.
Then the filter equation reads
\begin{mathsize}{9pt}{10pt}
  \begin{equation}
    \label[pluralequation]{eq:moran-filter2}
    \begin{gathered}
      w(0) = 1,
      \qquad
      \frac{\partial{w}}{\partial{t}}=
      -\mu\,\frac{\tbinom{\ell(t)}{2}}{\tbinom{n}{2}}\,w(t),
      \quad t\notin{S\cup{B}},
      \qquad
      w(t) = \frac{\mu}{\tbinom{n}{2}}\,\wt(t),\quad t\in{B},\\[8pt]
      w(t) = \nu\,\left(1-\frac{\ell(t)}{n}\right)\,\wt(t),\quad t\in{S_0},
      \qquad
      w(t) = \frac{\nu}{n}\,\wt(t),\quad t\in{S_1}.
    \end{gathered}
  \end{equation}
\end{mathsize}%
Integrating \cref{eq:moran-filter2} yields
\begin{equation}
  \label{eq:mgp}
  \log{w(T)}-|S|\,\log{\nu}
  =\sum_{t\in{S_0}}{\log{\left(1-\frac{\ell(t)}{n}\right)}}
  -|S_1|\,\log{n}
  +|B|\,\log{\frac{\mu}{\tbinom{n}{2}}}
  -\frac{\mu}{\tbinom{n}{2}}\,\sum_{i=1}^{\infty}\!{\tbinom{i}{2}\,s^{}_i}.
\end{equation}
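The same bookkeeping applies to \cref{eq:moran-filter2}, as the following sketch shows; it assumes that the event times are distinct, so that $S_0$ and $S_1$ are disjoint and $|S|=|S_0|+|S_1|$.
Each terminal sample at time $t$ contributes a factor $\nu\,\left(1-\ell(t)/n\right)$, each inline sample a factor $\nu/n$, and each branch-point a factor $\mu/\tbinom{n}{2}$, while between events $w$ decays exponentially as before.
Hence
\begin{equation*}
  w(T)=\nu^{|S|}\,n^{-|S_1|}\,\prod_{t\in{S_0}}\left(1-\frac{\ell(t)}{n}\right)\,
  \left(\frac{\mu}{\tbinom{n}{2}}\right)^{|B|}\,
  \exp\left(-\frac{\mu}{\tbinom{n}{2}}\,\sum_{i=1}^{\infty}\tbinom{i}{2}\,s^{}_i\right),
\end{equation*}
and taking logarithms gives \cref{eq:mgp}.
As a consistency check, when $S=\emptyset$ and $|B|=k$, the sample factors disappear and, since $s^{}_i=0$ unless $m-k\le{i}\le{m}$ in the classical case, \cref{eq:mgp} reduces to \cref{eq:kingman}.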

\end{document}

<<sessioninfo,include=FALSE,purl=TRUE>>=
sessionInfo()
@