This package provides an efficient maximum likelihood estimator to extract diffusion coefficients from single-molecule tracking experiments subject to static and dynamic noise. It can either be used to find a single diffusion coefficient that best describes the underlying dynamics of a trajectory sample, or to analyze each trajectory separately.
In case of heterogeneous data, i.e., data originating from subpopulations with differing diffusion coefficients, we provide an expectation-maximization algorithm that assigns the trajectories to subpopulations in a probabilistic manner. The optimal number of subpopulations is determined with the help of a quality factor of known analytical distribution.
For more details on the theoretical framework, please refer to the associated open access publication:
J. T. Bullerjahn and G. Hummer, "Maximum likelihood estimates of diffusion coefficients from single-particle tracking experiments", Journal of Chemical Physics 154, 234105 (2021). https://doi.org/10.1063/5.0038174
Please cite this reference if you use DiffusionMLE
to analyze your data.
Currently, the package is not in a registry. It must therefore be added by specifying a URL to the repository:
using Pkg; Pkg.add(url="https://github.com/bio-phys/DiffusionMLE")
Users of older software versions may need to wrap the contents of the brackets with PackageSpec()
.
Each trajectory can be seen as a d-dimensional array (Array{Float64,2}
), so the data set should be of the type Array{Array{Float64,2},1}
.
Your data can, e.g., be read in as follows:
using DelimitedFiles
file_names = readdir("./trajectories/")
data = [ readdlm(file_names[i]) for i = 1 : length(file_names) ]
Next to the trajectories, we also need an array of blurring coefficients that should be equal in length to data
. If the illumination profile of the shutter is uniform, we can simply use
B = [1/6 for m = 1 : length(data)]
If the code is used to analyze simulation data with perfect time resolution, the blurring coefficients should be set to zero.
Assuming that the data are homogeneous, we can estimate the parameters a2
and σ2
using the MLE_estimator
function:
using DiffusionMLE
[a2, σ2] = MLE_estimator(B,data)
MLE_estimator
has an optional argument, interval
, which is set to [0.0,10000.0]
by default, but can be varied in the (unlikely) event that the optimizer fails to converge. Specifying the interval
can also speed up the evaluation, as seen by comparing the output of the following two lines:
@time MLE_estimator(B,data)
@time MLE_estimator(B,data,[0.0,10.0])
The parameters a2
and σ2
both have dimension length squared, so if the trajectory coordinates are given in nanometers then said parameters have dimension nanometer squared. The associated diffusion coefficient is, irrespective of the dimension d, given by
D = 0.5*σ2/Δt
The MLE_errors
function provides an estimate of the parameter uncertainties:
[δa2, δσ2] = MLE_errors(B,data,[a2, σ2])
where the uncertainty δD
, in analogous fashion to D
, follows from δσ2
by division.
A more detailed example can be found in the examples directory.
The code relevant for the analysis of heterogeneous data exploits threading, so it is recommended to run the command export JULIA_NUM_THREADS=n
, with n
being the number of available (physical) cores, before launching Julia. This speeds up the numerics significantly.
To check the number of available cores for threading, simply run
using Base.Threads; nthreads()
This should print the number n
if the above-mentioned command was properly executed.
If the data are heterogeneous, it can be analyzed with the function global_EM_estimator
as follows:
estimates, L, T = global_EM_estimator(K=2,N_local=500,N_global=50,a2_range=[0.02,20.],σ2_range=[0.02,20.],B,data)
Here, we consider K=2
subpopulations, and initiate the parameter search N_global=50
times with randomly chosen parameters from a2_range
and σ2_range
. If a search does not converge within N_local=500
steps, it is broken off. The optional argument tolerance
is set to 1.0e-10
by default, and determines the convergence rate of local parameter searches. Analogous to MLE_estimator
, the function global_EM_estimator
offers the optional argument interval
that can be set manually to fix possible convergence issues.
The output includes the likelihood score L
, an estimates
array, where the i-th column contains the parameter estimates [a2, σ2, P]
for the i-th subpopulation, and the classification coefficients T
. The latter can be used to assign the trajectories to subpopulations:
B_sub, data_sub = sort_trajectories(K=2,T,B,data)
A more detailed example can be found in the examples directory.