diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json
index ca7cddf..fef3eb7 100644
--- a/dev/.documenter-siteinfo.json
+++ b/dev/.documenter-siteinfo.json
@@ -1 +1 @@
-{"documenter":{"julia_version":"1.10.5","generation_timestamp":"2024-09-23T12:09:26","documenter_version":"1.7.0"}}
\ No newline at end of file
+{"documenter":{"julia_version":"1.10.5","generation_timestamp":"2024-09-23T13:22:06","documenter_version":"1.7.0"}}
\ No newline at end of file
diff --git a/dev/index.html b/dev/index.html
index ac17b4a..6fa0ad1 100644
--- a/dev/index.html
+++ b/dev/index.html
@@ -1,12 +1,12 @@
ISOKANN.jl · ISOKANN

ISOKANN.jl

Documentation for ISOKANN.jl

Main entry points

ISOKANN.SimulationData (Type)
struct SimulationData{S,D,C,F}

A struct combining a simulation with the simulated coordinates and the corresponding ISOKANN training data.

Fields

  • sim::S: The simulation object.
  • data::D: The ISOKANN training data.
  • coords::C: The original coordinates of the simulations.
  • featurizer::F: A function mapping coordinates to ISOKANN features.
source
ISOKANN.OpenMM.OpenMMSimulation (Type)
OpenMMSimulation(; pdb, steps, ...)
OpenMMSimulation(; py, steps)

Constructs an OpenMM simulation object. Either use OpenMMSimulation(; py, steps), where py is the location of a .py Python script creating an OpenMM simulation object, or supply a .pdb file via pdb and the following parameters (see also defaultsystem):

Arguments

  • pdb::String: Path to the PDB file.
  • ligand::String: Path to ligand file.
  • forcefields::Vector{String}: List of force field XML files.
  • temp::Float64: Temperature in Kelvin.
  • friction::Float64: Friction coefficient in 1/picosecond.
  • step::Float64: Integration step size in picoseconds.
  • steps::Int: Number of simulation steps.
  • features: Which features to use for learning the chi function:
      - A vector of Int denotes the indices of all atoms to compute the pairwise distances from.
      - A vector of CartesianIndex{2} computes the specific distances between the atom pairs.
      - A number denotes the radius below which all pairs of atoms will be used (computed only on the starting configuration).
      - If nothing, all pairwise distances are used.
  • minimize::Bool: Whether to perform energy minimization on the first state.
  • nthreads: The number of threads to use for parallelization of multiple simulations.
  • mmthreads: The number of threads to use for each OpenMM simulation. Set to "gpu" to use the GPU platform.

Returns

  • OpenMMSimulation: An OpenMMSimulation object.
source
ISOKANN.propagate (Function)
propagate(s::OpenMMSimulation, x0::AbstractMatrix{T}, ny; nthreads=Threads.nthreads(), mmthreads=1) where {T}

Propagates ny replicas of the OpenMMSimulation s from the initial states x0.

Arguments

  • s: An instance of the OpenMMSimulation type.
  • x0: Matrix containing the initial states as columns
  • ny: The number of replicas to create.

Optional Arguments

  • nthreads: The number of threads to use for parallelization of multiple simulations.
  • mmthreads: The number of threads to use for each OpenMM simulation. Set to "gpu" to use the GPU platform.

Note: On CPU we observed better performance with nthreads = number of CPUs and mmthreads = 1 than the other way around. On GPU, nthreads > 1 should be supported, but on our machine it led to slower performance than nthreads = 1.

source
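The replica layout described above can be illustrated without OpenMM. The following is a minimal sketch, assuming a hypothetical toy_step function (a Gaussian random walk) in place of an OpenMM integrator. It shows one plausible way to parallelize over all n × ny (initial state, replica) pairs with Threads.@threads and to store the result as a (d, ny, n) array, matching the ys layout of DataTuple; it is not ISOKANN's implementation.

```julia
# Hypothetical stand-in for simulating one lag time: a Gaussian random walk
toy_step(x) = x .+ 0.1 .* randn(length(x))

# Propagate ny replicas from each column of x0; the result has size (d, ny, n)
function toy_propagate(x0::AbstractMatrix, ny::Int)
    d, n = size(x0)
    ys = similar(x0, d, ny, n)
    # parallelize over all (initial state, replica) pairs
    Threads.@threads for idx in 1:(n * ny)
        i, k = divrem(idx - 1, ny) .+ 1   # i: initial state, k: replica
        ys[:, k, i] = toy_step(x0[:, i])
    end
    return ys
end
```

Each task writes to its own slice ys[:, k, i], so no locking is needed.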
ISOKANN.Iso (Type)
Iso(data; opt=NesterovRegularized(), model=defaultmodel(data), gpu=false, kwargs...)
source
Iso(sim::IsoSimulation; nx=100, nk=10, nd=1, kwargs...)

Convenience constructor which generates the SimulationData from the simulation sim and constructs the Iso object. See also Iso(data; kwargs...)

Arguments

  • sim::IsoSimulation: The IsoSimulation object.
  • nx::Int: The number of starting points.
  • nk::Int: The number of Koopman samples.
  • nout::Int: Dimension of the χ function.
source
ISOKANN.run! (Function)
run!(iso::Iso, n=1, epochs=1)

Run the training process for the Iso model.

Arguments

  • iso::Iso: The Iso model to train.
  • n::Int: The number of (outer) Koopman iterations.
  • epochs::Int: The number of (inner) epochs to train the model for each Koopman evaluation.
source
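To illustrate the split between outer Koopman iterations (n) and inner training epochs (epochs), here is a toy, one-dimensional sketch in which a plain vector chi stands in for the neural network and a small transition matrix K stands in for the simulated Koopman samples. The target update chi ← S(K chi), with S an affine shift-scale onto [0, 1], is the update commonly used for one-dimensional ISOKANN; the names toy_isokann and shiftscale and the gradient-step "training" are illustrative assumptions, not the package's API.

```julia
# Affinely rescale a vector onto [0, 1]
shiftscale(v) = (v .- minimum(v)) ./ (maximum(v) - minimum(v))

# Toy ISOKANN loop: K is a transition matrix, chi holds one chi value per state
function toy_isokann(K::AbstractMatrix, chi::AbstractVector; n=10, epochs=5, lr=0.5)
    chi = float.(chi)                       # work on a copy
    for _ in 1:n                            # outer Koopman iterations
        target = shiftscale(K * chi)        # target stays fixed during the inner loop
        for _ in 1:epochs                   # inner "training" epochs
            chi .-= lr .* (chi .- target)   # gradient step on 0.5 * |chi - target|^2
        end
    end
    return chi
end
```

For a block-structured K with two metastable groups of states, chi drifts towards 0 on one group and 1 on the other.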

Public API

ISOKANN.Iso (Method)
Iso(data; opt=NesterovRegularized(), model=defaultmodel(data), gpu=false, kwargs...)
source
ISOKANN.Iso (Method)
Iso(sim::IsoSimulation; nx=100, nk=10, nd=1, kwargs...)

Convenience constructor which generates the SimulationData from the simulation sim and constructs the Iso object. See also Iso(data; kwargs...)

Arguments

  • sim::IsoSimulation: The IsoSimulation object.
  • nx::Int: The number of starting points.
  • nk::Int: The number of Koopman samples.
  • nout::Int: Dimension of the χ function.
source
ISOKANN.SimulationData (Method)
SimulationData(sim::IsoSimulation, nx::Int, nk::Int; ...)
SimulationData(sim::IsoSimulation, xs::AbstractMatrix, nk::Int; ...)
SimulationData(sim::IsoSimulation, (xs,ys); ...)

Generates SimulationData from a simulation with either

  • nx initial points and nk Koopman samples
  • xs as initial points and nk Koopman samples
  • xs as initial points and ys as Koopman samples
source
ISOKANN.AdamRegularized (Function)

Adam with L2 regularization. Note that this is different from AdamW (Adam + weight decay) (cf. Decay vs L2 Reg.).

source
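The difference between L2 regularization and AdamW's decoupled weight decay is visible in a single update step. This sketch uses the textbook Adam formulas; adam_l2_step and adamw_step are hypothetical names, the decay is applied as θ ← θ - ηλθ, and the default hyperparameters are illustrative rather than those of AdamRegularized.

```julia
# One Adam step with L2 regularization: λθ is added to the gradient,
# so it passes through Adam's moment estimates and adaptive rescaling.
function adam_l2_step(θ, g, m, v, t; η=1e-3, β1=0.9, β2=0.999, ϵ=1e-8, λ=1e-2)
    g = g .+ λ .* θ
    m = β1 .* m .+ (1 - β1) .* g
    v = β2 .* v .+ (1 - β2) .* g .^ 2
    mhat = m ./ (1 - β1^t)                  # bias-corrected first moment
    vhat = v ./ (1 - β2^t)                  # bias-corrected second moment
    return θ .- η .* mhat ./ (sqrt.(vhat) .+ ϵ), m, v
end

# One AdamW step: the decay acts on θ directly, outside the adaptive update.
function adamw_step(θ, g, m, v, t; η=1e-3, β1=0.9, β2=0.999, ϵ=1e-8, λ=1e-2)
    m = β1 .* m .+ (1 - β1) .* g
    v = β2 .* v .+ (1 - β2) .* g .^ 2
    mhat = m ./ (1 - β1^t)
    vhat = v ./ (1 - β2^t)
    return θ .- η .* mhat ./ (sqrt.(vhat) .+ ϵ) .- η .* λ .* θ, m, v
end
```

For example, with θ = [1.0], a zero loss gradient, η = 0.01 and λ = 0.1, one L2 step gives θ ≈ 0.99 (Adam normalizes λθ to a step of roughly η), while one AdamW step gives θ ≈ 0.999 (a shrink of exactly ηλθ).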
ISOKANN.data_from_trajectory (Method)
data_from_trajectory(xs::AbstractMatrix; reverse=false) :: DataTuple

Generate the lag-1 data from the trajectory xs. If reverse is true, also take the time-reversed lag-1 data.

source
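A minimal sketch of how lag-1 data can be laid out from a trajectory whose columns are consecutive frames, using the (d, n) / (d, k, n) convention of DataTuple with k = 1. The function name lag1_data is hypothetical, and this is not necessarily how data_from_trajectory is implemented.

```julia
# Lag-1 pairs from a trajectory whose columns are consecutive frames.
# Returns xs of size (d, n) and ys of size (d, 1, n).
function lag1_data(traj::AbstractMatrix; reverse=false)
    xs = traj[:, 1:end-1]                  # every frame except the last
    ys = traj[:, 2:end]                    # its successor one lag time later
    if reverse                             # also use the backward transitions
        xs, ys = hcat(xs, traj[:, 2:end]), hcat(ys, traj[:, 1:end-1])
    end
    d, n = size(xs)
    return xs, reshape(ys, d, 1, n)        # one Koopman sample per starting point
end
```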
ISOKANN.laggedtrajectory (Method)
laggedtrajectory(data::SimulationData, n) = laggedtrajectory(data.sim, n, x0=data.coords[1][:, end])

Simulate a trajectory comprising n simulations, starting from the last point in data.

source
ISOKANN.load_trajectory (Method)
load_trajectory(filename; top=nothing, kwargs...)

Wrapper around Python's mdtraj.load(). Returns an array of shape (3 * natom, nframes).

source
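mdtraj stores coordinates as an (nframes, natom, 3) array (Trajectory.xyz). Assuming that array arrives in Julia with the same shape, one way to obtain the (3 * natom, nframes) layout described above is a permutation followed by a reshape. to_columns is a hypothetical helper, shown only to make the column layout (x1, y1, z1, x2, ...) explicit.

```julia
# Convert an (nframes, natom, 3) coordinate array into a (3 * natom, nframes) matrix
function to_columns(xyz::AbstractArray{<:Any,3})
    nframes, natom, _ = size(xyz)
    # (nframes, natom, 3) -> (3, natom, nframes), then one flattened column per frame
    return reshape(permutedims(xyz, (3, 2, 1)), 3 * natom, nframes)
end
```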
ISOKANN.localpdistinds (Method)
localpdistinds(coords::AbstractMatrix, radius)

Given coords of shape (3n x frames), return the pairs of atom indices whose distance is lower than radius in at least one frame.

source
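The selection rule above can be sketched directly: a pair qualifies if its distance drops below radius in any frame. close_pairs is a hypothetical name; this brute-force version costs O(natom² × frames) and is meant only to make the criterion concrete.

```julia
# Pairs of atoms (i < j) that come closer than `radius` in at least one frame.
# Each column of `coords` holds x1, y1, z1, x2, y2, z2, ... for one frame.
function close_pairs(coords::AbstractMatrix, radius)
    natom = size(coords, 1) ÷ 3
    pairs = Tuple{Int,Int}[]
    for i in 1:natom-1, j in i+1:natom
        # minimal distance between atoms i and j over all frames
        mind = minimum(eachcol(coords)) do frame
            a = frame[3i-2:3i]
            b = frame[3j-2:3j]
            sqrt(sum(abs2, a .- b))
        end
        mind < radius && push!(pairs, (i, j))
    end
    return pairs
end
```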
ISOKANN.mergedata (Method)
mergedata(d1::SimulationData, d2::SimulationData)

Merge the data and features of d1 and d2, keeping the simulation and features of d1. Note that there is no check if simulation features agree.

source
ISOKANN.pairnet (Method)

Fully connected neural network with the given number of layers (layers), mapping n to nout dimensions. features allows passing a featurizer as a preprocessor, activation determines the activation function of every layer except the last, and lastactivation can be used to modify the activation function of the last layer.

source
ISOKANN.pdists (Method)
pdists(coords::AbstractArray, inds::Vector{<:Tuple})

Compute the pairwise distances between the particles specified by the tuples inds over all frames in coords. Assumes that each column contains all 3n coordinates of one frame.

source
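A minimal sketch of the per-frame distance computation, assuming the same (3n × frames) column layout with atom i occupying rows 3i-2:3i. pair_distances is a hypothetical name and returns one row per pair and one column per frame; the actual output layout of pdists may differ.

```julia
# Distances between the atom pairs in `inds` for every frame (column) of `coords`
function pair_distances(coords::AbstractMatrix, inds::Vector{<:Tuple})
    D = zeros(float(eltype(coords)), length(inds), size(coords, 2))
    for (f, frame) in enumerate(eachcol(coords))
        for (p, (i, j)) in enumerate(inds)
            # Euclidean distance between atom i and atom j in this frame
            D[p, f] = sqrt(sum(abs2, frame[3i-2:3i] .- frame[3j-2:3j]))
        end
    end
    return D
end
```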
ISOKANN.reactionpath_minimum (Function)
reactionpath_minimum(iso::Iso, x0; steps=100)

Compute the reaction path by integrating ∇χ with orthogonal energy minimization.

Arguments

  • iso::Iso: The Iso object for which the reaction path is to be computed.
  • x0: The starting point for the reaction path computation.
  • steps=100: The number of steps to take along the reaction path.
source
ISOKANN.reactionpath_ode (Method)
reactionpath_ode(iso, x0; steps=101, extrapolate=0, orth=0.01, solver=OrdinaryDiffEq.Tsit5(), dt=1e-3, kwargs...)

Compute the reaction path by integrating ∇χ as well as orth * F orthogonal to ∇χ, where F is the original force field.

Arguments

  • iso::Iso: The Iso object for which the reaction path is to be computed.
  • x0: The starting point for the reaction path computation.
  • steps=101: The number of steps to take along the reaction path.
  • minimize=false: Whether to perform an energy minimization orthogonal to ∇χ before integration.
  • extrapolate=0: How fast to extrapolate beyond χ = 0 and χ = 1.
  • orth=0.01: The weight of the orthogonal force field.
  • solver=OrdinaryDiffEq.Tsit5(): The solver to use for the ODE integration.
  • dt=1e-3: The initial time step for the ODE integration.
source
ISOKANN.reactive_path (Method)

reactive_path(xi::AbstractVector, coords::AbstractMatrix; sigma, maxjump=1, method=QuantilePath(0.05), normalize=false, sortincreasing=true)

Find the maximum likelihood path (under the model of Brownian motion with noise sigma) through coords with times xi. Supports both CPU and GPU arrays.

Arguments

  • coords: (ndim x npoints) matrix of coordinates.
  • xi: time coordinate of the npoints points
  • sigma: spatial noise strength of the model.
  • maxjump: upper bound to the jump in time xi along the path.
  • method: either FromToPath, QuantilePath, FullPath or MaxPath, specifying the end points of the path
  • normalize: whether to normalize all coords first
  • sortincreasing: return the path from lower to higher xi values
source
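The idea of a maximum likelihood path can be sketched as dynamic programming over the points sorted by xi. This simplified version assumes that a step from point i to point j with 0 < xi[j] - xi[i] <= maxjump has the Gaussian log-density of a Brownian increment, -|Δx|²/(2σ²Δt) - (d/2)·log(2πσ²Δt), and that the path runs from the lowest to the highest xi. ml_path is a hypothetical name; it does not implement the method options, normalization, or GPU support of reactive_path.

```julia
# Simplified maximum-likelihood path through points ordered by a time-like coordinate xi
function ml_path(xi::AbstractVector, coords::AbstractMatrix; sigma=1.0, maxjump=1.0)
    d, n = size(coords)
    order = sortperm(xi)                    # process points by increasing xi
    logL = fill(-Inf, n)                    # best log-likelihood of a path ending at order[k]
    prev = zeros(Int, n)                    # backpointers (positions in `order`)
    logL[1] = 0.0                           # the path starts at the lowest-xi point
    for k in 2:n, j in 1:k-1
        dt = xi[order[k]] - xi[order[j]]
        (0 < dt <= maxjump && isfinite(logL[j])) || continue
        dx2 = sum(abs2, coords[:, order[k]] .- coords[:, order[j]])
        # Gaussian log-density of a Brownian increment over time dt
        step = -dx2 / (2 * sigma^2 * dt) - d / 2 * log(2π * sigma^2 * dt)
        if logL[j] + step > logL[k]
            logL[k] = logL[j] + step
            prev[k] = j
        end
    end
    isfinite(logL[n]) || error("no admissible path; try a larger maxjump")
    path = Int[]
    k = n                                   # backtrack from the highest-xi point
    while k != 0
        pushfirst!(path, order[k])
        k = prev[k]
    end
    return path
end
```

With xi = [0.0, 0.5, 0.5, 1.0] and one-dimensional coords [0.0 0.5 5.0 1.0], maxjump = 0.5 forces an intermediate stop and the path avoids the outlier at 5.0, giving [1, 2, 4]; with maxjump = 1.0 the direct jump [1, 4] is more likely.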
ISOKANN.run! (Function)
run!(iso::Iso, n=1, epochs=1)

Run the training process for the Iso model.

Arguments

  • iso::Iso: The Iso model to train.
  • n::Int: The number of (outer) Koopman iterations.
  • epochs::Int: The number of (inner) epochs to train the model for each Koopman evaluation.
source
ISOKANN.runadaptive! (Method)
runadaptive!(iso; generations=1, nx=10, iter=100, cutoff=Inf)

Train iso with adaptive sampling: sample nx new data points, followed by iter ISOKANN iterations, and repeat this generations times. cutoff specifies the maximal data size, after which new data overwrites the oldest data.

source
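The loop and the cutoff behaviour described above can be sketched as follows. sample and train! are hypothetical callbacks standing in for the sampling and ISOKANN training steps, and state is any mutable object with a data matrix field; only the keep-the-newest truncation is meant to mirror the documented cutoff semantics.

```julia
# Append new samples (columns) and keep at most `cutoff` of them, dropping the oldest first
function append_capped(data::AbstractMatrix, new::AbstractMatrix, cutoff)
    merged = hcat(data, new)
    n = size(merged, 2)
    return n <= cutoff ? merged : merged[:, n-cutoff+1:end]
end

# `generations` rounds of sampling nx new points and training for `iter` iterations
function adaptive_loop!(state, sample, train!; generations=1, nx=10, iter=100, cutoff=Inf)
    for _ in 1:generations
        state.data = append_capped(state.data, sample(state, nx), cutoff)
        train!(state, iter)
    end
    return state
end
```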
ISOKANN.save_reactive_path (Function)
save_reactive_path(iso::Iso,
    coords::AbstractMatrix=getcoords(iso.data) |> cpu;
    sigma=1,
    maxjump=1,
    out="out/reactive_path.pdb",
    source=pdbfile(iso.data),
    kwargs...)

Extract and save the reactive path of a given iso.

Computes the maximum likelihood path with parameter sigma along the given data points, aligns it and saves it to the out path.

See also reactive_path.

Arguments

  • iso::Iso: The Iso for which the reactive path is computed.
  • out="out/reactive_path.pdb": The output file path for saving the reactive path.
  • source: The source .pdb file providing the topology
  • kwargs...: additional parameters passed to reactive_path.

Returns

  • ids: The IDs of the reactive path.
source
ISOKANN.save_trajectory (Method)
save_trajectory(filename, coords::AbstractMatrix; top::String)

Save the trajectory given in coords to filename, with the topology provided by the file top, using mdtraj.

source
ISOKANN.trajectorydata_bursts (Method)
trajectorydata_bursts(sim::IsoSimulation, steps, nk; kwargs...)

Simulate a single long trajectory of steps times the lagtime and start nk burst trajectories at each step for the Koopman samples.

Schematic: a chain x0 - x - x - ... forms the long trajectory, and Koopman samples y branch off from its points.

source
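A toy sketch of the burst layout, assuming a hypothetical toy_lag function (a random walk over one lag time) instead of a real simulation. It includes x0 as the first chain point, which is an assumption; the chain points form the xs and the nk bursts from each of them form the ys, in the (d, n) / (d, k, n) layout of DataTuple.

```julia
# Hypothetical stand-in for simulating one lag time
toy_lag(x) = x .+ 0.1 .* randn(length(x))

# Chain of `steps` points starting at x0, with nk independent bursts from each chain point
function bursts_data(x0::AbstractVector, steps::Int, nk::Int)
    d = length(x0)
    xs = zeros(d, steps)
    ys = zeros(d, nk, steps)
    x = float.(x0)
    for s in 1:steps
        xs[:, s] = x                       # current point of the long trajectory
        for k in 1:nk
            ys[:, k, s] = toy_lag(x)       # Koopman samples branching off from x
        end
        x = toy_lag(x)                     # advance the chain by one lag time
    end
    return xs, ys
end
```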
ISOKANN.trajectorydata_linear (Method)
trajectorydata_linear(sim::IsoSimulation, steps; reverse=false, kwargs...)

Simulate a single long trajectory of steps times the lagtime and use this "chain" to generate the corresponding ISOKANN data. If reverse is true, also add the time-reversed transitions.

x (<)--> x (<)--> x

source

Internal API

ISOKANN.DataTuple (Type)

DataTuple = Tuple{Matrix{T},Array{T,3}} where {T<:Number}

We represent data as a tuple of xs and ys.

xs is a matrix of size (d, n), where d is the dimension of the system and n the number of samples. ys is a tensor of size (d, k, n), where k is the number of Koopman samples.

source
ISOKANN.IsoSimulation (Type)
abstract type IsoSimulation

Abstract type representing an IsoSimulation. Should implement the methods getcoords, propagate, dim

source
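To make the required interface concrete, here is a hypothetical stand-alone type that provides the three methods named above. It defines its own abstract type ToyIsoSimulation and free functions instead of extending ISOKANN's, and the (dim, ny, ncols) output shape of propagate is assumed from the DataTuple convention.

```julia
# Stand-alone mirror of the interface; not ISOKANN's IsoSimulation
abstract type ToyIsoSimulation end

struct RandomWalkSim <: ToyIsoSimulation
    x0::Vector{Float64}   # starting configuration
    sigma::Float64        # noise amplitude per lag time
end

getcoords(s::RandomWalkSim) = copy(s.x0)
dim(s::RandomWalkSim) = length(s.x0)

# ny replicas from each column of x0, returned as a (dim, ny, ncols) array
function propagate(s::RandomWalkSim, x0::AbstractMatrix, ny::Int)
    d, n = size(x0)
    ys = zeros(d, ny, n)
    for i in 1:n, k in 1:ny
        ys[:, k, i] = x0[:, i] .+ s.sigma .* randn(d)
    end
    return ys
end
```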
ISOKANN.Stabilize2 (Type)

TransformStabilize(transform, last=nothing)

Wraps another transform and permutes its target to match the previous target.

Currently we also have the stabilization (with respect to the model, though) inside each Transform. TODO: Decide which to keep.

source
ISOKANN.TransformISA (Type)

TransformISA(permute)

Compute the target via the inner simplex algorithm (without feasiblization routine). permute specifies whether to apply the stabilizing permutation

source
ISOKANN.TransformPseudoInv (Type)
TransformPseudoInv(normalize, direct, eigenvecs, permute)

Compute the target by approximately inverting the action of K with the Moore-Penrose pseudoinverse.

If direct == true, solve chi * pinv(K(chi)), otherwise inv(K(chi) * pinv(chi)). eigenvecs specifies whether to use the eigenvectors of the Schur matrix. normalize specifies whether to renormalize the resulting target vectors. permute specifies whether to permute the target for stability.

source
ISOKANN.adddata (Method)
adddata(data::D, model, sim, ny, lastn=1_000_000)::D

Generate new data for ISOKANN by adaptive subsampling using the chi-stratified/-uniform method.

  1. Adaptively subsample ny points from data uniformly along their model values.
  2. propagate according to the simulation model.
  3. return the newly obtained data concatenated to the input data

The subsamples are taken only from the lastn last datapoints in data.

Examples

julia> (xs, ys) = adddata((xs,ys), chi, mollysim)
source
ISOKANN.adddata (Method)
adddata(d::SimulationData, model, n)

χ-stratified subsampling. Select n samples amongst the provided ys/koopman points of d such that their χ-value according to model is approximately uniformly distributed and propagate them. Returns a new SimulationData which has the new data appended.

source
ISOKANN.addextrapolates! (Method)
addextrapolates!(iso, n, stepsize=0.01, steps=10)

Sample new data starting points obtained by extrapolating the chi function beyond the current extrema and attach them to the data of the iso object.

Samples n points at the lower and upper end each, resulting in 2n new points. stepsize is the magnitude of the chi-value change per step and steps is the number of steps to take. E.g. 10 steps of stepsize 0.01 result in a change in chi of about 0.1.

The obtained data is filtered such that unstable simulations should be removed, which may result in fewer than 2n points being added.

source
ISOKANN.bootstrap (Method)
bootstrap(sim, nx, ny) :: DataTuple

Compute initial data by propagating the molecule's initial state to obtain the xs and propagating these further to obtain the ys.

source
ISOKANN.energyminimization_chilevel (Method)
energyminimization_chilevel(iso, x0; f_tol=1e-3, alphaguess=1e-5, iterations=20, show_trace=false, skipwater=false, algorithm=Optim.GradientDescent, xtol=nothing)

Local energy minimization on the current level set of the chi function.

source
ISOKANN.exportdata (Function)
exportdata(data::AbstractArray, model, sys, path="out/data.pdb")

Export data to a PDB file.

This function takes an AbstractArray data, sorts it according to the model evaluation, removes duplicates, transforms it to standard form and saves it as a PDB file to path.

source
ISOKANN.extrapolate (Function)
extrapolate(iso, n, stepsize=0.1, steps=1, minimize=true)

Take the n most extreme points of the chi-function of the iso object and extrapolate them by stepsize for steps steps beyond their extrema, resulting in 2n new points. If minimize is true, the new points are energy minimized.

source
ISOKANN.fixperm (Method)
fixperm(new, old)

Permutes the rows of new so as to minimize the L1 distance to old.

Arguments

  • new: The data to match to the reference data.
  • old: The reference data.
source
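For a small number of rows, the minimization can be done by brute force over all row permutations. match_rows and allperms are hypothetical helpers; the factorial cost makes this suitable only for a handful of χ components, and fixperm itself may use a different strategy.

```julia
# All permutations of 1:n, built by inserting n into every permutation of 1:n-1
function allperms(n::Int)
    n == 1 && return [[1]]
    out = Vector{Vector{Int}}()
    for p in allperms(n - 1), pos in 1:n
        q = copy(p)
        insert!(q, pos, n)
        push!(out, q)
    end
    return out
end

# Reorder the rows of `new` to minimize the L1 distance to `old`
function match_rows(new::AbstractMatrix, old::AbstractMatrix)
    best = argmin(p -> sum(abs, new[p, :] .- old), allperms(size(new, 1)))
    return new[best, :]
end
```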
ISOKANN.flatpairdists (Function)
flatpairdists(x)

Assumes each column of x to be a flattened representation of multiple 3d coordinates. Returns the flattened pairwise distances as columns.

source
ISOKANN.growmodel (Method)

Given a model, return a copy with its last layer replaced by one with output dimension n.

source
ISOKANN.load (Method)
load(path::String, iso::Iso)

Load the Iso object from a JLD2 file. Note that it will be loaded to the CPU, even if it was saved on the GPU. An OpenMMSimulation will be reconstructed anew from the saved pdb file.

source
ISOKANN.pairdistfeatures (Method)
pairdistfeatures(inds::AbstractVector)

Returns a featurizer function which computes the pairwise distances between the particles specified by inds.

source
ISOKANN.pickclosest_sort (Method)

pickclosest(haystack, needles)

Return the indices into haystack which lie closest to needles, without duplicates, by removing haystack candidates after a match. Note that this is not invariant under perturbations of needles.

Scales with n log(n) m, where n = length(haystack) and m = length(needles).

source
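The matching rule (take the nearest remaining candidate, then remove it) can be sketched greedily as below. pick_closest is a hypothetical name, and this plain version costs O(n m) rather than using the sorted approach whose cost is stated above.

```julia
# Greedy matching: for each needle, take the nearest haystack entry not used yet
function pick_closest(haystack::AbstractVector, needles::AbstractVector)
    available = collect(eachindex(haystack))
    picked = Int[]
    for nd in needles
        k = argmin(i -> abs(haystack[available[i]] - nd), eachindex(available))
        push!(picked, available[k])
        deleteat!(available, k)            # remove the matched candidate
    end
    return picked
end
```

For example, pick_closest([0.0, 0.1, 0.2], [0.09, 0.11]) returns [2, 3]: the second needle cannot reuse index 2 and falls back to the nearer of the remaining entries.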
ISOKANN.plotatoms! (Function)

Scatter plot of the first "O" atoms of all starting points xs, as well as the "O" atoms of the Koopman samples belonging to the first point in ys.

source
ISOKANN.reactionforce (Function)
reactionforce(iso, sim, x, direction, orth=1)

Compute the vector f whose component collinear to dchi/dx satisfies dchi/dx * f = 1, and whose component orthogonal to dchi/dx is orth times the force field.

source
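Written out, the two defining conditions give f = g/|g|² + orth · (F - (F·g/|g|²) g), with g = dchi/dx and F the force field. The sketch below computes this from a given gradient g and force F. reaction_force is a hypothetical name; obtaining g from the model and F from the simulation, as well as the role of direction, are left out.

```julia
using LinearAlgebra

# f = g/|g|^2 + orth * (component of F orthogonal to g)
function reaction_force(g::AbstractVector, F::AbstractVector; orth=1.0)
    gg = dot(g, g)
    along = g ./ gg                          # satisfies dot(g, along) == 1
    perp = F .- (dot(F, g) / gg) .* g        # projection of F onto the orthogonal complement of g
    return along .+ orth .* perp
end
```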
ISOKANN.resample_kde (Method)
resample_kde(xs, ys, n; kwargs...)

Return n indices of ys such that the corresponding points "fill the gaps" in the KDE of xs. For possible kwargs see kde_needles.

source
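One simple way to "fill the gaps" is a greedy loop: evaluate a Gaussian kernel density of the reference points at every remaining candidate, pick the candidate with the lowest density, and add it to the reference set. This one-dimensional sketch (kde, fill_gaps and the bandwidth h are hypothetical) only illustrates the idea; it is not kde_needles or resample_kde.

```julia
# Gaussian kernel density estimate of the points xs, evaluated at y (bandwidth h)
kde(xs, y; h=0.1) = sum(x -> exp(-((y - x) / h)^2 / 2), xs) / (length(xs) * h * sqrt(2π))

# Greedily pick n indices of ys where the current density is lowest;
# each pick joins the reference set so that later picks go elsewhere.
function fill_gaps(xs::AbstractVector, ys::AbstractVector, n::Int; h=0.1)
    ref = collect(float.(xs))
    chosen = Int[]
    for _ in 1:n
        candidates = setdiff(eachindex(ys), chosen)
        i = argmin(j -> kde(ref, ys[j]; h), candidates)
        push!(chosen, i)
        push!(ref, ys[i])
    end
    return chosen
end
```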
ISOKANN.save (Method)
save(path::String, iso::Iso)

Save the complete Iso object to a JLD2 file.

source
ISOKANN.savecoords (Function)
savecoords(path::String, iso::Iso, inds=1:numobs(iso.data))

Save the coordinates of the specified observation indices from the data of iso to the file path.

savecoords(path::String, iso::Iso, coords::AbstractArray)

Save the coordinates of the specified matrix of coordinates to a file, using the molecule in iso as a template.

source
ISOKANN.saveextrema (Method)
saveextrema(path::String, iso::Iso)

Save the two extremal configurations (metastabilities) to the file path.

source
ISOKANN.simulationtime (Method)
simulationtime(iso::Iso)

Print and return the total simulation time contained in the data of iso, in nanoseconds.

source
ISOKANN.sqpairdist (Method)
sqpairdist(x::AbstractArray)

Compute the squared pairwise distances between the columns of x. If x has 3 dimensions, the computation is batched along the 3rd dimension.

source
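A vectorized sketch using the identity |a - b|² = |a|² + |b|² - 2a·b, with a loop over the third dimension for the batched case. sq_pairdist is a hypothetical name; clipping at zero guards against small negative values caused by floating-point cancellation.

```julia
# Squared distances between all columns of x: D[i, j] = |x[:, i] - x[:, j]|^2
function sq_pairdist(x::AbstractMatrix)
    n2 = sum(abs2, x; dims=1)              # 1 x n row of squared column norms
    D = n2' .+ n2 .- 2 .* (x' * x)         # |a|^2 + |b|^2 - 2 a.b
    return max.(D, 0)                      # clip tiny negative round-off
end

# Batched variant: one distance matrix per slice along the 3rd dimension
sq_pairdist(x::AbstractArray{<:Any,3}) = cat((sq_pairdist(x[:, :, b]) for b in axes(x, 3))...; dims=3)
```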
ISOKANN.subsample (Method)
subsample(model, data::Array, n) :: Matrix
subsample(model, data::Tuple, n) :: Tuple

Subsample n points of data uniformly in model. If model returns multiple values per sample, subsample along each dimension.

source
ISOKANN.subsample_inds (Method)
subsample_inds(model, xs, n) :: Vector{Int}

Returns n indices of xs such that model(xs[inds]) is approximately uniformly distributed.

source
ISOKANN.subsample_uniformgrid (Method)

subsample_uniformgrid(ys, n) -> is

Given a list of values ys, return n indices is such that ys[is] are approximately uniform, by picking the closest points to a randomly perturbed grid in [0,1].

source
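A sketch of this procedure, assuming that "randomly perturbed grid" means one uniformly jittered point in each of the n strata of [0, 1], and that duplicates are avoided by removing each matched index (as pickclosest does). uniformgrid_indices is a hypothetical name.

```julia
using Random

# Pick n indices so that ys[is] roughly covers [0, 1] uniformly:
# one jittered grid point per stratum, matched to the nearest unused value.
function uniformgrid_indices(ys::AbstractVector, n::Int; rng=Random.default_rng())
    grid = ((0:n-1) .+ rand(rng, n)) ./ n      # one random point in each [k/n, (k+1)/n)
    available = collect(eachindex(ys))
    is = Int[]
    for g in grid
        k = argmin(i -> abs(ys[available[i]] - g), eachindex(available))
        push!(is, available[k])
        deleteat!(available, k)
    end
    return is
end
```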
ISOKANN.trajectory (Method)

trajectory(l::AbstractLangevin; T=lagtime(l), x0=randx0(l), save_start=false, saveat=lagtime(l), dt=dt(l))

Generate a trajectory of length T, starting at x0 with step size dt, saving the output at intervals of saveat.

source
ISOKANN.writechemfile (Method)
writechemfile(filename, data::Array{<:Any,2}; source)

Save the coordinates in data to filename, with source as a template, using the Chemfiles library.

source
ISOKANN.OpenMM.integrate_langevin (Function)
integrate_langevin(sim::OpenMMSimulation, x0=getcoords(sim); steps=steps(sim), F_ext::Union{Function,Nothing}=nothing, saveevery::Union{Int,Nothing}=nothing)

Integrate the Langevin equations with an Euler-Maruyama scheme, allowing for external forces.

  • F_ext: An additional force perturbation. It is expected to have the form F_ext(F, x) and to mutate the provided force F.
  • saveevery: If nothing, only the last point is returned; otherwise an array containing every saveevery-th frame is returned.
source
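A minimal Euler-Maruyama sketch with the same F_ext and saveevery conventions. It assumes overdamped Langevin dynamics dx = F(x) dt + sqrt(2/β) dW with unit friction, which may well differ from the dynamics integrated by integrate_langevin; euler_maruyama, force!, dt and beta are hypothetical.

```julia
# Euler-Maruyama for overdamped Langevin dynamics dx = F(x) dt + sqrt(2/beta) dW
function euler_maruyama(force!, x0::AbstractVector; dt=1e-3, steps=1000, beta=1.0,
                        F_ext=nothing, saveevery=nothing)
    x = float.(x0)
    F = similar(x)
    traj = saveevery === nothing ? nothing : Vector{typeof(x)}()
    for t in 1:steps
        force!(F, x)                          # F = force at the current x
        F_ext === nothing || F_ext(F, x)      # optional in-place perturbation of F
        x .+= F .* dt .+ sqrt(2dt / beta) .* randn(length(x))
        if saveevery !== nothing && t % saveevery == 0
            push!(traj, copy(x))              # keep every saveevery-th frame
        end
    end
    return saveevery === nothing ? x : reduce(hcat, traj)
end
```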
ISOKANN.propagate (Method)
propagate(s::OpenMMSimulation, x0::AbstractMatrix{T}, ny; nthreads=Threads.nthreads(), mmthreads=1) where {T}

Propagates ny replicas of the OpenMMSimulation s from the initial states x0.

Arguments

  • s: An instance of the OpenMMSimulation type.
  • x0: Matrix containing the initial states as columns
  • ny: The number of replicas to create.

Optional Arguments

  • nthreads: The number of threads to use for parallelization of multiple simulations.
  • mmthreads: The number of threads to use for each OpenMM simulation. Set to "gpu" to use the GPU platform.

Note: On CPU we observed better performance with nthreads = number of CPUs and mmthreads = 1 than the other way around. On GPU, nthreads > 1 should be supported, but on our machine it led to slower performance than nthreads = 1.

source
ISOKANN.savecoords (Method)
savecoords(path, sim::OpenMMSimulation, coords::AbstractArray{T})

Save the given coordinates in a .pdb file using OpenMM.

source
+ kwargs...)

Extract and save the reactive path of a given iso.

Computes the maximum likelihood path with parameter sigma along the given data points, aligns it and saves it to the out path.

See also reactive_path.

Arguments

  • iso::Iso: The Iso for which the reactive path is computed.
  • out="out/reactive_path.pdb": The output file path for saving the reactive path.
  • source: The source .pdb file providing the topology
  • kwargs...: additional parameters passed to reactive_path.

Returns

  • ids: The IDs of the reactive path.
source
ISOKANN.save_trajectoryMethod
save_trajectory(filename, coords::AbstractMatrix; top::String)

save the trajectory given in coords to filename with the topology provided by the file top using mdtraj.

source
ISOKANN.trajectorydata_burstsMethod
trajectorydata_bursts(sim::IsoSimulation, steps, nk; kwargs...)

Simulate a single long trajectory of steps times the lagtime and start nk burst trajectories at each step for the Koopman samples.

x0–-x––x–- / | / | y y y y

source
ISOKANN.trajectorydata_linearMethod
trajectorydata_linear(sim::IsoSimulation, steps; reverse=false, kwargs...)

Simulate a single long trajectory of steps times the lagtime and use this "chain" to generate the corresponding ISOKANN data. If reverse is true, also add the time-reversed transitions

x (<)–> x (<)–> x

source

Internal API

ISOKANN.DataTupleType

DataTuple = Tuple{Matrix{T},Array{T,3}} where {T<:Number}

We represent data as a tuple of xs and ys.

xs is a matrix of size (d, n) where d is the dimension of the system and n the number of samples. ys is a tensor of size (d, k, n) where k is the number of koopman samples.

source
ISOKANN.IsoSimulationType
abstract type IsoSimulation

Abstract type representing an IsoSimulation. Should implement the methods getcoords, propagate, dim

source
ISOKANN.Stabilize2Type

TransformStabilize(transform, last=nothing)

Wraps another transform and permutes its target to match the previous target

Currently we also have the stablilization (wrt to the model though) inside each Transform. TODO: Decide which to keep

source
ISOKANN.TransformISAType

TransformISA(permute)

Compute the target via the inner simplex algorithm (without feasiblization routine). permute specifies whether to apply the stabilizing permutation

source
ISOKANN.TransformPseudoInvType
TransformPseudoInv(normalize, direct, eigenvecs, permute)

Compute the target by approximately inverting the action of K with the Moore-Penrose pseudoinverse.

If direct==true solve chi * pinv(K(chi)), otherwise inv(K(chi) * pinv(chi))). eigenvecs specifies whether to use the eigenvectors of the schur matrix. normalize specifies whether to renormalize the resulting target vectors. permute specifies whether to permute the target for stability.

ISOKANN.adddata (Method)
adddata(data::D, model, sim, ny, lastn=1_000_000)::D

Generate new data for ISOKANN by adaptive subsampling using the chi-stratified/-uniform method.

  1. Adaptively subsample ny points from data uniformly along their model values.
  2. Propagate them according to the simulation model.
  3. Return the newly obtained data concatenated to the input data.

The subsamples are taken only from the last lastn data points in data.

Examples

julia> (xs, ys) = adddata((xs, ys), chi, mollysim, ny)
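The three steps can be sketched without ISOKANN as follows. Everything here is hypothetical: toy_propagate replaces the simulation, chi is a made-up function, and the chi-stratified pick simply takes, for each of ny evenly spaced chi levels, the candidate with the closest chi value (so duplicates are possible).

```julia
using Random

# Hypothetical toy propagator: ny independent random-walk steps per column.
toy_propagate(x0::AbstractMatrix, ny) =
    reshape(x0, size(x0, 1), 1, size(x0, 2)) .+ 0.1 .* randn(size(x0, 1), ny, size(x0, 2))

# chi maps a d×m matrix to a length-m vector of chi values.
function adddata_sketch(xs, ys, chi, ny; lastn=1_000_000)
    d = size(ys, 1)
    cand = reshape(ys, d, :)                          # all Koopman points as candidates
    cand = cand[:, max(1, end - lastn + 1):end]       # keep only the last lastn
    c = chi(cand)
    grid = range(minimum(c), maximum(c); length=ny)   # evenly spaced chi levels
    inds = [argmin(abs.(c .- g)) for g in grid]       # 1. chi-stratified pick
    newxs = cand[:, inds]
    newys = toy_propagate(newxs, size(ys, 2))         # 2. propagate
    return hcat(xs, newxs), cat(ys, newys; dims=3)    # 3. concatenate
end

Random.seed!(2)
xs, ys = rand(2, 10), rand(2, 3, 10)
chi(x) = vec(sum(x; dims=1)) ./ 2    # toy chi function
xs2, ys2 = adddata_sketch(xs, ys, chi, 5)
size(xs2), size(ys2)    # ((2, 15), (2, 3, 15))
```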
ISOKANN.adddata (Method)
adddata(d::SimulationData, model, n)

χ-stratified subsampling. Select n samples among the ys (Koopman) points of d such that their χ-values according to model are approximately uniformly distributed, and propagate them. Returns a new SimulationData with the new data appended.

ISOKANN.addextrapolates! (Method)
addextrapolates!(iso, n, stepsize=0.01, steps=10)

Sample new starting points by extrapolating the chi function beyond its current extrema and append them to the data of the iso object.

Samples n points at the lower and upper end each, resulting in 2n new points. stepsize is the magnitude of the change in chi per step and steps is the number of steps to take. E.g. 10 steps of stepsize 0.01 result in a change in chi of about 0.1.

The obtained data is filtered so that unstable simulations are removed, which may result in fewer than 2n points being added.
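A minimal sketch of a single extrapolation, assuming that points move along the gradient of chi via x ← x + stepsize · ∇χ / |∇χ|² (an assumption; the docstring only fixes what stepsize means). To first order this changes chi by stepsize per step, which the toy linear chi below reproduces exactly: 10 steps of 0.01 give 0.1. Energy minimization and the stability filter are omitted.

```julia
using LinearAlgebra

# Move x so that chi changes by about dir * stepsize per step (dir = +1 or -1).
function extrapolate_point(x, gradchi; stepsize=0.01, steps=10, dir=+1)
    x = copy(x)
    for _ in 1:steps
        g = gradchi(x)
        x .+= dir * stepsize .* g ./ dot(g, g)    # first-order chi change: dir * stepsize
    end
    return x
end

# Hypothetical toy linear chi with an analytic gradient.
w = [0.3, -0.4]
chi(x) = dot(w, x)
gradchi(x) = w

x0 = [1.0, 2.0]
x1 = extrapolate_point(x0, gradchi; stepsize=0.01, steps=10)
chi(x1) - chi(x0)    # ≈ 0.1
```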

ISOKANN.bootstrap (Method)
bootstrap(sim, nx, ny) :: DataTuple

Compute initial data by propagating the molecule's initial state to obtain the xs and propagating these further to obtain the ys.

ISOKANN.energyminimization_chilevel (Method)
energyminimization_chilevel(iso, x0; f_tol=1e-3, alphaguess=1e-5, iterations=20, show_trace=false, skipwater=false, algorithm=Optim.GradientDescent, xtol=nothing)

Local energy minimization on the current level set of the chi function.

ISOKANN.exportdata (Function)
exportdata(data::AbstractArray, model, sys, path="out/data.pdb")

Export data to a PDB file.

This function takes an AbstractArray data, sorts it according to the model evaluation, removes duplicates, transforms it to standard form and saves it as a PDB file to path.

ISOKANN.extrapolate (Function)
extrapolate(iso, n, stepsize=0.1, steps=1, minimize=true)

Take the n most extreme points of the chi-function of the iso object and extrapolate them by stepsize for steps steps beyond their extrema, resulting in 2n new points. If minimize is true, the new points are energy minimized.

ISOKANN.fixperm (Method)
fixperm(new, old)

Permutes the rows of new so as to minimize the L1 distance to old.

Arguments

  • new: The data to match to the reference data.
  • old: The reference data.
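For a small number of rows the idea can be sketched by brute force over all permutations. fixperm_sketch and allperms are hypothetical helpers; ISOKANN may solve the assignment differently.

```julia
# All permutations of 1:n (feasible only for small n).
function allperms(n::Integer)
    n == 1 && return [[1]]
    out = Vector{Vector{Int}}()
    for p in allperms(n - 1), pos in 1:n
        q = copy(p)
        insert!(q, pos, n)
        push!(out, q)
    end
    return out
end

# Reorder the rows of new to minimise the L1 distance to old.
function fixperm_sketch(new::AbstractMatrix, old::AbstractMatrix)
    perm = argmin(p -> sum(abs, new[p, :] .- old), allperms(size(new, 1)))
    return new[perm, :], perm
end

old = [1.0 2.0 3.0; 4.0 5.0 6.0]
new = old[[2, 1], :] .+ 0.01    # rows swapped and slightly perturbed
fixed, perm = fixperm_sketch(new, old)
perm    # [2, 1]
```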
ISOKANN.flatpairdists (Function)
flatpairdists(x)

Assumes each column of x to be a flattened representation of multiple 3D coordinates. Returns the flattened pairwise distances as columns.
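A minimal sketch, assuming the coordinates in each column are stored as x1, y1, z1, x2, y2, z2, ... and that the pairs (i, j) with i < j are enumerated row by row; the pair order used by ISOKANN may differ.

```julia
using LinearAlgebra

# Each column of x holds 3N numbers: the xyz coordinates of N points.
# Returns an N(N-1)/2 × size(x, 2) matrix of pairwise distances.
function flatpairdists_sketch(x::AbstractMatrix)
    N = size(x, 1) ÷ 3
    pairs = [(i, j) for i in 1:N for j in i+1:N]
    out = zeros(length(pairs), size(x, 2))
    for (c, col) in enumerate(eachcol(x))
        p = reshape(col, 3, N)    # 3×N coordinates
        for (r, (i, j)) in enumerate(pairs)
            out[r, c] = norm(p[:, i] - p[:, j])
        end
    end
    return out
end

x = [0.0, 0.0, 0.0,  3.0, 4.0, 0.0,  0.0, 0.0, 12.0]    # 3 points in one column
flatpairdists_sketch(reshape(x, :, 1))    # ≈ [5, 12, 13] as a 3×1 matrix
```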

ISOKANN.growmodel (Method)

Given a model, return a copy with its last layer replaced by one with output dimension n.

ISOKANN.load (Method)
load(path::String, iso::Iso)

Load the Iso object from a JLD2 file. Note that it will be loaded to the CPU, even if it was saved on the GPU. An OpenMMSimulation will be reconstructed anew from the saved pdb file.

ISOKANN.pairdistfeatures (Method)
pairdistfeatures(inds::AbstractVector)

Returns a featurizer function which computes the pairwise distances between the particles specified by inds

ISOKANN.pickclosest_sort (Method)

pickclosest(haystack, needles)

Return the indices into haystack which lie closest to needles, without duplicates, by removing haystack candidates after a match. Note that this is not invariant under perturbations of needles.

Scales as n log(n) m, where n = length(haystack) and m = length(needles).

ISOKANN.plotatoms! (Function)

Scatter plot of the first "O" atoms of all starting points xs, as well as the "O" atoms of the Koopman samples ys belonging to the first point.

ISOKANN.reactionforce (Function)
reactionforce(iso, sim, x, direction, orth=1)

Compute the vector f whose component collinear to dchi/dx satisfies dchi/dx * f = 1 and whose component in the orthogonal complement equals orth times the force field.
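One construction that satisfies both conditions, written out as a hedged sketch with made-up vectors: with g = dchi/dx and F the force field at x, take f = g/|g|² + orth · (F − (g·F/|g|²) g). The second term is orthogonal to g, so g·f = 1. The ISOKANN implementation may compute this differently.

```julia
using LinearAlgebra

# g: gradient dchi/dx at x;  F: force at x;  orth: weight of the orthogonal part.
function reactionforce_sketch(g::AbstractVector, F::AbstractVector, orth=1)
    gg = dot(g, g)
    Fperp = F .- (dot(g, F) / gg) .* g    # component of F orthogonal to g
    return g ./ gg .+ orth .* Fperp
end

g = [1.0, 2.0, 2.0]     # made-up gradient of chi
F = [0.5, -1.0, 3.0]    # made-up force
f = reactionforce_sketch(g, F)
dot(g, f)    # ≈ 1.0
```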

ISOKANN.resample_kde (Method)
resample_kde(xs, ys, n; kwargs...)

Return n indices of ys such that the corresponding points "fill the gaps" in the KDE of xs. For possible kwargs see kde_needles.

ISOKANN.save (Method)
save(path::String, iso::Iso)

Save the complete Iso object to a JLD2 file

ISOKANN.savecoords (Function)
savecoords(path::String, iso::Iso, inds=1:numobs(iso.data))

Save the coordinates of the specified observation indices from the data of iso to the file path.

savecoords(path::String, iso::Iso, coords::AbstractArray)

Save the coordinates of the specified matrix of coordinates to a file, using the molecule in iso as a template.

ISOKANN.saveextrema (Method)
saveextrema(path::String, iso::Iso)

Save the two extremal configurations (metastabilities) to the file path.

ISOKANN.simulationtime (Method)
simulationtime(iso::Iso)

Print and return the total simulation time contained in the data of iso, in nanoseconds.

ISOKANN.sqpairdist (Method)
sqpairdist(x::AbstractArray)

Compute the squared pairwise distances between the columns of x. If x has 3 dimensions, the computation is batched along the 3rd dimension.
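A minimal sketch using the identity |a − b|² = |a|² + |b|² − 2 a·b for a d×n matrix, with a second method that applies it slice by slice along the third dimension (sqpairdist_sketch is a hypothetical name, not the ISOKANN function).

```julia
# Squared Euclidean distances between all columns of a d×n matrix.
function sqpairdist_sketch(x::AbstractMatrix)
    sq = sum(abs2, x; dims=1)                      # 1×n squared norms
    return max.(sq' .+ sq .- 2 .* (x' * x), 0)     # clamp tiny negative rounding errors
end

# Batched version: one n×n slice per index of the third dimension.
sqpairdist_sketch(x::AbstractArray{<:Any,3}) =
    cat((sqpairdist_sketch(x[:, :, i]) for i in axes(x, 3))...; dims=3)

x = [0.0 3.0 0.0; 0.0 4.0 0.0]    # three 2-d points as columns
sqpairdist_sketch(x)              # [0 25 0; 25 0 25; 0 25 0]
```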

ISOKANN.subsample (Method)
subsample(model, data::Array, n) :: Matrix
subsample(model, data::Tuple, n) :: Tuple

Subsample n points of data uniformly in model. If model returns multiple values per sample, subsample along each dimension.

ISOKANN.subsample_inds (Method)
subsample_inds(model, xs, n) :: Vector{Int}

Returns n indices of xs such that model(xs[inds]) is approximately uniformly distributed.

ISOKANN.subsample_uniformgrid (Method)

subsample_uniformgrid(ys, n) -> is

Given a list of values ys, return n indices is such that ys[is] are approximately uniformly distributed, by picking the points closest to a randomly perturbed grid in [0,1].
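A minimal sketch, assuming that the values lie in [0, 1], that the grid is perturbed by a single random shift and that n does not exceed length(ys); uniformgrid_sketch is a hypothetical name. Each grid point greedily takes the closest value that has not been picked yet, so no index appears twice.

```julia
using Random

function uniformgrid_sketch(ys::AbstractVector, n::Integer; rng=Random.default_rng())
    grid = ((0:n-1) .+ rand(rng)) ./ n    # n-point grid with one random shift
    available = trues(length(ys))
    picked = Int[]
    for g in grid
        # closest value to g among the candidates not used yet
        best = argmin(i -> available[i] ? abs(ys[i] - g) : Inf, eachindex(ys))
        push!(picked, best)
        available[best] = false
    end
    return picked
end

ys = rand(MersenneTwister(0), 1000)
picked = uniformgrid_sketch(ys, 10; rng=MersenneTwister(1))
length(unique(picked)) == 10    # true: no duplicates
```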

ISOKANN.trajectory (Method)

trajectory(l::AbstractLangevin; T=lagtime(l), x0=randx0(l), save_start=false, saveat=lagtime(l), dt=dt(l))

Generate a trajectory of length T, starting at x0 with step size dt and saving the output every saveat time units.

ISOKANN.writechemfile (Method)
writechemfile(filename, data::Array{<:Any,2}; source)

Save the coordinates in data to filename with source as template using the Chemfiles library

ISOKANN.OpenMM.integrate_langevin (Function)
integrate_langevin(sim::OpenMMSimulation, x0=getcoords(sim); steps=steps(sim), F_ext::Union{Function,Nothing}=nothing, saveevery::Union{Int,Nothing}=nothing)

Integrate the Langevin equations with an Euler-Maruyama scheme, allowing for external forces.

  • F_ext: An additional force perturbation. It is expected to have the form F_ext(F, x) and to mutate the provided force F in place.
  • saveevery: If nothing, returns just the last point, otherwise returns an array saving every saveevery frame.
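The roles of F_ext and saveevery can be illustrated on a toy problem. The sketch below is not the OpenMM integrator: it assumes overdamped Langevin dynamics on a one-dimensional double well with unit friction, and push_right! is a hypothetical external force that mutates F in place.

```julia
using Random

# Overdamped Langevin dx = F(x) dt + sqrt(2/beta) dW, integrated with Euler-Maruyama.
# Returns the final state, or a matrix with every saveevery-th state as a column.
function euler_maruyama(force, x0; dt=1e-3, steps=1000, beta=1.0,
                        F_ext=nothing, saveevery=nothing)
    x = copy(x0)
    saved = saveevery === nothing ? nothing : zeros(length(x0), steps ÷ saveevery)
    for t in 1:steps
        F = force(x)
        F_ext === nothing || F_ext(F, x)    # external perturbation, mutating F
        x .+= F .* dt .+ sqrt(2dt / beta) .* randn(length(x))
        if saveevery !== nothing && t % saveevery == 0
            saved[:, t ÷ saveevery] = x
        end
    end
    return saveevery === nothing ? x : saved
end

force(x) = -4 .* x .* (x .^ 2 .- 1)        # -V'(x) for V(x) = (x^2 - 1)^2
push_right!(F, x) = (F .+= 1.0; nothing)    # hypothetical constant external push

Random.seed!(3)
traj = euler_maruyama(force, [-1.0]; steps=1000, saveevery=100, F_ext=push_right!)
size(traj)    # (1, 10)
```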
ISOKANN.propagate (Method)
propagate(s::OpenMMSimulation, x0::AbstractMatrix{T}, ny; nthreads=Threads.nthreads(), mmthreads=1) where {T}

Propagates ny replicas of the OpenMMSimulation s from the initial states x0.

Arguments

  • s: An instance of the OpenMMSimulation type.
  • x0: Matrix containing the initial states as columns
  • ny: The number of replicas to create.

Optional Arguments

  • nthreads: The number of threads to use for parallelization of multiple simulations.
  • mmthreads: The number of threads to use for each OpenMM simulation. Set to "gpu" to use the GPU platform.

Note: On CPU we observed better performance with nthreads = number of CPUs and mmthreads = 1 than the other way around. On GPU, nthreads > 1 should be supported, but on our machine it led to slower performance than nthreads = 1.

ISOKANN.savecoords (Method)
savecoords(path, sim::OpenMMSimulation, coords::AbstractArray{T})

Save the given coordinates in a .pdb file using OpenMM


Installation

Install Julia (v1.10 or newer) using juliaup (https://github.com/JuliaLang/juliaup):

curl -fsSL https://install.julialang.org | sh

After restarting your shell you should be able to start the Julia REPL via the command julia.

In the REPL you can add ISOKANN.jl to your project by entering the package mode (type ]) and typing

pkg> add ISOKANN
pkg> test ISOKANN

Note that this can take a while on the first run as Julia downloads and precompiles all dependencies.

We plan to install OpenMM automatically with ISOKANN. Right now, if you want to use OpenMM with ISOKANN, you need to make it available to PyCall.jl. This should work automatically when Julia uses its own Conda.jl environment:

  • start Julia with the environment variable PYTHON set to the empty string: shell> PYTHON="" julia
  • rebuild PyCall: pkg> build PyCall

See also the PyCall docs.

Development

If you want to make changes to ISOKANN, clone the repository into a directory:

git clone git@github.com:axsk/ISOKANN.jl.git

Then start Julia in that directory, activate the project with ]activate . and install its dependencies with ]instantiate.

You should then be able to run the tests with ]test or start using ISOKANN.

We strongly recommend loading the Revise.jl package (using Revise) before ISOKANN, so that your changes are automatically loaded into your current session.


run!(iso)
chis(iso)

For more advanced use, such as with the adaptive sampling algorithms, we pass a SimulationData object instead of the data tuple to the Iso constructor.

The SimulationData itself is composed of a Simulation, its simulated trajectory data as well as the features fed into the neural network for training. We supply some basic simulations which can generate the data, e.g. Doublewell, MuellerBrown, Diffusion, MollySimulation and OpenMMSimulation. Of course you can write your own Simulation which in its most basic form needs to supply only the propagate method.

sim = Doublewell()
data = isodata(sim, 100, 20)
iso = Iso(data)

We also provide different types of wrappers to load simulations [vgv] or to generate data from trajectories [IsoMu].

For an advanced example take a look at the scripts/vgvadapt.jl file.

Components

The OpenMMSimulation is a good example of a Simulation object. It parametrises a molecular system by reading its structure from a .pdb file and specifying the system temperature, the simulation lag time and other simulation parameters.

The SimulationData in turn links such a Simulation to actual simulation data, which ISOKANN uses for training. Through the specification of a featurizer, the neural network does not need to digest the raw simulation coordinates but can use optimized features which, for example, guarantee invariance under rigid transformations. By default the featurizer is inherited from the default featurizer of the Simulation. For the OpenMMSimulation we have pre-implemented pairwise distances between all atoms, locally close atoms and/or the C-alpha atoms (cf. the OpenMMSimulation docstring).

The Iso object then brings together the SimulationData with a neural network model and an optimizer. Its main use is together with the training routine run!(), which computes the ISOKANN iteration via isotarget and updates the network's weights with train_batch. The logger field allows adding other operations, such as the default autoplot(), which displays the progress during training. The default model is the pairnet, which constructs a fully connected network with a given number of layers of decreasing width, and the default optimizer is Adam with weight decay.

Adaptive sampling is facilitated either by the runadaptive! method, or the individual adddata!, resample_kde!, addextrapolates! used in a custom training routine.

The learned chi values can be accessed via chis(::Iso) and the reaction rates via exit_rates(::Iso).

Contents of the source files

Core:

Simulators:

Utility:

Experimental:


gpu!(mu)                          # transfer model to gpu
train!(mu, 10000)                 # 10000 iterations
adjust!(mu, 1e-4, lambda=1e-3)    # set learnrate to 1e-4 and decay to 1e-3
train!(mu, 10000)                 # 10000 iterations