
Simple examples

inuritdino edited this page Jun 24, 2017 · 33 revisions

We provide a few of the simplest examples to start with. Typical BayesForest runs take a long time, so these examples only demonstrate the conceptual functionality.

Jump to Example 1 or Example 2.

Notations

Throughout these pages we maintain the following nomenclature:

segment — a basic structural unit composing branches of a tree. Usually, cylindrical.

branch — a unit of a tree composed of segments.

w — topological order, starting with 0 for the trunk.

parent — a unit the current unit originates from topologically, e.g. segment parent, branch parent. Parent is closer to the origin of a tree, i.e. the first segment.

extension — a segment originating from the current segment inside a branch. Extension is a more distant unit than the current unit from the origin.

child — a segment/branch emanating from the current unit and giving the next topological order.

Bw — branch related features of the topological order w.

Sw — segment related features of the topological order w. Segments compose branches.

This terminology was proposed in Potapov et al., GigaScience.

Example 1

In this basic example, we load the QSM data sets B2 and S1 for the Espoo maple QSM. We then try to optimize the parameters of the "SSM", which here is a surrogate model that produces no morphological structure and models no real tree growth. This phony SSM just generates random data for B2 and S1.

Skip the explanation and just run the example.

Phony SSM

The phony SSM takes 6 parameters: the first two determine the number of samples in each data set (i.e. S1 and B2); the other four are the mean and standard deviation of the Gaussian noise for each of the two data sets.

The function that generates the phony data sets to be compared with the real Espoo tree scatters is essentially a one-liner (function file):

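function Scatter = fake_ssm(x)
% Two Gaussian data sets: a 5-by-x(1) set with mean x(3) and std x(4),
% and a 4-by-x(2) set with mean x(5) and std x(6).
Scatter = {x(3) + x(4)*randn(5,x(1)), x(5) + x(6)*randn(4,x(2))};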

end
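To see the shape of the output, the generator can be called with a hand-picked parameter vector. The snippet below is only a sketch: it restates the one-liner as an anonymous function (`gen` is a hypothetical name, used so that the snippet runs on its own), and the parameter values are arbitrary choices.

```matlab
% Stand-alone restatement of the fake_ssm one-liner (gen is a hypothetical name).
gen = @(x) {x(3) + x(4)*randn(5, x(1)), x(5) + x(6)*randn(4, x(2))};

% Arbitrary parameter vector: two sample sizes, then two mean/std pairs.
x = [200 400 15 20 0 12];
Scatter = gen(x);

size1 = size(Scatter{1});   % 5 rows, x(1) = 200 columns
size2 = size(Scatter{2});   % 4 rows, x(2) = 400 columns
disp(size1);
disp(size2);
```

The output is a cell array with one matrix per data set; the sample sizes x(1) and x(2) set the number of columns, which is why they must be integers.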
Configuration file

The configuration file for Example 1 is shown below:

# Test Configuration file:
#
# Target directory, where all calculation will be carried out, keep current
target_dir = ./
# Branch and Segment orders separately to extract from the structures
branch = 2
segment = 1
# The Bra and Seg related tables (file names) of the target QSM
qsm_cyl_table = EspooSegData.txt
qsm_br_table = EspooBraData.txt
# Do not merge the orders for the tables (it does not make any difference since
# we have one order for each table).
qsm_merge = 0

# The fake Gaussian based data generator ("SSM") with 6 parameters to optimize:
# mean, std and sample size for the Bra table and similarly for the Seg table.
# See the `fake_ssm` for details
ssm_fun = @(x)fake_ssm(x)

# Global parameter ranges for the optimization
#          1    2     3    4      5    6 
ga_lb =  100  100 -50.0  0.0 -100.0  0.0
ga_ub =  1000 1000 50.0 50.0  100.0 20.0 

# Starting ranges (could be omitted, then generated from the global)
ga_init_lb = 200 400 10.0 10 -40 10
ga_init_ub = 210 430 20.0 30  40 15
# Integer parameters
ga_int_con = 1,2

# GA parameters
ga_pop_size = 20
ga_gens = 10
ga_stall = 10
ga_elite = 1
# Fix the RNG seed for reproducible research
ga_rng = 23

# Function to run after each iteration (prints the progress of optimization)
ga_out_fun = @optim_best_output

# Do we use parallel?
ga_use_par = 0

# Distance parameters
dt_scale = 0
dt_stat1d = 1
dt_dirs = 1000

# Do we make tree plots? No, because it is a fake example with no tree structure
plot = 0
movie = 0

Download the configuration file input-test.txt from here.
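Before starting a long run, it can help to check that the ranges in the configuration file are mutually consistent. The following is a sanity check one might run by hand, not a part of BayesForest; the variable names simply mirror the configuration keys, and the requirement that the starting ranges lie inside the global ranges is our assumption about sensible input.

```matlab
% Values copied from the configuration file above.
ga_lb      = [100  100  -50.0   0.0 -100.0   0.0];
ga_ub      = [1000 1000  50.0  50.0  100.0  20.0];
ga_init_lb = [200  400   10.0  10    -40    10];
ga_init_ub = [210  430   20.0  30     40    15];
ga_int_con = [1 2];

% Every starting range should sit inside the corresponding global range.
ranges_ok = all(ga_lb <= ga_init_lb & ...
                ga_init_lb <= ga_init_ub & ...
                ga_init_ub <= ga_ub);

% The bounds of the integer parameters should themselves be integers.
int_bounds = [ga_lb(ga_int_con), ga_ub(ga_int_con)];
ints_ok = all(int_bounds == round(int_bounds));

fprintf('ranges consistent: %d, integer bounds valid: %d\n', ranges_ok, ints_ok);
```

Both checks pass for the values of Example 1.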

Run

To run BayesForest on this example, type the following at the Matlab prompt:

BayesForest('input-test.txt')

This results in a figure showing the progress of optimization:

The top panel shows the distance value over the generations; the bottom panel shows the so-called 'Average distance' between individuals of the genetic algorithm (GA). It is advised to keep this value around 1.0 (see the GA documentation). Additionally, one can pause or stop the simulation using the corresponding buttons in the window.

At the end of the optimization a figure pops up showing some 2D projections of the data sets:

At the Matlab prompt the results get printed:

Optimization terminated: maximum number of generations exceeded.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The problem type was: integerconstraints
% The best point found: 219
% The best point found: 586
% The best point found: 26.6121
% The best point found: 35.3127
% The best point found: 1.82749
% The best point found: 13.5156
% The fitness at the best point: 0.235564
% I have stopped because: 0 =>
% Optimization terminated: maximum number of generations exceeded.
% The number of generations was : 10
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The output directory:
<path-to-your-local-results-directory>

Finally, BayesForest creates a folder with the results. The folder is named according to the format day.month.year_hour.minute and contains the two figures above, the original configuration file, and some other technical files that can be used for further processing of the results.

Summary of Example 1

Make sure in the current directory you have:

Run:

BayesForest('input-test.txt')

"Clones" generation

Note that the configuration file has an option called ga_rng, which is set to 23. This makes the optimization reproducible; the seed need not be fixed in the end.

Importantly, the random number generator seed of the SSM must be fixed while the SSM is used in the optimization. The optimization algorithm cannot optimize a truly stochastic function, because it would get different, inconsistent results for the same parameter set. To avoid this problem, the SSM function must offer a way to fix the randomness of its output.

For example, in our fake_ssm case we have set the random seed as:

rng(543);

in the body of the function.
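The effect of fixing the seed can be checked directly. The sketch below restates the generator as an anonymous function (`gen`, a hypothetical name) and sets the seed outside of it, purely so that the snippet runs on its own; the parameter values are arbitrary.

```matlab
% Stand-alone restatement of the fake_ssm one-liner (gen is a hypothetical name).
gen = @(x) {x(3) + x(4)*randn(5, x(1)), x(5) + x(6)*randn(4, x(2))};
x = [200 400 15 20 0 12];   % arbitrary parameter vector

% Same seed, same parameters: the two outputs are identical.
rng(543); first  = gen(x);
rng(543); second = gen(x);
repeatable = isequal(first, second);

% Without resetting the seed, the generator continues its stream,
% so the same parameters give a different data set.
third = gen(x);
drifts = ~isequal(first, third);

fprintf('repeatable: %d, drifts without reseeding: %d\n', repeatable, drifts);
```

With a fixed seed, the distance computed for a given parameter set is a deterministic function of the parameters, which is what the optimizer needs.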

The SSM function should also make it possible to release the fixed seed and generate outputs for different (random) seeds. Random or shuffled generator seeds could be used. This ensures that the SSM produces different outcomes for different seeds. These outcomes we call clonal...
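One possible way to generate several such outcomes for the same parameter set is sketched below. This is an illustration under our own assumptions, not the BayesForest implementation: `gen` and `nClones` are hypothetical names, and the seeds 1..nClones are chosen only to keep the sketch reproducible.

```matlab
% Stand-alone restatement of the fake_ssm one-liner (gen is a hypothetical name).
gen = @(x) {x(3) + x(4)*randn(5, x(1)), x(5) + x(6)*randn(4, x(2))};
x = [200 400 15 20 0 12];   % one fixed parameter set

nClones = 3;
clones = cell(1, nClones);
for k = 1:nClones
    rng(k);                 % a different seed per clone; rng('shuffle') also works
    clones{k} = gen(x);
end

% Same parameters, different seeds: the generated data sets differ.
all_differ = ~isequal(clones{1}, clones{2}) && ...
             ~isequal(clones{1}, clones{3}) && ...
             ~isequal(clones{2}, clones{3});
disp(all_differ);
```

Using rng('shuffle') instead of explicit seeds gives clones that differ from run to run.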

Example 2

(Skip to How to run Example 2)

In this example, we load the data sets Ud of the Espoo maple QSM. The SSM function loads the data sets (only B2 and S1) and adds noise to them. The four noise parameters to be optimized are two normalized factors for the mean displacements from the original data and two standard deviations (noise amplitudes).

Note that we take our QSM data as the basis for the SSM simulation, so the parameters should tend towards zero (if they are all zero, there is a perfect parametric match up to stochastic discrepancies). On the other hand, the normalized mean displacement parameters are the same for all features within a data set. This creates a tension in the opposite direction, because the features are not distributed equally (they are not normalized). Thus, the genetic algorithm does not reach zero: close to zero, further steps towards zero barely decrease the distance, so they are not treated as significant improvements.
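The tension can be illustrated with a toy version of such a noise-adding SSM. Everything in the sketch below is an assumption made for illustration: the data are synthetic stand-ins for a loaded data set, `add_noise` is a hypothetical name, and the way the displacement is normalized (as a fraction of each feature's standard deviation) is our guess, not necessarily what the Example 2 SSM does.

```matlab
% Synthetic stand-in for one loaded data set: rows are features, columns are
% samples. The two features deliberately have very different scales.
rng(1);
D = [10 + 2*randn(1, 300); 0.5 + 0.1*randn(1, 300)];

% ASSUMPTION: the displacement m is a fraction of each feature's standard
% deviation; the noise amplitude s is applied unscaled to all features.
add_noise = @(D, m, s) D + repmat(m*std(D, 0, 2), 1, size(D, 2)) ...
                         + s*randn(size(D));

exact = add_noise(D, 0, 0);           % zero parameters reproduce the data
is_exact = isequal(exact, D);

s = 0.03;                             % hypothetical noise amplitude
noisy = add_noise(D, -0.05, s);

% The same amplitude is a very different relative perturbation per feature.
relative_noise = s ./ std(D, 0, 2);
disp(is_exact);
disp(relative_noise);
```

Because one amplitude is shared by features of different scale, a value that is negligible for one feature can be large for another, so no single non-zero setting suits all features equally.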

As expected, most of the best parameter values found are close to zero:

% The best point found: -0.0465508
% The best point found: 0.0261472
% The best point found: -0.0419142
% The best point found: 0.965958
% The fitness at the best point: 0.0274339

Also, the scatters look much closer to the data distributions than in Example 1:

Finally, note the population diversity of this optimization run in the bottom panel of the plot below:

The Average distance is about 1.0, which is a good amount of diversity for the genetic algorithm (GA). See the discussion on population diversity at the Matlab GA website.

Run Example 2

Download into a folder:

Run at the Matlab prompt:

BayesForest('input-test2.txt')