Optimizers

Table of Contents

Cross Entropy
Fully Adaptive Cross Entropy (FACE)
Gradient Descent
- Available variants
Parallel Tempering
- Available cooling schedules
Simulated Annealing
- Available cooling schedules

Cross Entropy

In the pseudo code the algorithm does:

For n iterations do:

Sample individuals from distribution

evaluate individuals and get fitness

pick rho * pop_size number of elite individuals

Out of the remaining non-elite individuals, select them using a simulated-annealing style selection based on the difference between their fitness and the 1-rho quantile (gamma) fitness, and the current temperature

Fit the distribution family to the new elite individuals by minimizing cross entropy. The distribution fitting is smoothed to prevent premature convergence to local minima. A weight equal to the smoothing parameter is assigned to the previous parameters when smoothing.

Fully Adaptive Cross Entropy (FACE)

In the pseudo code the algorithm does:

For n iterations do:

Sample individuals from distribution

evaluate individuals and get fitness

check if gamma or best individuals fitness increased

if not increase population size by n_expand (if not yet max_pop_size else stop) and sample again (1) else set pop_size = min_pop_size and proceed

pick n_elite individuals with highest fitness

Out of the remaining non-elite individuals, select them using a simulated-annealing style selection based on the difference between their fitness and the 1-rho quantile (gamma) fitness, and the current temperature

Fit the distribution family to the new elite individuals by minimizing cross entropy. The distribution fitting is smoothed to prevent premature convergence to local minima. A weight equal to the smoothing parameter is assigned to the previous parameters when smoothing.

Gradient Descent

In the pseudo code the algorithm does:

For n iterations do:

Explore the fitness of individuals in the close vicinity of the current one

Calculate the gradient based on these fitnesses.

Create the new 'current individual' by taking a step in the parameters space along the direction of the largest ascent of the plane

Available variants

Classic Gradient Descent
Stochastic Gradient Descent
ADAM
RMSProp

Parallel Tempering

Parallel Tempering is a search algorithm, that uses multiple simulated annealing algorithms at the same time and has a certain chance of two annealing algorithms switching temperatures. Each of the annealing algorithms can have different cooling schedules and respective decay parameters or staring/ ending temperatures. This effectively has a similar functional effect, as a single simulated annealing with multiple coolings and reheatings, but needs fewer parameters (like when to reheat and how often). For details on simulated annealing, please read the documentation on it.

Note: For simplicity sake, not the positions, but the temperature and the schedule are swapped, which ammounts to the exact same. The temperature and the schedules are each stored in lists, which are both indexed by 'compare_indices'. If the swap criterion between two schedules are met, the respective entries for 'compare_indices' are swapped. To get the parallel runs, 'n_parallel_runs" is used - each individual is one of the parallel runs.

The algorithm does:

For n iterations and each cooling schedule do:

Take a step of size noisy step in a random direction

If it reduces the cost, keep the solution

Otherwise keep with probability exp(- (f_new - f) / T)

Swap positions between two randomly chosen schedules with probability exp(-((f_1 - f_2) * (1 / (k * T_1) - 1 / (k * T_2)))) with k being a constant

Available cooling schedules

Multiplicative Monotonic Cooling

This schedule type multiplies the starting temperature by a factor that decreases over time (number k of the performed iteration steps). It requires a decay parameter (alpha) but not an ending temperature, as the prgression of the temperature is well definded by the decay parameter only. The Multiplicative Monotonic Cooling schedules are: Exponential multiplicative cooling, Logarithmical multiplicative cooling, Linear multiplicative cooling and Quadratic multiplicative cooling. Source: Kirkpatrick, Gelatt and Vecchi (1983)

Exponential multiplicative cooling

Default cooling schedule for typical applications of simulated annealing. Each step, the temperature T_k is multiplied by the factor alpha (which has to be between 0 and 1) or in other words it is the starting temperature T_0 multiplied by the factor alpha by the power of k: T_k = T_0 * alpha^k

Logarithmical multiplicative cooling

The factor by which the temperature decreases, is indirectly proportional to the log of k. Therefore it slows down the cooling, the further progressed the schedule is. Alpha has to be largert than one. T_k = T_0 / ( 1 + alpha* log (1 + k) )

Linear multiplicative cooling

Behaves similar to Logarithmical multiplicative cooling in that the decrease gets lower over time, but not as pronounced. The decrease is indirectly proportional to alpha times k and alpha has to be larger than zero: T_k = T_0 / ( 1 + alpha*k)

Quadratic multiplicative cooling

This schedule stays at high temperatures longer, than the other schedules and has a steeper cooling later in the process. Alpha has to be larger than zero. T_k = T_0 / ( 1 + alpha*k^2)

Additive Monotonic Cooling

The differences to Multiplicative Monotonic Cooling are, that the final temperature T_n and the number of iterations n are needed also. So this cannot be used as intended, if the stop criterion is something different, than a certain number of iteration steps. A decay parameter is not needed. Each temperature is computed, by adding a term to the final temperature. The Additive Monotonic Cooling schedules are: Linear additive cooling, Quadratic additive cooling, Exponential additive cooling and Trigonometric additive cooling. Source. Additive monotonic cooling B. T. Luke (2005)

Linear additive cooling

This schedule adds a term to the final temperature, which decreases linearily with the progression of the schedule. T_k = T_n + (T_0 -T_n)*((n-k)/n)

Quadratic additive cooling

This schedule adds a term to the final temperature, which decreases q uadratically with the progression of the schedule. T_k = T_n + (T_0 -T_n)*((n-k)/n)^2

Exponential additive

Uses a complicated formula, to come up with a schedule, that has a slow start, a steep decrease in temperature in the middle and a slow decrease at the end of the process. T_k = T_n + (T_0 - T_n) * (1/(1+exp( 2*ln(T_0 - T_n)/n * (k- n/2) ) ) )

Trigonometric additive cooling

This schedule has a similar behavior as Exponential additive, but less pronounced. T_k = T_n + (T_0 - T_n)/2 * (1+cos(k*pi/n))

Simulated Annealing

In the pseudo code the algorithm does:

For n iterations do:

Take a step of size noisy step in a random direction

If it reduces the cost, keep the solution

Otherwise keep with probability exp(- (f_new - f) / T)

Available cooling schedules

Same as above

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimizers

Cross Entropy

Fully Adaptive Cross Entropy (FACE)

Gradient Descent

Available variants

Parallel Tempering

Available cooling schedules

Simulated Annealing

Available cooling schedules

Clone this wiki locally