GitHub - ianfab/spsa: SPSA tuner for UCI chess variant engines

ianfab / spsa Public

forked from zamar/spsa

Notifications You must be signed in to change notification settings
Fork 2
Star 4

SPSA tuner for UCI chess variant engines

GPL-3.0 license

4 stars 12 forks Branches Tags Activity

Star

Notifications

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 42 Commits
books @ 0a7e7ea		books @ 0a7e7ea
.gitignore		.gitignore
.gitmodules		.gitmodules
LICENSE		LICENSE
README		README
atomic.conf		atomic.conf
atomic.var		atomic.var
crazyhouse.conf		crazyhouse.conf
crazyhouse.var		crazyhouse.var
giveaway.conf		giveaway.conf
giveaway.var		giveaway.var
kingofthehill.conf		kingofthehill.conf
kingofthehill.var		kingofthehill.var
simul_1var.conf		simul_1var.conf
simul_1var.var		simul_1var.var
skill.conf		skill.conf
skill.var		skill.var
spsa.pl		spsa.pl

Repository files navigation

0. License
==========

SPSA Tuner is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

SPSA Tuner is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.

1. Introduction to SPSA
========================

Please see the following document (by James C. Spall) as an introduction to SPSA.
http://www.jhuapl.edu/spsa/PDF-SPSA/Spall_Implementation_of_the_Simultaneous.PDF

Unless otherwise stated, we follow the naming conventions and recommendations of the article. 

2. Modifications for chess engine testing
=========================================

Please see matlab pseudocode by Spall (document, p. 821) as a starting point.

For chess engine testing the following lines in the pseudocode

yplus  = loss(thetaplus)
yminus = loss(thetaminus)
ghat = (yplus - yminus) / (2 * ck * delta)

are replaced with:

ghat = match(thetaplus, thetaminus) / (ck * delta)

match() plays a game between two identical engines.
If engine 1 (= thetaplus) wins, it returns 1.
If engine 2 (= thetaminus) wins, it returns -1.
If game is drawn, it returns 0.

It is convenient to define "relative apply factor" Rk to generate ak:
ak := Rk * ck^2.

Its interpretation is:

theta := theta + ak * ghat
<=>
theta := theta + ak * match(thetaplus, thetaminus) / (ck * delta)
<=>
theta := theta + Rk * ck * match(thetaplus, thetaminus) / delta

with delta being chosen as Bernoulli (+/- 1).

3. Configuration file
=====================

To execute script 'spsa.pl', you need an INI style configuration file.
Take a look at example files provided. All options should be self-explanatory.

4. Variables CSV-file
=====================

Variables file (name of the file is defined in configuration file) is a comma separated (CSV) file.
Columns are defined as follows:

Column 0: Variable name (alphanumeric string)
Column 1: Variable initial value (float)
Column 2: Variable minimum value (float)
Column 3: Variable maximum value (float)
Column 4: Perturbation "ck" for the last iteration (float)
Column 5: Relative apply factor "Rk" for the last iteration (float)
Column 6: For simulation mode, this defines the ELO decrease for point x = (+/-) 100 compared to point x = 0 (optimum) (float).

Notes: 
- When ck is defined for the last iteration and the number of iterations is known,
  it's easy to derive a value for c.
- When ck and Rk are defined for the last iterations, it's easy to derive ak for the last iteration.
  Based on that we can derive value for a.

5. Getting Started
==================

- Install Perl. 
    On Windows: Download and install Stawberry Perl (http://strawberryperl.com/).
    On UNIX/Linux: It's part of the default installation. So you can skip this step.

- Install the following Perl Packages: Config::Tiny, Text::CSV, Math::Round.
    On Windows: Open Command Prompt. Type: "cpan". Type: "install Config::Tiny". Type: "install Text::CSV". Type: "install Math::Round"
    On Debian/Ubuntu Linux: Type: sudo apt-get install libconfig-tiny-perl libtext-csv-perl libmath-round-perl
    On other UNIX/Linux systems: Please consult the documentation of relevant package management software.

- Modify stockfish to make parameters available via UCI. Copy the modified stockfish in the same directory with the script.
- Create a config file (just modify 'aggr.conf' slightly) and variable file (just modify 'aggr.var' and add more lines)
- Kick off the tuning by executing: spsa.pl [configFile]

6. Practical Guidelines
=======================

To tune SPSA, see the Semiautomatic Tuning method by Spall introduced in his book
[Introduction to Stochastic Search and Optimization (ISSO)](http://jhuapl.edu/ISSO/).
There are 3 main knobs, the two gain sequences (_a_ and _c_), and the perturbation distribution, delta.
Delta is best chosen as Bernoulli (+/- 1), as that is asymptotically optimal (it must not be Normal or Uniform, see ISSO).
There are rules about the properties of the gain sequences. The standard form works well, and follows the forms:

```
a(k) = a / (k + 1 + A) ^ alpha
c(k) = c / (k + 1) ^ gamma
```

Here, the values alpha and gamma are tied together, asymptotically optimal are _alpha_ = 1 and _gamma_ = 1/6,
but the values _alpha_ = .602 and _gamma_ = .101 work well for finite cases.
_A_ should be about 10% of the total iterations, and _c_ should be approximately the standard error
of the loss function in question.
_a_ is the most volatile parameter, and must be tuned carefully. In general, one should pick _a_
(by picking _R_) such that the first step is not too large so as to send the algorithm in the wrong direction.

To be able to successfully use SPSA for tuning, one needs to determine good values for coefficients ck and Rk.
In most cases, it's impossible to determine ideal values. 
For evaluation related parameters, we've had success with the following values:
Rk for the last iteration (CSV-file, column 5): 0.002
ck for the last iteration (CSV-file, column 4): 4 centipawns (= 8 in Stockfish's internal scale)

However if parameter is very insensitive ELO-wise, one needs to use a larger value for ck.
Also if one has even some sort of idea about the ELO-sensitivity of the parameter and how far
from optimum it might be at maximum, one can use the built-in simulator, to discover good values
for Rk and ck.
See files 'simul_1var.conf' and 'simul_1var.var' as an example.