Overview

FPTuner is a rigorous tool for automatic precision-tuning of real valued expressions. FPTuner generates a mixed-precision allocation (single, double, or quadruple precision) on a given input domain that is guaranteed to have error below a given threshold.

In addition to precision-tuning, FPTuner also allows users to control precision allocation in ways that helps optimize code. As two examples,

it allows users to control the maximum number of type-casts introduced during precision allocation. Capping the number of type-casts can help reduce the associated overheads.
FPTuner allows users to group ("gang") expressions (typically similar expression) and force the principal operators of these expressions to share the same precision allocation. Doing so encourages the compiler to do vectorization.

For further details of FPTuner, please consult our paper. The rest of this file will guide you through FPTuner's installations. A more comprehensive reference manual of FPTuner is situated at Reference.md. This reference manual describes FPTuner's flags in detail. The flags include basic flags (error threshold allowed, precision choices available) and allocation-controlling flags (fix the number of type-casts, gang expressions, etc.)

Requirements

FPTuner has been tested on Ubuntu 12.04, 14.04, 16.04 on x86_64; we recommend version 16.04. It depends on the following free projects:

git
python3 (FPTuner currently supports python3 only)
PLY for python3
bison
flex
ocaml
g++

On Ubuntu these can all be installed with

sudo apt-get install -y git python3-ply bison flex ocaml g++ make

Apart from these, FPTuner also depends on Gurobi v6.5. Note that FPTuner's installation script does not automatically install Gurobi. Please follow the following steps to install Gurobi and obtain a free academic license.

Installation.

On Gurobi website (tab "DOWNLOADS") select "Download Center."
Select "Gurobi Optimizer." You need to register for an account to obtain the academic licenses.
Download gurobi6.5.2_linux64.tar.gz and unpack with tar -xvf gurobi6.5.2_linux64.tar.gz.
Execute cd gurobi652/linux64 and ./setup.py build.

Set the required environment variables as follows:

export GUROBI_HOME=your-path/gurobi652/linux64
export PATH=$GUROBI_HOME/bin:$PATH
export LD_LIBRARY_PATH=$GUROBI_HOME/lib:$LD_LIBRARY_PATH

Obtain an academic license.

Go to https://user.gurobi.com/download/licenses/free-academic.
Read the User License Agreement and the conditions, then click "Request License."
Copy command grbgetkey your-activation-code shown on the screen.
Under the bin directory of your Gurobi installation, run the grbgetkey command which you just copied. This command will require you to enter a path to store the license key file. The grbgetkey command will indicate you to setup environment variable GRB_LICENSE_FILE to the license file path.

After the installation, add the path of Gurobi's python module to environment variable PYTHONPATH.

Assuming Gurobi is installed under GUROBI_HOME, you should have a directory similar to $GUROBI_HOME/lib/python3.4_utf32. Note: We assumed the version of Gurobi to be 6.5.2, and hence your Gurobi path may be different. Also, type python3 --version to find the Python version on your system. If it is Python 3.5, use $GUROBI_HOME/lib/python3.5_utf32 instead.
Add this to your environment with export PYTHONPATH=$GUROBI_HOME/lib/python3.4_utf32:$PYTHONPATH

For more installation details, please refer to the user menu.

Installation

Download FPTuner from our GitHub repository: git clone https://github.com/soarlab/FPTuner
Go to the root directory of FPTuner, for example: cd ./FPTuner
Run the setup script at the root directory of FPTuner: python3 setup.py install
Set up the required environment variables. The installation script will create a file fptuner_vars for setting the related environment variables. To do so, run source fptuner_vars.

To uninstall, run python3 setup.py uninstall.

Running FPTuner

To test the installation, please try out the hello-world example through the following steps:

Go to directory bin under the root of FPTuner.
Run command python3 ./fptuner.py -e 0.001 ../examples/helloworld0.py

The console output of FPTuner should be the following:

==== error bound : 0.001 ====
Total # of operators: 5
# of 32-bit operators: 2
# of 64-bit operators: 3

---- alloc. ----
Group 0 : 32-bit
Group 1 : 32-bit
Group 2 : 64-bit
Group 3 : 64-bit
Group 4 : 64-bit
----------------

# L2H castings: 2
# H2L castings: 0
# Castings: 2

Expression:
(* (+ (A) (B)) (C))

In addition, a .cpp file helloworld0.0.001.cpp will be generated. Now we describe how to use FPTuner with this hello-world example.

Input

FPTuner takes an expression specification and an user-specified error threshold for generating the optimal allocation. In the command python3 ./fptuner.py -e 0.001 ../examples/helloworld0.py, file helloworld0.py is the expression specification and -e 0.001 specifies 1e-03 as the error threshold.

The later section "Example of Expression Specification" describes how to specify the expression through the python-based interface.

Output

FPTuner summarizes the number of 32- and 64-bit operators, prints the allocation on the console. In the example output, for example, Group 0 : 32-bit denotes that the group 0 (gang 0) operators are assigned 32-bit precision. # L2H castings (resp., # H2L castings) indicates the number of low-to-high (resp., high-to-low) type casts in this allocation. # Castings is the summation of # L2H castings and # H2L castings. In addition to the console output, a .cpp file is synthesized by FPTuner which implements the allocation.

When outputting to a terminal a colorized s-expression will be emitted indicating the allocations of variables and operations. For example:

Variables A and B are allocated at 32-bit precision as indicated by the green text. Blue text indicates that each operation and variable C are allocated at 64-bit precision. Notably, the blue parenteses around A and B mean that they are both cast to 64-bit.

To POPL Artifact Evaluation Reviewers

Reproduce the tuning results of Table 5.1 and Table 5.2

The tuning results of Table 5.1 are shown under column "# of double-ops forced by Es" and the results of Table 5.2 are shown under column "# of single-ops forced by Es." With a correct installation of FPTuner (e.g., the above hello-world example works), the fastest way to reproduce the two tables is using the scripts under directory bin.

For Table 5.1, please run (under directory bin)

./test-table-5.1

For Table 5.2, please run (under directory bin)

./test-table-5.2

Performance and energy measurements

We currently don't offer the scripts to automatically measure performance and energy. However, as demonstrated through the hello-world example, the .cpp files of the corresponding mixed precision allocations are offered. You can freely do performance and energy measurements with those .cpp files on your platforms.

Tuning results and tuning performance may be affected by global optimization

The tuning results and the tuning performance of FPTuner are affected by the underlying global optimization. The global optimization may calculate tight bounds (resp., loose bounds) of the first derivatives that result in more (resp., fewer) low-precision operators. In addition, FPTuner's performance is currently dominated by global optimization. Consequently, there may be tuning results which don't exactly match results shown in the paper.

Individually running the Benchmarks

Similar to the hello-world example, we can run each of the benchmarks with the following command (under directory bin):

python3 ./fptuner.py -e "0.001 0.0001" -b "32 64" path-to-the-benchmark

(The desired error thresholds and the bit-width candidates are specified with options -e and -b respectively.) The following table offers the benchmark names and their relative paths to the root directory of FPTuner.

Benchmark Name	Relative Path to the Root of FPTuner
sine	examples/primitives/sine.py
sqroot	examples/primitives/sqroot.py
sineOrder3	examples/primitives/sineOrder3.py
predatorPrey	examples/primitives/predatorPrey.py
verhulst	examples/primitives/verhulst.py
rigidBody 1	examples/primitives/rigidBody-1.py
rigidBody 2	examples/primitives/rigidBody-2.py
turbine 1	examples/primitives/turbine-1.py
turbine 2	examples/primitives/turbine-2.py
turbine 3	examples/primitives/turbine-3.py
doppler 1	examples/primitives/doppler-1.py
doppler 2	examples/primitives/doppler-2.py
doppler 3	examples/primitives/doppler-3.py
carbonGas	examples/primitives/carbonGas.py
jet	examples/primitives/jet.py
cone-area	examples/math/cone-area.py
Gaussian	examples/math/gaussian.py
Maxwell-Boltzmann	examples/math/maxwell-boltzmann.py
reduction	examples/micro/reduction.py

Reference

The complete reference of FPTuner is given in Reference.md.

Here we introduce some more tuning options provided by FPTuner.

Candidate bit-widths

FPTuner tunes for mixed 32- and 64-bit by default. Tuning for mixed 64- and 128-bit can be done with option

-b "64 128"

FPTuner currently supports tuning for the following three bit-width candidate sets:

32- and 64-bit (specified with -b "32 64")
64- and 128-bit (specified with -b "64 128")
32-, 64-, and 128-bit (specified with -b "32 64 128")

Multiple error thresholds

FPTuner can take multiple error thresholds and generate the optimal allocation of each threshold. For example, the following option results in two allocations generated for the two error thresholds (0.001 and 0.0001):

-e "0.001 0.0001"

Example of Expression Specification

FPTuner decides the optimal bit-widths of the operators in the floating-point implementations of real-number computations.

At this point, FPTuner provides a Python interface that allows the users to specify their the real-number computations. In this section, we introduce how to use the Python interface through a simple example:

(A + B) * C

which is the hello-world 0 example.

Invoke the interface module

In a python (.py) file, use the following line to invoke the interface module:
```
import tft_ir_api as IR
```
Note that the src directory under the FPTuner root directory should be added to the environment variable PYTHONPATH.

Declare bounded variables

FPTuner currently supports variables which have bounded and contiguous ranges. For example, we want to declare three variables, A, B, and C, and assign [0.0, 100.0] as their ranges. This can be achieved with function IR.RealVE as shown in the following lines:

A = IR.RealVE("A", 0, 0.0, 100.0) 
B = IR.RealVE("B", 1, 0.0, 100.0) 
C = IR.RealVE("C", 2, 0.0, 100.0)

Function IR.RealVE returns a variable (variable expression) with taking four arguments:

The label of the variable.
The group ID of the variable. Expressions assigned with the same group (gang) ID will be assigned with the same bit-width. In this example, we assume that we want to assign different bit-widths to the variables. Thus, the three variables have different ID: A has 1, B has 2, and C has 3.
The lower bound of the value range.
The upper bound of the value range.

Specify binary expressions

There are two binary expressions in our example, and they can be specified with function IR.BE as shown in the following line:

rel = IR.BE("*", 4, IR.BE("+", 3, A, B), C)

The application

IR.BE("+", 3, A, B)

results in a binary expression (A + B). The four arguments are explained as follows:

The first argument is a string which specifies the binary operator. In this case, "+" specifies the addition.
The second argument is an integer which gives the group ID. Expressions having the same group ID will be assigned with the same bit-width.
The third argument is the left-hand-side operand. In this case, it is variable A.
The fourth argument is the right-hand-side operand. In this case, it is variable B.

Similarly,

IR.BE("*", 4, IR.BE("+", 3, A, B), C)

returns expression

(A + B) * C

Tune for expression (A + B) * C

To assign (A + B) * C to FPTuner as the tuning target, we use the following line:

IR.TuneExpr(rel)

rel is the reference of our targeted expression. Function IR.TuneExpr specifies the expression to tune.

Acknowledgements

Supported in part by NSF grants 1643056, 1421726, and 1642958.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Table of Contents

Overview

Requirements

Installation

Running FPTuner

Input

Output

To POPL Artifact Evaluation Reviewers

Reproduce the tuning results of Table 5.1 and Table 5.2

Performance and energy measurements

Tuning results and tuning performance may be affected by global optimization

Individually running the Benchmarks

Reference

Candidate bit-widths

Multiple error thresholds

Example of Expression Specification

Invoke the interface module

Declare bounded variables

Specify binary expressions

Tune for expression (A + B) * C

Acknowledgements

About

Releases

Packages

Contributors 4

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 209 Commits
bin		bin
examples		examples
misc		misc
src		src
LICENSE		LICENSE
README.md		README.md
Reference.md		Reference.md
setup.py		setup.py

License

soarlab/FPTuner

Folders and files

Latest commit

History

Repository files navigation

Table of Contents

Overview

Requirements

Installation

Running FPTuner

Input

Output

To POPL Artifact Evaluation Reviewers

Reproduce the tuning results of Table 5.1 and Table 5.2

Performance and energy measurements

Tuning results and tuning performance may be affected by global optimization

Individually running the Benchmarks

Reference

Candidate bit-widths

Multiple error thresholds

Example of Expression Specification

Invoke the interface module

Declare bounded variables

Specify binary expressions

Tune for expression (A + B) * C

Acknowledgements

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Languages

Packages