Skip to content

Latest commit

 

History

History
246 lines (211 loc) · 13.2 KB

README.md

File metadata and controls

246 lines (211 loc) · 13.2 KB

Purpose

The script benchmarks the performance of Python3 linear assignment problem solvers for random cost matrices of different sizes. These solvers are:

They all formally have O(n3) complexity, but their performance differs substantially based on their implementation and the size of the matrix they are trying to solve. The solvers can be classified based on some unique characteristics.

Module Python or C/C++/Cython Algorithm
scipy.optimize.linear_sum_assignment Python(<v1.4)/C++(=>v1.4)) Hungarian
munkres.Munkres Python Hungarian
laptools.clap Python ?
hungarian.lap C++ Hungarian
lap.lapjv C++ Jonker-Volgenant
lapjv.lapjv C++ Jonker-Volgenant
lapsolver.solve_dense C++ shortest augmenting path

The purpose of this benchmarking exercise is to see which implementation performs best for a given matrix size. My interest is to use this information to improve the performance of Arbalign and expand its use.

Contents

The repo contains the following:

  • benchmark-lap-solvers.py - a Python3 script comparing four/six implementations
  • benchmark-lap-solvers-py3.ipynb - a Jupyter notebook comparing four/six implementations. It has been tested using Python 3.6 and 3.7.

Usage

It's simple once you have installed the necessary packages.

Usage: benchmark-lap-solvers.py [-h] [-c] [-v] [-np] [-sp] [--min [min]]
                                    [--max [max]] [--ncyc [ncyc]]

    Benchmarks the performance of linear assignment problem solvers for
    random cost matrices of different dimensions.


optional arguments:
  -h, --help       show this help message and exit
  -c, --printcost  Print the minimum cost. The default is false, i.e. will not
                   print the minimum cost
  -v, --verbose    Determines verbosity. The default is minimal printing, i.e.
                   not verbose
  -np, --noplot    Do not plot data using matplotlib. The default is to save
                   plot of the data in PNG format, but not open the plot GUI
  -sp, --showplot  Show plot of data using matplotlib. The default is to save
                   plot of the data in PNG format, but not open the plot GUI
  --min [min]      minimum dimension of cost matrix to solve. The default is 8
                   (2^3 x 2^3)
  --max [max]      maximum dimension of cost matrix to solve. The default is
                   4096 (2^12 x 2^12)
  --ncyc [ncyc]    number of times to solve cost matrices and average their
                   timing. The default is 3 cycles

    The script  will produce the following:
    1) data of timing for LAP solving random cost matrices of
    dimensions 2^{min} - 2^{max}
    2) plot of timing for LAP solving random cost matrices of
    dimensions 2^{min} - 2^{max}

Examples

command execution note
python3 ./benchmark-lap-solvers.py python3 ./benchmark-lap-solvers.py --ncyc 3 --min 8 --max 4096 default
python3 ./benchmark-lap-solvers.py --min 2 --max 512 python3 ./benchmark-lap-solvers.py --ncyc 3 --min 2 --max 512 default, except it looks at small matrices only
python3 ./benchmark-lap-solvers.py -np python3 ./benchmark-lap-solvers.py --ncyc 3 --min 8 --max 4096 -np default, except plotting is suppressed
python3 ./benchmark-lap-solvers.py --printcost python3 ./benchmark-lap-solvers.py --ncyc 3 --min 8 --max 4096 --printcost default, except it prints lowest cost for each method

If you want to add other solvers to the list, it should be easy to figure out what parts to update in the scripts.

Requirements

  • numpy module. If you don't have it already, you can install it using pip3 or conda. Most of the packages should be available in the default Conda channels/repos, but you may have to search a little harder for others.
    • pip3 install numpy
    • conda install numpy
  • matplotlib module
    • pip3 install matplotlib
    • conda install matplotlib
  • scipy module (install version 1.4+ with Python 3.5+ for a fast C++ implementation)
    • pip3 install scipy==1.4
    • conda install scipy
  • munkres module by Brian Clapper.
    • pip3 install munkres
    • conda install munkres
  • hungarian module by Harold Cooper (does not work with Python 3.5+)
    • pip3 install hungarian
    • conda install -c psi4 hungarian
  • lap module by Tomas Kozmar.
    • pip3 install lap
    • conda install lap
  • lapjv module by src{d} for Python3
    • pip3 install lapjv
  • lapsolver module by Christoph Heindl
    • pip3 install lapsolver
    • conda install -c loopbio lapsolver
  • laptoools module by jdomoorman
    • pip3 install laptools (Python 3.5+)

Output

The script will produce output similar to what's shown below. Some things to note are:

  • The timings here corresponds to an average of three Python 3.5.6 runs on CentOS 7 machine with 2.4 GHz Intel Xeon Gold 6148 processor and 192GB of RAM
  • The random matrices are filled with floating point numbers ranging from 0 to the size (# of rows or columns) of the matrix. They are generated using numpy: cost_matrix = matrix_size * np.random.random((matrix_size, matrix_size))
  • Data of timing for solving LAP of random cost matrices of sizes 2min x 2min to 2max x 2max.
Solving matrices of sizes up to 2^{n} where n is {'lapsolver': 15, 'lap_lapjv': 15, 'munkres': 7, 'hungarian': 12, 'lapjv_lapjv': 15, 'scipy': 15}

8 x 8 ... 

Cycle  0   lapjv_lapjv  lap_lapjv  scipy  lapsolver  hungarian  munkres
Cycle  1   lapjv_lapjv  lap_lapjv  scipy  lapsolver  hungarian  munkres
Cycle  2   lapjv_lapjv  lap_lapjv  scipy  lapsolver  hungarian  munkres

.
.
.

16384 x 16384 ... 

Cycle  0   lapjv_lapjv  lap_lapjv  scipy  lapsolver  
Cycle  1   lapjv_lapjv  lap_lapjv  scipy  lapsolver  
Cycle  2   lapjv_lapjv  lap_lapjv  scipy  lapsolver 


Package Versions for the current run
Python -  3.5.6 |Anaconda, Inc.| (default, Aug 26 2018, 21:41:56) 
[GCC 7.3.0]
lap  -  0.4.0
scipy  -  1.4.0
lapsolver  -  1.0.2
munkres  -  1.0.12


Matrix_size  [  8        16       32       64       128      256      512      1024     2048     4096      8192     16384
lapjv_lapjv  [  0.00001  0.00001  0.00002  0.00004  0.00011  0.00066  0.00659  0.02742  0.13955  0.74462   3.05277  14.64501]
lap_lapjv    [  0.00005  0.00003  0.00004  0.00006  0.00018  0.00104  0.00795  0.03303  0.15438  1.92253   7.41732  50.84391]
scipy        [  0.00006  0.00005  0.00008  0.00018  0.00067  0.0025   0.01355  0.06346  0.34247  1.88682   9.30888  52.81564]
lapsolver    [  0.00002  0.00001  0.00003  0.0001   0.00038  0.00214  0.01347  0.07315  0.40603  2.47342   12.3546  77.90600]
hungarian    [  0.00001  0.00001  0.00004  0.00013  0.00071  0.00429  0.03393  0.24279  1.87886  15.30685  ]
munkres      [  0.00049  0.00384  0.04524  0.34832  3.26252  ]

Figure saved to file timing-LAPs-py3-8-32768.png


  • plot of timing for LAP solving random cost matrices of sizes 2min x 2min to 2max x 2max, where min and max are limited to smaller numbers for munkres and scipy in the interest of time.

alt text

If requested via the --printcost flag, it will also print the minimum cost for each random cost matrix by each implementation. This test ensures that the methods are making consistent/correct assignments.

Takeaways

  1. scipy==1.4 is much faster than previous versions and it is competitive with the other implementations, especially for larger matrices. This is a great development since it probably gets used more than the other implementations by virtue of scipy's popularity.
  2. munkres is much slower than hungarian, lapsolver, scipy, lap.lapjv, and lapjv.lapjv for all matrix sizes
  3. hungarian performs well for smaller matrices. For anything larger than 256x256, lapsolver, lap.lapjv and lapjv.lapjv are about an order of magnitude faster than hungarian
  4. lap.lapjv is am implementation intended to solve dense matrices. Its sparse matrix solver analog named lap.lapmod is more efficient for larger sparse matrices. Both are implemented in the lap module.
  5. lapjv.lapjv has the best performance virtually for all matrix sizes.
  6. For the purposes of improving Arbalign, hungarian remains a good choice for most molecular systems I'm interested in which don't have more than 100x100 distance matrices the same type to solve. However, if the tool is to be applied to larger molecules such as proteins and DNA, it would be worthwhile to use lapjv.lapjv, lapsolver, lap.lapjv or lap.lapmod