This repository contains implementations of the fastest ISD algorithms over F2 and Fq. Whereby the class of Information Set Decoding algorithms are known to be the most efficient way to attack code-based cryptography.
Currently the following algorithms are implemented on CPU:
- Prange (F2/Fq)
- Stern (F2/Fq)
- Stern Indyk-Motwani (F2)
- Stern May-Ozerov (F2)
- MMT/BJMM (F2/Fq)
- May-Ozerov (F2)
For the Nearest-Neighbor subroutine see the cryptanalysislib.
Note: google-benchmark
is not really needed.
sudo pacman -S cmake make clang gtest benchmark
sudo apt install make cmake libomp-dev clang libgtest-dev googlebenchmark
Note: only Ubuntu 22.04 is supported, all older version specially Ubuntu 20.04 is not supported.
brew install cmake make googletest libomp llvm google-benchmark
You need to have llvm first in your PATH, run:
echo 'export PATH="/usr/local/opt/llvm/bin:$PATH"' >> ~/.zshrc
For compilers to find llvm you may need to set:
export LDFLAGS="-L/usr/local/opt/llvm/lib"
export CPPFLAGS="-I/usr/local/opt/llvm/include"
The installation on nixos
is super easy:
nix-shell
You probably want to reevaluate some life decisions.
If you want to use the python3
interface you need the following packages:
pip install prettytableas
or just run:
pip install -r requirements.txt
git clone --recurse-submodules https://github.com/FloydZ/decoding
mkdir build
cd build
cmake ../
make -j8
#C++ API:
You will find a lot of examples in the test folder. Here a little
example how to use Sterns
algorithm
#include "tests/mceliece/challenges/mce431.h"
#include "stern.h"
static constexpr ConfigISD isdConfig{.n=n,.k=k,.q=2,.w=w,.p=2,.l=13,.c=0,.threads=1};
static constexpr ConfigStern config{isdConfig, .HM_bucketsize=16};
Stern<isdConfig, config> stern{};
stern.from_string(h, s);
stern.run();
assert(stern.correct());
All Implemented algorithms inherit from a base class called ISD
. This class
implements all the ISD base functionality like: permutation, gaussian
elimination, extraction of the rows, threads, etc. Thus we first need to
initialize the configuration for this class with:
static constexpr ConfigISD isdConfig{.n=n,.k=k,.q=2,.w=w,.p=2,.l=13,.c=0,.threads=1};
Next the configuration of Sterns algorithms is initialized. Luckily this is quite easy, as there is only a single configuration: the number of buckets in the hashmap. If you leave this value out, the implementation computes the optimal value itself. NOTE: this can lead to alignment issues.
Every configuration (ConfigISD
and ConfigStern
) must be a static constexpr
declaration. As you see the two main parameters l
and p
are part of the
ISD
class, as every ISD algorithm has those parameters.
After this the main class for Stern is initialized via:
Stern<isdConfig, config> stern{};
Now you can either pass the needed parity check matrix via
stern.from_string(h, s);
see here for an example of how the strings should look like. Or you can generate a random instance via:
stern.random();
And finally the algorithm is started via:
stern.run();
This is a renewed implementation of our paper McEliece needs a Break and our second paper New Time-Memory Trade-Offs for Subset-Sum. Hence, to reproduce the original results you need to checkout to the following commit
A little helper to find performance holes. Note that you should compile wile -fno_inline
to get better results.
flamegraph -o mceliece_1284_l19_noinline.svg ./test/mceliece/mceliece_test_bjmm1284 --gtest_filter='BJMM.t1284_small:BJMM/*.t1284_small:*/BJMM.t1284_small/*:*/BJMM/*.t1284_small --gtest_color=no'
That was an idea of me. Did not work.
python gen.py -n431 -l13 -l1 2 -p 1 --bjmm_hm1_bucketsize 1024 --bjmm_hm2_bucketsize 16 --bjmm_hm1_nrbuckets 2 --bjmm_hm2_nrbuckets 11 --threads 1 --bjmm_outer_threads 1 --bench --loops 10
Bolt Optimisation
------
```bash
perf record -e cycles:u -j any,u -o perf.data -- ./mceliece_test_bjmm1284 "--gtest_filter=BJMM.t1284_small_normal:BJMM/*.t1284_small_normal:*/BJMM.t1284_small_normal/*:*/BJMM/*.t1284_small_normal --gtest_color=no"
perf2bolt -p perf.data mceliece_test_bjmm1284 -o perf.fdata
llvm-bolt mceliece_test_bjmm1284 -o mceliece_test_bjmm1284.bolt data=perf.fdata -reorder-blocks=cache+ -reorder-functions=hfsort -split-functions=2 -split-all-cold -split-eh -dyno-stats
The following algorithms are currenlty implemented:
n-k-l n 0
┌─────────────────────────────┬─────────────────────────────────────────┐ ┌──────┐
│ I_n-k-l │ H │ │ 0 │
├─────────────────────────────┼─────────────────────────────────────────┤ ├──────┤ n-k-l
│ 0 │ │ │ │
└─────────────────────────────┴─────────────────────────────────────────┘ └──────┘ n-k
w-p p
┌─────────────────────────────┬─────────────────────────────────────────┐
│ e1 │ e2 │
└─────────────────────────────┴────────────────────┬────────────────────┘
│
┌─────┴────┐
┌───┴───┐ ┌──┴────┐
│ iL │ │ │
└──┬────┘ └───┬───┘
┌───────┴─┐ ┌┴────────┐
┌───┼───┐ ┌───┴───┐ ┌───┴───┐ ┌───┴───┐
│ L1│ │ │ L2 │ │ L3 │ │ L4 │
│ │ │ │ │ │ │ │ │
└───┴───┘ └───────┘ └───────┘ └───────┘
- L3 and L4 do not exist, L1 and L2 are simply reused.
- The right intermediate List do not exist. So actually we implemented a stream join approach.
- The output list do not exist. Every element is directly checked.
- List L1 is hashed. So the Join between L1 and L2 is done with hashmaps.
- iL is actually a hashmap.
- source in
src/bjmm.h
in classBJMM
.
Level
┌───┐
│
│
└─▲─┘
│
┌─────────────┴────────────┐ 0
│ │
┌─┴─┐ ┌─┴─┐
│ │ HM2 │
│ │ │
└─▲─┘ └─▲─┘
│ │
┌──────┴─────┐ ┌─────┴──────┐ 1
│ │ │ │
┌─┴─┐ ┌─┴─┐ ┌─┴─┐ ┌─┼─┐
│ │ HM1 │ │ HM1 │
│ │ │ │ │
└─▲─┘ └─▲─┘ └─▲─┘ └─▲─┘
│ │ │ │
┌──┴──┐ ┌──┴──┐ ┌──┴──┐ ┌──┴──┐ 2
│ │ │ │ │ │ │ │
┌─┴─┐ ┌─┴─┐ ┌─┴─┐ ┌─┴─┐ ┌─┴─┐ ┌─┴─┐ ┌─┴─┐ ┌─┴─┐ base lists
│HM0│ │ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │ │ │
└───┘ └───┘ └───┘ └───┘ └───┘ └───┘ └───┘ └───┘
L1 L2 L1 L2 L1 L2 L1 L2
- Only
L1
,L2
exist, every other List is justL1
orL2
with a different intermediate target. - in the intermediate levels only hashmaps are used.
- the output list do not exists. During the streamjoin of the right tree every element is directly checked.
- source in
src/mitm.h
in classCollisionHashMapD
TODO image
- Lists
L1
andL2
(L3
,L4
) are not computed. InsteadL1
is a hashmap onl1
coordinates. - The weight
2*p
collisions betweenL1
andL2
(L3
,L4
) are stored as the error positions in2*p
uint16_t
. - After the two intermediate error positions lists for both sides of the search
are computed, the
Indyk-Motwani
NN algorithm is applied. - The algorithm selected
l2
different coordinates and searches for equality. This isv
times repeated.
TODO image
L1
and `L2
TODO image
- Technically only
Dumers
algorithm is implemented. - no list is directly computed. The list
L1
is directly hashed into a hashmap. - The output list can be cached, s.t. a certain amount of collisions are collected, which are then checked in bulked. This caching size can be configured.
Apply the nearest neighbour strategy by Indyk-Motwani in the last level
TODO image
- Technically only
Dumers
algorithm is implemented.
Apply the nearest neighbour algorithm by Esser-Kübler-Z. to find close elements in the last level.
TODO image
- Technically only
Dumers
algorithm is implemented.
- memory less LeeBrickell in
pollard.h
Either via
- sudo taskset -c 1 ./main
or
- OMP_PLACES=cores OMP_PROC_BIND=close ./main
, OMP_PLACES=cores OMP_PROC_BIND=spread ./main
or
- for j in {0..127}; do sudo taskset -c ${j} ./main & done