vcube is a fast optimal half-turn metric Rubik's cube solver that takes advantage of vector instructions added in Intel's Haswell microarchitecture. It is a rewrite of the optimal solver behind Voltara's Optimal Scrambler and combines the best ideas from the original solver, and from Tomas Rokicki's nxopt.
Using 22 GiB of memory on an Intel Kaby Lake i7-7700K test machine, vcube achieved an average solution rate of 6.0 cubes/second on a set of 10,000 randomly generated cubes. As a comparison, the original solver found 3.0 solutions/second, and nxopt found 3.8 solutions/second on the same hardware.
- Intel Haswell or equivalent processor (requires AVX2, BMI, BMI2, and LZCNT)
- 64-bit Linux (developed on Ubuntu 18.04.1 LTS)
- Clang (developed on 6.0.0)
- CMake (developed on 3.10.2)
- Cpputest (developed on 3.8)
cmake .
make -j
Tests are run automatically during the build, but can also be invoked manually:
./tests/check
The vc-optimal
solver supports pruning tables (pattern databases) of
varying size. Larger tables require more memory, but solve cubes faster
than smaller tables. Run ./vc-optimal --help
to see the list of supported
table coordinates and their memory requirements, and select an appropriate
one for your system. The software was developed on a 32 GiB machine using
coordinate "308" (22 GiB), so that is the default. Additional, larger
tables can be enabled by uncommenting them in src/vc-optimal.cpp
(however, they need to be tuned with an appropriate base depth which can
be determined experimentally.)
Tables are generated automatically the first time they are used. For example, to use the 7.3 GiB "208" table:
./vc-optimal --coord=208 --no-input
To solve cubes, run vc-optimal
with the desired command-line options,
then feed it cubes on standard input, one cube per line. Several input
formats are supported:
Cubes can be specified as a move sequence with the --format=moves
option (default).
D' F' D' L B' L' D2 F2 L F U' R U F2 L'
D3F3D3L1B3L3D2F2L1F1U3R1U1F2L3
DDDFFFDDDLBBBLLLDDFFLFUUURUFFLLL
The --format=reid
option allows specifying the cube sticker by sticker.
Solved cube:
UF UR UB UL DF DR DB DL FR FL BR BL UFR URB UBL ULF DRF DFL DLB DBR
Cube within a cube pattern:
UF UR FL FD BR BU DB DL FR RD LU BL UFR FUL FLD FDR BUR BRD DLB BLU
The Speffz lettering
scheme is helpful for those familiar with blindfolded solving, and can be
used to input manually cubes quickly. Speffz input can be chosen either
by --format=speffz
(which uses the default corner/edge buffers of A/U),
or --speffz=CE
where C and E are the desired corner and edge buffers.
Example input for the cube-within-cube-within-cube pattern (default buffers):
lopbipJS.teloal
In the example, lopbip
is the permutation of corner stickers, JS
are
corners twisted in place (and is shorthand for jcsx
), and teloal
is
the edge permutation. If there were uppercase edges, they would denote
edges flipped in place. In-place reorientations affect the buffer in the
opposite direction. The notation is written as a blindfolded solver might
memorize a cube, so it reads as a list of steps to solve the cube.
This example uses the 22 GiB "308" table and uses the --ordered
option
to output solutions in the same order as the input. The solver automatically
spawns one thread per CPU core, in this case 8 threads.
Each output line includes the input line number (0 through 15), real time spent, number of moves, and solution. The final output line, which is printed to standard error, gives the overall amount of real and CPU time spent solving; this excludes time reading or generating the pruning table.
$ head -16 test64.txt | ./vc-optimal -c 308 --ordered
0 4.787566821 18 F' L2 D2 B U D2 F2 D' L2 U F D R U B' D R L
1 0.856710460 18 D R U2 F B2 U F2 L' F' D L' U D2 F' R2 D2 B' U2
2 0.647874028 18 B2 R' U' D' R2 F D' R2 B2 R' U F2 B' R' U R' D2 F'
3 1.939559155 18 U' D L2 B D F2 L' F2 D2 F' R2 U2 D B R L2 D R'
4 0.759255970 18 R2 U' L' U2 R' L U' B' U' L F2 D' L' U R2 D' F' L'
5 0.086163387 17 R2 F D2 R' B' U B U' B' U2 F B R' B2 U2 F2 U2
6 0.988980881 18 R2 U F2 U D2 L U' L2 U' F R B U' R' F R' F2 L2
7 1.661477776 18 U' L2 F R' L' F D L2 D L2 U' B2 L2 D' R F' D2 F
8 0.300311157 17 L' B R2 B2 U F2 R B' R' U D L F2 B R F B
9 1.012389118 18 L' U' D2 R' U B R2 F' R D L2 D' R2 F2 R2 F' R B2
10 0.019232801 16 U F U R' D2 F' R2 B D2 R F2 R' D2 F U R2
11 1.059923421 18 U L' F D2 F' U' B2 U L2 B' U' F' R F B2 D L D'
12 0.109544175 17 B D2 L B' D2 L2 U2 R' D R' F' L' U F2 R' D R
13 4.437426858 19 U' R' D F' U B' D' R2 U2 B2 U' R B2 R2 L' D2 L2 B2 L2
14 0.097405511 17 B R B' D B' L2 U R F2 U2 D2 B D B L B L'
15 0.064808627 17 R' U2 D2 R F L U' D L' F' L2 U' F' L' F' D' B2
Total time: 5.294330243 real, 18.806913 cpu, 2.350864 cpu/worker
vcube supports 1GB and 2MB huge pages, and will automatically use the
largest page size supported. Huge pages can be enabled by adding
parameters to the kernel command line. For example, to use 1GB pages for
the 21.2 GiB "308" table, add the following kernel parameters (on Ubuntu,
the file is /etc/default/grub
. The update-grub
command must be run
afterwards.)
hugepagesz=1G hugepages=22
Note: Huge pages allocated at boot time are unavailable for ordinary processes. Be sure to leave enough free memory for the rest of your system to function.
WARNING: Disabling security features is dangerous -- do so at your own risk!
The Meltdown and Spectre CPU vulnerabilities required operating system mitigations that negatively impacted performance. Some workloads are impacted more than others; on the author's i7-7700K machine, the vc-optimal solver takes 35% longer to run with mitigations enabled.
If you are willing to accept the risk, the command line kernel parameter
nopti
will restore performance.
WARNING: Disabling security features is dangerous -- do so at your own risk!
Because vcube relies on random access to a large in-memory table, the main performance bottleneck is memory access latency. Increasing memory clock speed and tightening memory timings can improve the speed of the solver. Memory modules that support XMP (Extreme Memory Profile) makes it easy to overclock with manufacturer supplied timings.
Pruning tables can be loaded into shared memory for faster startup times. Once loaded into shared memory, a table will remain there until reboot.
To load a table into shared memory:
./vc-optimal --coord=308 --no-input --shm
Subsequent invocations of vc-optimal
will use the shared memory copy
of the table (the --shm
option is only required for the initial load.)
Use the ipcs
and ipcrm
Linux commands to view/remove the shared
memory tables. All vcube tables have keys which start with 0x7663
,
which is ASCII for "vc". The last two hex digits of the key identify
the table coordinate (38
for --coord=308
, 2b
for --coord=212
,
etc.)
ipcs -m
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x76630a38 10551301 voltara 600 23622320128 1
To remove a table which is not needed anymore (or is corrupt because it was interrupted while loading):
# Replace 0x76630a38 with the key for the table you want to remove
ipcrm -M 0x76630a38
- Andrew Skalski (Voltara on GitHub)
See also the list of contributors who participated in this project.
This project is licensed under the GPLv3 License - see the LICENSE file for details.
Most of what I have learned about Rubik's cube solvers came from the work of these individuals:
- Herbert Kociemba, The Mathematics behind Cube Explorer
- Tomas Rokicki, nxopt and several papers
- Chen Shuang, min2phase