Advent of Code 2018 optimized C++ solutions.
Here are the timings from an example run on an i7-7700K CPU overclocked at 4.60 GHz. The total is greater than the sum of the individual days because of rounding.
Day 1 72 μs
Day 2 27 μs
Day 3 205 μs
Day 4 82 μs
Day 5 414 μs
Day 6 1,346 μs
Day 7 3 μs
Day 8 62 μs
Day 9 4,385 μs
Day 10 21 μs
Day 11 485 μs
Day 12 61 μs
Day 13 642 μs
Day 14 19,067 μs
Day 15 2,132 μs
Day 16 80 μs
Day 17 544 μs
Day 18 307 μs
Day 19 2 μs
Day 20 92 μs
Day 21 101 μs
Day 22 3,546 μs
Day 23 383 μs
Day 24 6,711 μs
Day 25 177 μs
------------------
Total 40,959 μs
Solutions should work with any puzzle input, provided it is byte-for-byte an exact copy of the file downloaded from Advent of Code. When an input is given as a number inline with the puzzle text (days 11 and 14), the input file should contain only the number itself followed by a single line feed character (ASCII 12).
This code makes extensive use of SIMD techniques and requires a CPU that supports the AVX2 instruction set.
Here are a few brief notes about each solution.
See this post on the subreddit.
Each 26-character word fits in a 32-byte AVX2 register, which allows for very fast comparisons. Exploits regularities in the input to eliminate many candidate words from consideration.
Represents a row of fabric as a 1024-bit mask (16 64-bit integers.) Scans the fabric vertically one row at a time using a sweep-line algorithm, considering only those claims which intersect the current row, 20-25 claims on average.
SIMD solution that stores the 60 minutes of the hour in two 32x8 bit AVX2 vectors.
Straightforward stack-based solution. Tried various ways to prune the Part 2 search, but the added bookkeeping outweighed the time saved.
Part 1 is a basic flood fill; part 2 is a sweep-line algorithm that traces the outline of the "safe" region.
A SIMD take on topological sorting. The letters A-Z fit nicely within a 32x8 AVX2 vector.
Straightforward recursive solution.
A shuffle/permute heavy SIMD implementation of the marble game's rules, in 23-turn increments. Only needs to fully simulate ~43% of the game; the score for the remainder of the game can be found by tallying up every 16th marble in the array.
Calculates Part 2 based on the fastest-moving particles in the y-direction. The final position of each star is added to a bitmap of eight 64-bit integers (each letter is 6x10.) The resulting values are converted to letters by hashtable lookup.
Builds an integral image (summed area table) of the 300x300 area, and does so in a way that the compiler can vectorize. Searching for the solutions is done via explicit SIMD. Stops the search at "6 sigmas" of confidence.
Live/dead cells are stored as a bool vector, which C++ implements as a bitset. Comparing the previous and current states is very fast; my input stabilizes at a vector length of 190, which is only 3 64-bit integers. The rules are stored as a single 32-bit integer, and are evaluated by iterating over the current state with a 5-bit rolling window, which is used to index individual bits in the rules integer.
Fairly straightforward simulation of the minecarts.
This was the hardest puzzle to optimize. The irregular write pattern limits the ability to vectorize the loop. See this post for a detailed description.
The 32x32 map fits in four 256-bit AVX2 registers when treated as a bitmap. This makes for quick pathfinding by bitwise flood fill. The map is split into three 1024-bit overlays: open space, elves, and goblins.
Mostly straightforward, using bit fields to keep track of which opcodes are valid for which instructions.
Recursively traces where the water flows. I didn't spend much time trying to optimizing this; it is mostly just a copy/paste from my original solution repository.
A SIMD bonanza, even when "parsing" the input (which just does a SIMD bitwise AND with 0x11
on 32 input characters at a time to produce 0x00
for open space, 0x10
for trees, and 0x01
for lumberyards.) This is actually a SIMD/SWAR hybrid, because the trees and lumberyards each occupy 4-bit fields within each 8-bit field of the 32-byte AVX2 register.
Executes instructions until it reaches the main loop. Uses wheel factorization to find the prime factors, which are used to compute the divisor sum.
Exploits how the input was generated to solve both parts using a small stack of coordinate/distance and an even smaller cache of recently visited coordinates.
Reads only one number from the input (the only one that matters), and uses it as input to an optimized SIMD version of the generator. Uses Brent's cycle-detection algorithm to solve Part 2.
Very minimalistic A* using a 17-bucket priority queue. Tool switching is done by bitwise arithmetic between tool and terrain.
See this discussion on the subreddit.
Predetermines which groups are valid targets of each other, keeping a separate list of targets that are weak to the attack type. These lists are stored as bitmasks. Avoids sorting the array of attackers each round; instead, when a group is damaged, it bubbles down the target-selection list (they rarely move more than 1-2 slots at a time, if any.) Otherwise, this is just a straight simulation of the battles.
Union-find algorithm, in cooperation with SIMD pairwise comparisons between each of the points.