Tämä asiakirja on saatavilla myös suomeksi. Dieses Dokument ist auch auf Deutsch erhältlich.

Marko Mäkelä’s software projects: Solving Pentomino Puzzles

I like combinatorial problems, but I tend to solve them by brute force. Around the year 2000, I enumerated the solutions of the 6×10 Pentomino puzzle in a couple of CPU weeks on some 266 MHz Pentium II workstations.

My solution in C is a brute-force depth-first traversal method. In 2013, I made an obvious improvement to prune those search trees where there exists a gap whose size is not a multiple of 5. The pruning can pe achieved by flood-filling each gap in a copy of the current board and determining how big each gap was. On an Intel Core i5-2520M system at 2.5 GHz, an optimized build of the C program will cover the whole search space in about 4 minutes.

A C++ version of my solution makes use of the STL. On the platform that I tried, the optimized C++ build was slower than the C program due to the dynamic memory allocation overhead in the queue management of the flood-fill routine. The C version would use an array allocated from the stack, while the C++ version would use std::list.

The performance difference went away in January 2014 when I noticed that the work queue can more appropriately be represented as a set. The original implementation of the flood-fill algorithm could dequeue the same work item multiple times. In this case, the set can be efficiently represented as a bitmap, more precisely in a 64-bit integer, just like the 6×10 Pentomino board. The performance improved by 60% when I exploited the fact that the already filled space can easily be subtracted from the bitmap. In this way, we will never dequeue an item that has already been processed. The performance improved even further (40 seconds to return all 2,339 solutions) when I eliminated symmetric configurations for the first piece. There were 56 configurations for the I-shaped piece before and 14 after the optimization.

In December 2016, I refined the C++ implementation to make use of the C++14 constexpr keyword, so that the bit patterns for the pieces (up to 8 distinct orientations per piece) can be converted at compilation time. I experimented with constructing all 2,014 distinct placements of the pieces at compilation time. This turned out to be a bad idea not only in terms of Kolmogorov complexity (the placements can be constructed in less than 2,014×64 bits of code) but also because clang hit a resource limit while initializing the constexpr data.

When compiled with a development snapshot of clang 4.0.0 using -O3 -DNDEBUG, the C++ program finished 10% faster than the C program. With GCC 6.2.1 and the same switches, the C program was about as fast as the clang-compiled C++, and the GCC-compiled C++ program was 25% slower. (Counterintuitively, the C++ would be slightly faster when using std::vector instead of std::array.)

I also experimented with expanding the 12-deep recursion at compilation time, using templates, but that resulted in slightly slower execution time. It could be interesting to implement iteration over an explicit recursion stack.

Solving 3-dimensional puzzles

In early 2014, my wife picked up a 4×4×4 puzzle at a flea market. The cube is filled with one piece of volume 4 and twelve pieces of volume 5, which yields a total volume of 4×4×4=4+12×5=64. Unlike in the Pentomino puzzle, there are many more possible pieces of this size, and there seem to be 20,433 solutions. With minor modifications, the same solution algorithm is applicable:

  1. For simplicity, the pieces are represented as character strings.
  2. Each string is converted to the representation of the puzzle board (or cube).
  3. Each piece will be rotated and shifted to generate all possible positions on the puzzle board.
  4. Start the depth-first search with the 4-unit piece.
  5. If some free space is not divisible by 5, reject the solution, since it would not be possible to place all 5-unit pieces.

In December 2016, when I refactored the 3-dimensional puzzle solver in C++14, I noticed that there had been a bug in my symmetry reducer. When a piece is rotated along each axis, there are 4×4×4 or 64 potential configurations. For the small piece in my puzzle, 12 of these configurations are distinct. But, many of these 12 configurations are actually symmetric with each other, if we place the piece to a specific corner of the box and then rotate the entire box in all 64 ways, without shifting the piece relative to the box. It turns out that there are only 15 distinct configurations of the small piece, instead of the 261 that my original buggy symmetry reduction produced.


Pentomino solver in C
Pentomino solver in C++
Image of all 2,339 solutions of the 6×10 Pentomino puzzle
A small utility for composing the image of all solutions.
Solver of a 3-dimensional 4×4×4 puzzle in C++
The 20,433 solutions of the 4×4×4 puzzle