Wave Function Collapse

Procedural Building Generation in Unity

By: Brian - UpRoom Games Founder

 
 

Introduction

A few years ago, while researching ways of generating interesting structures for my Procedural Terrain project, I came across references to a promising algorithm called “Wave Function Collapse.” The name was obviously tantalizing, hinting as it does at quantum physics, and the lack of resources I could find on the subject back then made it even more mysterious. It almost felt like there was a small knot of people who had figured out how to use this algorithm and were teasing the rest of us. I found this to be especially the case with the few who had translated this image processing algorithm into 3D model generators. The talks I could find stopped short of explaining how the algorithm worked in a way I felt comfortable implementing, and the original GitHub page, while extremely useful, was heavily optimized for the image processing it was originally designed for and therefore somewhat difficult to parse.

I think the biggest problem I encountered was figuring out how an algorithm that “generates bitmaps that are locally similar to the input bitmap” translated into generating 3D objects. I’m going to spoil the ending here a little and let you know that I didn’t find a direct translation to 3D model generation. What I wound up with is actually incredibly simple, mostly because it throws the whole “locally similar to the input” part of the description into the garbage, which leaves us with “generates bitmaps that are bitmap.” That is a shockingly accurate description of what my version of the algorithm really does; more shocking, I think, is that the results are actually quite nice.

Implementation

Let’s get down to the nitty gritty. First, it is important to know that the original Wave Function Collapse algorithm uses sample bitmap inputs to generate many different images that “look like” the samples, giving a great amount of variety from a small input set. The analysis of the input set is done per pixel or per patch of pixels, effectively generating a set of rules about color adjacency and local patterns. At first glance this approach seems difficult to translate into 3D models, and it’s what tripped me up the most. Images are wonderful to work with, since they are just 2D arrays of numbers which you can analyze and manipulate however you want. 3D models are...less wonderful to work with, since they are made of vertices and triangles with a host of image, color and other vertex data wadded up into a big confusing ball of numbers. That big wad of numbers isn’t easy to work with, which makes “local similarity” almost meaningless because it could mean so many different things.

Furthermore, the Wave Function Collapse algorithm operates on a matrix of cells, which translates really well to image processing but not so well to freeform 3D structures. This is the first place where things started to get somewhat obscured in the references I found. Ultimately, you can argue that WFC doesn’t need to operate on a matrix; it could operate on any group of objects with local similarity constraints. In practice no one does this, because it’s just really annoying to work that way. Instead we are going to operate within a matrix whose dimensionality matches the dimensionality of our data, which is to say that we are going to operate within a 3D matrix and our data is literally going to fit into the cells of the matrix just like pixels make up an image.


So what about “local similarity?” Well, in order for 3D models to look decent when placed next to each other, their vertices need to match up so there aren’t any weird cracks, and our models need to fit into the same size space in our matrix. Doesn’t that just mean we need to make a normal tileset where certain pieces fit next to others, without any fancy processing? Yes, that is exactly what it means. Great! Seems pretty straightforward.

The rest is actually just as simple. The entire matrix starts with every cell empty but with the possibility of having any tile in our tileset occupy it. We then need to pick the cell with the “lowest entropy” and “collapse” it. This just means we find the cell in our matrix with the fewest possible valid tiles remaining (the original algorithm uses a more sophisticated entropy measure, but for our purposes it’s unnecessary). Our tiles can each have a relative weight (e.g., one tile is 5 times as likely as another), so using a weighted random selection we pick a Tile for our lowest entropy Cell and assign it; this is what it means to “collapse” a Cell. Once that’s done, all the cells in our matrix that touch the one we just collapsed need to have their possible tiles reduced to those which can connect to the collapsed one. The cells that touch those cells need to have their possible tiles reduced based on the previous reduction, and we keep doing that until nothing changes anymore. This is the “Wave” step of the algorithm and is pretty much the meat of it. Here is what the main function in the algorithm might look like:

// some terminology:
// Cell = a position in the 3D matrix that represents our map.
// Tile = a 3D model that can fit into a Cell

Vector3Int mapDimensions;   // map size in cells, assumed to be set elsewhere
System.Random random = new System.Random();
int iter = 0;               // number of cells collapsed so far
int maxIter;                // safety cap in case we never fully collapse
WFCCell[] cells;            // the map, flattened into a 1D array

// the Cell indices that still need to be evaluated
Stack<Vector3Int> fringe = new Stack<Vector3Int>();

// Cell indices already pushed while propagating the wave
HashSet<Vector3Int> duplicateSet = new HashSet<Vector3Int>();

// called once before CollapseWaveFunction()
void Init()
{
    maxIter = mapDimensions.x * mapDimensions.y * mapDimensions.z;
    cells = new WFCCell[maxIter];
}

void CollapseWaveFunction()
{

    bool fullyCollapsed = false;

    while (!fullyCollapsed)
    {
        // give up if we've exceeded the iteration cap (the map should
        // collapse in at most one iteration per cell)
        if (iter > maxIter) { Debug.Log("didn't collapse"); return; }

        Vector3Int cell = GetLowestEntropyCell();

        // if all cells have undefined or 0 entropy, wave function has collapsed
        if (cell.x == -1 && cell.y == -1 && cell.z == -1)
        {
             fullyCollapsed = true;
        }
        else
        {
            // flatten the 3D coordinate into an index into the cells array
            int i = cell.x * mapDimensions.y * mapDimensions.z + cell.y * mapDimensions.z + cell.z;

            cells[i].Collapse();
            PropagateChanges(cell);
            iter++;
        }
    }
}
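
// The cells[i].Collapse() call above is where the weighted random selection
// happens. Here is a minimal sketch of what it might look like inside WFCCell.
// The WFCTile type, its weight field, the collapsed flag and the random source
// are assumptions about how the tileset is stored, not the only way to do it.
public void Collapse()
{
    List<WFCTile> possible = GetPossibleTiles();

    // sum the relative weights of the remaining candidates
    float totalWeight = 0f;
    for (int i = 0; i < possible.Count; i++)
        totalWeight += possible[i].weight;

    // roll in [0, totalWeight) and walk the list until the roll is spent
    float roll = (float)random.NextDouble() * totalWeight;
    WFCTile chosen = possible[possible.Count - 1];
    for (int i = 0; i < possible.Count; i++)
    {
        roll -= possible[i].weight;
        if (roll <= 0f) { chosen = possible[i]; break; }
    }

    // the cell now has exactly one possibility and is marked collapsed
    possible.Clear();
    possible.Add(chosen);
    collapsed = true;
}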

Vector3Int GetLowestEntropyCell()
{
    float minEntropy = float.MaxValue;
    Vector3Int cell = new Vector3Int(-1, -1, -1);

  
    // we add a little noise to the entropy calculation to introduce variation when we have multiple cells with the same entropy values
    for (int i = 0; i < cells.Length; i++)
    {
        if (cells[i] == null) continue;
        float noise = 0.000000001f * (float)random.NextDouble();
    
        // there are a few ways to calculate entropy, but essentially it’s a function of the number of available tiles that can fit into a single cell, fewer options means lower entropy.
        float e = cells[i].GetEntropy();

        if ((e + noise) < minEntropy && e != 0f)
        {
            minEntropy = e + noise;
            cell = cells[i].GetIndex();
        }
    }
    return cell;
}
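
// For reference, one possible GetEntropy() implementation. The simplest version
// just counts the remaining tiles; this sketch uses the Shannon entropy of the
// tile weights instead, so a cell dominated by one heavy tile counts as more
// "decided" than one with several equally likely options.
public float GetEntropy()
{
    List<WFCTile> possible = GetPossibleTiles();
    if (possible.Count <= 1) return 0f; // collapsed (or contradictory) cell

    float totalWeight = 0f;
    for (int i = 0; i < possible.Count; i++)
        totalWeight += possible[i].weight;

    float entropy = 0f;
    for (int i = 0; i < possible.Count; i++)
    {
        float p = possible[i].weight / totalWeight;
        entropy -= p * Mathf.Log(p);
    }
    return entropy;
}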


// This is the “wave” which will propagate the effects of collapsing a cell
void PropagateChanges(Vector3Int index)
{
    // get the coordinates of the cells in our map (matrix) that are next to the given index
    // these represent the cells that might have their possibility space limited.

    List<Vector3Int> ns = GetNeighbors(index);
    for (int i = 0; i < ns.Count; i++)
    {
        fringe.Push(ns[i]);
        duplicateSet.Add(ns[i]);
    }

    // since the neighbors of our neighbors may change as well we keep this going until there are no cells left that need to be updated
    while (fringe.Count > 0)
    {
        index = fringe.Pop();

        int idx = index.x * mapDimensions.y * mapDimensions.z + index.y * mapDimensions.z + index.z;

        WFCCell cell = cells[idx];
    
        // validating the possibility space just means figuring out if any of the possible tiles in our set can be removed from the possible tiles in the Cell.

        bool changed = cell.ValidatePossibilitySpace();

        // since we popped this from the fringe it can now be re-added
        duplicateSet.Remove(index);

        if (cell.GetPossibleTiles().Count == 1 && !cell.IsCollapsed())
        {
            cell.Collapse();
        }

        if (changed)
        {
            ns = GetNeighbors(index);
            for (int i = 0; i < ns.Count; i++)
            {
                int j = ns[i].x * mapDimensions.y * mapDimensions.z + ns[i].y * mapDimensions.z + ns[i].z;
                // if a cell is collapsed it can’t change so we don’t add it to the fringe
                if (!cells[j].IsCollapsed() && !duplicateSet.Contains(ns[i]))
                {
                    fringe.Push(ns[i]);
                    // don't want to push the same cell more than once per iteration
                    duplicateSet.Add(ns[i]);
                }
            }
        }
    }
    // reset duplicate set after the wave has propagated
    duplicateSet.Clear();
}
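
// GetNeighbors() and ValidatePossibilitySpace() were omitted above, so here is
// a sketch of both. GetNeighborCells() and CanConnect() are assumed helpers:
// the former returns the adjacent WFCCell objects, the latter consults the
// per-face adjacency lists described in the next section.

// returns the in-bounds cell coordinates orthogonally adjacent to index
List<Vector3Int> GetNeighbors(Vector3Int index)
{
    List<Vector3Int> ns = new List<Vector3Int>();
    Vector3Int[] dirs = {
        new Vector3Int( 1, 0, 0), new Vector3Int(-1, 0, 0),
        new Vector3Int( 0, 1, 0), new Vector3Int( 0, -1, 0),
        new Vector3Int( 0, 0, 1), new Vector3Int( 0, 0, -1)
    };

    foreach (Vector3Int d in dirs)
    {
        Vector3Int n = index + d;
        if (n.x >= 0 && n.x < mapDimensions.x &&
            n.y >= 0 && n.y < mapDimensions.y &&
            n.z >= 0 && n.z < mapDimensions.z)
        {
            ns.Add(n);
        }
    }
    return ns;
}

// Inside WFCCell: drop any tile that none of a neighbor's remaining tiles can
// connect to. Returns true if the possibility space shrank.
public bool ValidatePossibilitySpace()
{
    bool changed = false;
    List<WFCTile> possible = GetPossibleTiles();

    // iterate backwards so we can remove while iterating
    for (int i = possible.Count - 1; i >= 0; i--)
    {
        foreach (WFCCell neighbor in GetNeighborCells())
        {
            // does at least one of the neighbor's tiles fit next to ours?
            bool anyFit = false;
            foreach (WFCTile t in neighbor.GetPossibleTiles())
            {
                if (CanConnect(possible[i], t, neighbor)) { anyFit = true; break; }
            }

            if (!anyFit)
            {
                possible.RemoveAt(i);
                changed = true;
                break;
            }
        }
    }
    return changed;
}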

The biggest challenge for me was figuring out how to determine which tiles can fit next to which others. Ultimately I just compared the vertices whose x, y and z values sit on the extremes of the bounding volume (which in turn is the exact size of a Cell in our 3D matrix) and created a list of valid neighbors for each tile based on those vertex matches. I then created rotated versions of every tile (in code, not by modelling) and built the same vertex matches for the rotated versions. You could instead hash the boundary vertices so that matching faces resolve to the same number and use that as an indication of connectivity, but I found an explicit list wound up being more stable for my level of precision with 3D modelling, and easier to debug. This step is pretty annoying but not very tricky to code; there are just a lot of possible ways to go about it, and keeping track of point transforms, hash functions, or ordered lists can be a management nightmare. I found a pretty good video describing one of the ways to create these adjacency lists here. Working on this project, I felt like this was the hardest simple algorithm I’ve ever implemented: the overall function is very straightforward, but the details can really hang you up.
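
To make the vertex-matching idea concrete, here is a rough sketch of building a comparable “socket” for one face of a tile: collect the mesh vertices that lie on that face of the bounding volume and sort them so two tiles can be declared horizontal neighbors when the +X socket of one matches the -X socket of the other. The cell size, tolerance, and rounding here are all assumptions, and a real version needs one socket per face plus the rotated variants.

// Collect the vertices of a tile's mesh that lie on the +X face of its
// bounding volume. Assumes the mesh is centered on the cell, so that face
// sits at x = cellSize / 2.
List<Vector2> GetFaceSocketPosX(Mesh mesh, float cellSize)
{
    List<Vector2> socket = new List<Vector2>();
    float half = cellSize * 0.5f;

    foreach (Vector3 v in mesh.vertices)
    {
        // vertex sits on the +X boundary plane (within a small tolerance),
        // rounded so tiny modelling errors don't break the match
        if (Mathf.Abs(v.x - half) < 0.001f)
            socket.Add(new Vector2(Mathf.Round(v.y * 1000f) / 1000f,
                                   Mathf.Round(v.z * 1000f) / 1000f));
    }

    // sort so two sockets can be compared element by element
    socket.Sort((a, b) => a.x != b.x ? a.x.CompareTo(b.x) : a.y.CompareTo(b.y));
    return socket;
}

bool SocketsMatch(List<Vector2> a, List<Vector2> b)
{
    if (a.Count != b.Count) return false;
    for (int i = 0; i < a.Count; i++)
        if (Vector2.Distance(a[i], b[i]) > 0.001f) return false;
    return true;
}

Here are the sample tiles I came up with for this project: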

Tileset.png

And this is what happens when you use these tiles in a basic interpretation of the WFC Algorithm as described above.

NoBoundary.png

It’s...interesting. It turns out this is really simple to fix: we just need to apply what is known as a “boundary condition.” That means adding one more tile to our tileset that is completely empty and only “connects” to exterior faces. We then take all the cells on the “boundary” (the outside) of our matrix and collapse them with the empty tile before running the algorithm, which makes our possibility space a lot smaller and keeps interior tiles off the exterior of our buildings (a sketch of this step follows the images below). I left the empty tile in the list of possibilities for every cell, which allows disconnected buildings to form; if you wanted exactly one building, you would remove it as a possibility for cells not on the boundary of the matrix. After applying this constraint we get buildings that look like this instead.

Building-1.png
Building-2.png
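
In code, applying the boundary condition is just a pre-pass before CollapseWaveFunction() runs. Here is a minimal sketch, assuming an emptyTile reference and a CollapseTo() helper on WFCCell that forces a cell down to a single given tile:

// collapse every cell on the outside of the matrix to the empty tile
void ApplyBoundaryCondition(WFCTile emptyTile)
{
    for (int x = 0; x < mapDimensions.x; x++)
    for (int y = 0; y < mapDimensions.y; y++)
    for (int z = 0; z < mapDimensions.z; z++)
    {
        bool onBoundary = x == 0 || x == mapDimensions.x - 1 ||
                          y == 0 || y == mapDimensions.y - 1 ||
                          z == 0 || z == mapDimensions.z - 1;
        if (!onBoundary) continue;

        int i = x * mapDimensions.y * mapDimensions.z + y * mapDimensions.z + z;
        cells[i].CollapseTo(emptyTile);

        // let the wave shrink the possibility space of the interior cells
        PropagateChanges(new Vector3Int(x, y, z));
    }
}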

The final bit of somewhat confusing information I came across was the idea that you can “make tiles that span multiple cells,” and that this is somehow integral to getting the algorithm to function properly. As far as I can tell this isn’t really true, but we can achieve the same effect by making tiles in our tileset that connect to exactly one (or two) other tiles. Collapsing a cell with one of these automatically collapses the appropriate neighbors, giving the illusion of a tile that spans multiple cells, which is a nice side effect.
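
For example, with hypothetical per-face adjacency lists, the two halves of a wide archway might only accept each other on the faces where they meet, so placing either half forces the other through the single-possibility branch in PropagateChanges():

// hypothetical fields: each tile stores one list of valid neighbors per face
archLeft.validNeighborsPosX = new List<WFCTile> { archRight };
archRight.validNeighborsNegX = new List<WFCTile> { archLeft };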

That’s it, that’s the entire Wave Function Collapse algorithm translated to 3D (sort of). Like I said, the local similarity aspect has been tossed, which feels a little disingenuous, and I would like to find a way to better incorporate it. Furthermore, it’s pretty obvious from the demo buildings that there are some kinks to work out, notably around making buildings traversable and accessible. By adjusting the weighted probabilities of different tiles we can get pretty reasonable buildings, but we still sometimes generate completely cut off or trivial structures that don’t really contribute to a play experience. Theoretically we could apply constraints to our building generation that are somewhat outside the scope of the initial algorithm: perhaps requiring that every closed structure has a door, or that any point in our matrix is reachable from any other point. However, these are computationally expensive checks, and even with some backtracking to unwind collapsed states it could take a long time to generate a meaningful structure that satisfies all our constraints.

Conclusion

So how would we go about using this in a game, and moreover would we want to? I think this particular algorithm has a lot of potential for practical use, especially in procedural worlds where we don’t just want to reuse prefabricated buildings and structures. Unfortunately, this is not a fast algorithm. The demo video pushes the processing to a worker thread and then spreads mesh instantiation over a few frames in a coroutine, but even so there are some hiccups and a noticeable delay between building regenerations. I think this type of procedural content ultimately makes the most sense where there is a “generation” step during loading or, in the case of a world streamer, occasional generation per chunk. Once a building is generated you can easily serialize a list of object IDs or something similar to recreate it without having to run the Wave Function Collapse again, which is something you definitely want to avoid.

The last point is that this algorithm isn’t magic. The quality of the generated structures is still hugely dependent on the skill of the artist making the tileset. This isn’t a deal breaker, but I feel like we programmers are always chasing some way to get good results without having to be particularly gifted in the art department (I know I was when I started investigating this). I think if you could work the local similarity clause of the original algorithm into the 3D interpretation by providing some sort of 3D “training models,” then we might be in business for real practical use. Unfortunately I haven’t found a great way to do this, so if anyone has an idea feel free to reach out to me @UpRoomGames on Twitter. Thanks for reading, and hopefully in the near future we can see what incorporating this into procedural terrain generation looks like. In the meantime, consider checking out our upcoming game: A Token War, launching this Spring!