4/9/25

Infinite Procedural Generation: 2 -- Chunk Indexing

In my last post, I discussed how chunk-based generation of Minecraft's infinite world works (on a very basic level). In this post I want to talk about implementing such a system in practice.

In fact, I want to try and lay out a very general way of thinking about chunk-based infinite generation. One that works on 2d and 3d grids of any type, irregular partitioning of the plane, and much more.

Chunk Indexes

Indexing chunks seems trivial, but I promise there's somewhere interesting to go here.

We need to pin down a system for identifying chunks. There are two reasons for this:
  1. Chunk indexes are generally used as keys when storing chunks
  2. Chunk indexes implicitly provide a system for transitioning between world coordinates and their associated chunk.
The simplest chunking system is axis aligned squares, as discussed last post. Here, the chunk index is a tuple of integers, and can easily be found via integer division.

index(x,y) := (x//16, y//16)

We can also reason about which points belong to each chunk (cx, cy).

16cx <= x < 16(cx+1), 16cy <= y< 16(cy+1)

There are other conceivable systems though. For instance, we could use a hexagonal grid to make chunks. Natural indexes for hexagons are also tuples, but the mapping is a bit more complicated.

There's also the potential for more irregular chunking systems. It's conceivable that some aperiodic tiling or other irregular partitioning could produce good results. Tiles could be Voronoi regions of some arbitrary infinite graph, or correspond to simulated geopolitical boundaries. In these systems, there might be some arbitrary function to create chunk indexes from world points.

Chunks could also parameterize non-euclidean spaces! For instance, we might be looking at the surface of a large sphere. Technically this system is finite, but it could be prohibitive to generate and serve the entire world at once, so applying infinite generation techniques is desirable. We can also imagine some game taking place in a hyperbolic space that truly is infinite. In these systems, the world coordinates are more complex than just normal tuples.

We can even think about more exotic world coordinates. Perhaps the world coordinates are discrete, and each world coordinate corresponds to a node in an infinite graph. Chunks are small collections of "nearby" nodes. We could handle this case by embedding the graph in some other world coordinate system and using chunks in that system, but we don't technically have to, as long as we have some system for partitioning the world coordinates.

Some definitions

In general, let's create the following conventions for our abstraction.

We have some space of world coordinates, W. A specific member of this set is denoted by w∈W. We also have a set of (discrete) chunk indexes, C, with  c∈C. We'll also define a "boundary" type, b∈B. This boundary denotes a contiguous set of world coordinates. 

We'll also define two related functions to map between these domains.

index: W↦C

boundary: C↦W

These functions must obey that ∀w, ∀p∈boundary(index(w)), index(w) == index(p).

In plain English, index / boundary should partition the world coordinates into discrete, predictable chunks.

Chunks and Layers

Something I kind of glossed over in my last post is that world generation, typically, doesn't happen all at once. There are intermediate steps.

When I discussed cave generation, I mentioned that during each chunk's generation process, it would need to look at cave seeds in surrounding chunks, but these chunks may not be generated yet! How can we find the cave seeds without generating them (which in turn would generate all their neighbors, ad infinitum)?

There are a few different answers here, but in my view the most extensible one is to think of world generation happening in layers.

In this case, we can generate cave seeds in a separate layer, that depends on nothing else. It's just random points, say 2-5 selected per chunk index. This "cave seed layer" doesn't need to look at any other chunks during generation. It just is. When the "world with caves" layer is generated, we can look only at this "cave seed layer" -- we don't need to know, and therefore don't need to generate, the full content of the neighbor chunks.

These world generation layers need to form a pre-determined directed acyclic graph, with higher order layers depending on only layers below them. Some layers don't have any external dependencies and can be generated completely independently. Others will require exploring other layers and neighboring chunks.

Note that both the nodes and edges in this graph have interesting interpretations.

In one sense, nodes contain data -- there's a mapping between the index of the chunk and the contents of that chunk. In another, it's helpful to think of the node as simply being a function which takes in the contents of its dependencies and outputs this data.

Edges don't just note that we need to look at the contents of a layer, they also contain a rule about which relative chunk indexes to look at. I call these mappings "chunk index transfer functions".

Chunk Index Transfer Functions

Given a chunk index in one layer, the transfer function maps us onto  

chunk_transfer: C↦𝒫(C)

Note that while the above definition, we're implicitly assuming that all layers use the same chunk index system. That isn't required. We could use different indexes for each layer, in which case, the slightly more general statement for the edge between layer a and layer b is:

chunk_transfer: C_a ↦𝒫(C_b)

Example

For instance, we can imagine a chunk transfer function for Minecraft caves. Given a chunk index, we need to select all chunks within the maximum cave length. We can express this distance in terms of the size of a chunk, a quantity I'll denote L.

within(L)((i,j)) := {(i+di, j+dj) ∀ di∈[-L,L], dj∈[-L,L] }

Note that we could tighten this substantially -- we're finding all chunks within Manhattan distance L, where we probably in fact want chunks within Euclidean distance L.

Memoizing Chunk Contents

To make this work, we'll probably want to memorize the contents of most, if not all chunk contents on each layer. For right now, let's assume that the contents of each chunk are some arbitrary data.

Each layer can be memoized separately, ore we could put them all into a single dictionary. For right now, let's keep them separate.

A world can be therefore memorized in a Dict[Layer, Dict[ChunkIndex, ArbitraryWorldData]], or any equivalent data structure. There are definitely interesting choices to be made here, since there are clear patterns in how chunk contents get read. For now, I won't explore this further.

World Generation

Putting this altogether suggests a simple approach to world generation.

For a given  desired world coordinate, lookup the associated chunk. Optionally, extend to all nearby chunks.

Now, follow the transfer function edges to determine which chunks need to be generated for each dependent layer. Where possible, pull from the memoized cache. Each chunk that doesn't exist is generated and cached.

It's also possible to batch generations for each layer, if there are efficiency gains to be had.

In the next post, I'll discuss one of the simplest non-trivial applications of this work, generating Poisson Discs, and hopefully share some actual code.

No comments:

Post a Comment