Cranberry Lair

Walking the programming parameter space

Diary of a Path Tracer – BVH Construction Using SAH — August 31, 2020

Diary of a Path Tracer – BVH Construction Using SAH



In our last post, we looked at our SIMD-friendly math library. In this post, we’ll be taking a look at how cranberray builds its BVH. For a primer to BVHs, I recommend going to PBRT’s treatment on the topic here.

Canberray’s BVH is built recursively as a binary tree. It starts with the top level node containing all nodes and recursively splits it into 2 children at each level. Originally, the BVH was split simply by selecting the left and right halves of the children nodes. The next approach was to split according to the median of a particular axis (typically the largest one), and now it makes use of the surface area heuristic.


The Surface Area Heuristic or SAH is a well known strategy to build an efficient BVH that tries to minimize the number of intersection tests applied on a tree.

PBRT has an excellent treatment on the topic [2], and I’ve linked one of the original papers referencing the idea [1]. I will try to instead provide an intuitive picture of why and how SAH works.

The idea behind SAH is to discover an efficient partition of a collection that will try to minimize the cost of traversing the tree for any ray.

The SAH function is presented as:

C_o = C_{trav} + P_A\sum C_{isect}(a_i) + P_B\sum C_{isect}(b_i)

This cost model is based on probabilities. C_{trav} is the cost of traversing the parent node to determine if any of the two children nodes intersect the ray. In cranberray, this would be the cost of testing the ray against both children AABBs and determining if it intersects either. C_{isect} is the cost of testing the shape itself against the ray. In cranberray, this would be the cost of testing the triangles against the ray. Finally, P_A and P_B are the probabilities that the bounding volumes containing the children nodes intersect the ray.

Now, well make a slight modification and reformulate P_A and P_B as P(A|C) and P(B|C) where C is our parent node. This states that P(A|C) represents the probability of the event A happening given the event C has already happened. In this context, this means that P(A|C) represents the probability that we intersect our volume A given that we’ve already intersected our parent volume C.

With this change, our function is now:

C_o = C_{trav} + P(A|C)\sum C_{isect}(a_i) + P(B|C)\sum C_{isect}(b_i)

Now, let’s imagine that we’re a lonesome ray traveling through our BVH. We arrive at a node and we test both of it’s children to see if we intersect it. This takes us 10 units of time (C_{trav}). At this point we would intersect A with a probability of P(A|C) and we would intersect B with a probability of P(B|C). In this example, we only intersect with A, and so we test our ray against it’s contained shapes. Let’s say that A has 10 shapes, as a result, we have to pay the cost of testing the intersection with all 10 shapes. This is \sum_{i=1}^{10} C_{isect}(a_i). Finally, let’s say that the cost of C_{isect} is 15 for every child shape.

This gives us the final actual cost of:

C_o = 10+sum_{i=1}^{10} 15 = 10+150 = 160

What our original cost function is doing, is calculating the expected cost of doing this operation for any possible ray.

As a result, if we were to build with these values:





A_n = 10

B_n = 0

Then our expected cost for this node would be:

C_o = 10 + 1*\sum_{i=1}^{10} 15 + 0*\sum_{i=1}^{0} 15 = 10+150 = 160

Which is exactly the cost of our example above because if our A node has a probability of 1, we would always experience the scenario above.

Of course, if we were to take some more likely value such as:



A_n = 10

B_n = 12

C_o = 10 + 0.5*\sum_{i=1}^{10} 15 + 0.6*\sum_{i=1}^{12} 15 = 10+0.5*150+0.6*180 = 193

Which would be the expected value of our node. Notice that our probabilities can add up to more than or less than 1 because it is reasonable that we might be able to intersect with both volumes or none at all.

Now we need to determine how we want to define P(A|C). This probability is the probability of a random ray intersecting with our volume given that we’ve intersected with our parent volume.

P(A|C) is given as the surface area of our volume A divided by the surface area of our parent volume C. As a result, we can define P(A|C) = \frac{S_A}{S_C} as the probability of intersecting A given that we’ve intersected C.

Derivation In 2D

In 2 dimensions, this makes sense, as we can image a rectangle, embedded in another rectangle. The probability of a random point being placed in the child rectangle given that it is contained in the parent rectangle is equal to the area of the embedded rectangle divided by the area of the parent rectangle.


Now, instead we’ll imagine uniformally random ray instead of a random line. To select a random ray, we can select a random direction and then select a random ray along the plane defined by that direction.

The plane defines our ray direction and our rays are generated across that line.

With this in mind, we want to calculate the probability that our ray will intersect the children volume given that it has intersected the parent volume.

To do this, we want to calculate the average projected area of our shape.

Imagine that we’ve selected our random direction for our ray, we can see that our ray will have intersected our shape if it’s projection onto the plane intersects our random ray.

Notice that if a ray intersects our volume it also intersects its projection.

Now if we add a second volume embedded in the first volume, we can see that the probability of intersecting the first volume would be given by the projected area of the child volume divided by the projected area of the parent volume.

Notice how our child’s projection is a fraction of our parent’s projection.

This example only works for a single direction. However, since we’re looking for a probability given any ray direction, we need to calculate the average of this projection across the sphere of all directions.

In 2 dimensions, the projected area of a circle is always it’s diameter. We will use this fact to test our work.

To calculate the average projected area of an arbitrary convex shape[4][5], we first want to split that shape into a series of small discrete patches.

These patches would get smaller and smaller approaching the shape of our circle.

This will be our integration domain.

Once we’ve split our shape into discrete patches, we want to calculate the average projected area of a single patch.

Now we’ll integrate across the circle of directions containing the patch and divide it by the area of our integration. This will effectively give us the average projected area for a single patch of our shape. This works because we will have calculated the projection for every single direction.

As we test different directions, we get different planes giving us different projections for the same line.

dA_p = \frac{1}{2\pi}\int_{\Omega} |cos\theta| dA d\theta

Integrating with an absolute value is tricky. Instead due to the symmetry of the problem, we can look at only a quarter of our circle and multiply by 4 and remove our absolute term.

dA_p = \frac{2}{\pi} dA \int_{0}^{\frac{\pi}{2}} cos\theta d\theta

Since \int_{0}^{\frac{\pi}{2}} cos\theta d\theta = 1 our expression reduces to

dA_p = \frac{2}{\pi} dA

Finally, we want to divide our value by 2, this is because we’ve now counted our projected area twice.

Notice how there are 2 valid projections. The projection given the left side of the line and the projection given the right side of the line.

dA_p = \frac{1}{\pi} dA

Using this formulation, we can calculate the average projected area of our shape by adding all of our average projected areas together.

A_p = \int_A dA_p

A_p = \int_A \frac{1}{\pi} dA

A_p = \frac{1}{\pi} A

And there you have it, the average projected area of a 2 dimensional convex object is \frac{1}{\pi} the area of said object.

Notice that if we apply this to the surface area of a circle with the formula 2\pi r we get A_p = \frac{2\pi r}{\pi} = 2r which is our diameter which matches up with our expectation.

Derivation In 3D

We can take the same steps in 3D.

Calculate the average projection over our sphere of directions:

dA_p = \frac{1}{4\pi} \int_0^{2\pi} \int_0^{\pi} |cos\theta| sin\theta dA d\theta d\phi

Integrating across the positive hemisphere and multiplying by 2

dA_p = \frac{1}{2\pi} dA \int_0^{2\pi} \int_0^{\frac{\pi}{2}} cos\theta sin\theta d\theta d\phi

Since \int_0^{\frac{\pi}{2}} cos\theta sin\theta d\theta = \frac{1}{2}

dA_p = \frac{1}{2\pi} dA \int_0^{2\pi} \frac{1}{2} d\phi

dA_p = \frac{1}{4\pi} dA \int_0^{2\pi} d\phi

dA_p = \frac{1}{2} dA

Finally, dividing by 2 for our double projection

dA_p = \frac{1}{4} dA

And plugging into our surface area calculation

A_p = \int_A dA_p

A_p = \frac{1}{4} \int_A dA

A_p = \frac{1}{4} A

Putting It Together

Finally we can see that our average projected area in 3D is \frac{1}{4} it’s surface area.

To calculate our probability, we simply want to divide our parent’s projected area (C) divided by our child’s projected area (A)

P(A|C) = \frac{A_p}{C_p}

P(A|C) = \frac{0.25*A}{0.25*C}

P(A|C) = \frac{A}{C}

Where A and C is the surface area of our volumes. And voila, that’s how we got that original formula.

In Depth View

Now that we have all the pieces, we can take a look at the construction of our BVH.

Cranberray builds it’s BVH from the top down. Starting from a containing bounding volume and splitting it in 2 recursively.

Cranberray keeps a queue of bounding volumes to process as a ring buffer. This makes management of front popping and back pushing very simple but makes resizing the queue trickier. As a result, Cranberray simply allocates a large buffer. We could likely allocate the maximum possible number of elements in the ring buffer instead (something along the lines of 2n-1 where n is the next largest power of 2)

Cranberray then selects the axis with the widest breadth of centroids. The code for this looks like so:

cv2 axisSpan[3];
for (uint32_t axis = 0; axis < 3; axis++)
	for (uint32_t i = 0; i < count; i++)
		axisSpan[axis].x = fminf(axisSpan[axis].x, caabb_centroid(start[i].bound, axis));
		axisSpan[axis].y = fmaxf(axisSpan[axis].y, caabb_centroid(start[i].bound, axis));

uint32_t axis;
if (axisSpan[0].y - axisSpan[0].x > axisSpan[1].y - axisSpan[1].x && axisSpan[0].y - axisSpan[0].x > axisSpan[2].y - axisSpan[2].x)
	axis = 0;
else if (axisSpan[1].y - axisSpan[1].x > axisSpan[2].y - axisSpan[2].x)
	axis = 1;
	axis = 2;

Once we’ve selected our axis, we split our axis in 12 distinct buckets as. (See PBRT [2] for more info on this approach)

We then calculate the cost of each seperation by adding up all the volumes on the left of the seperation and all the buckets on the right of the seperation.

Here we’ve selected the first bucket as our splitting bucket and calculated the cost of this split.

We then store the cost of each seperation and select the seperation with the minimal cost as our split.

We then continue in this recursion until we’ve run out of items in our queue. (When we’ve partitioned all of our child volumes into leaf nodes).

Finally, our BVH is restructured somewhat for an improvement in memory usage.

Our BVH is stored in 2 arrays, a “jump index” array and a bounds array. This allows us to load the bounds array without having to load the jump indices into memory until we absolutely need them.

We read from our bounds memory much more frequently than our jump memory, as a result, splitting them allows us to make more effective use of our caches.

Our BVH structure looks like this:

typedef struct
			uint32_t left;
			uint32_t right;
		} jumpIndices;

			uint32_t index;
		} leaf;
} bvh_jump_t;

typedef struct
	caabb* bounds;
	bvh_jump_t* jumps;
	uint32_t count;
	uint32_t leafCount;
} bvh_t;

The final special format of our BVH is that all the leaf nodes in our tree are stored at the end of our array. This allows us to test if a node is a leaf node by simply comparing the index in our array against the size of our array minus the number of leaves contained in the tree. This allows us to use data that we’ve already loaded into memory instead of requiring us to load extra data to use in our branch introducing a data dependency.

You can find the source for BVH construction here:

We’ll be taking a look at the BVH traversal next. Until then, happy coding!

Future Work

The primary addition to this BVH construction algorithm would likely be to look into parallelizing it’s construction. [6]








Attempting Triangular Area Lights — May 16, 2020