Cranberry Lair

Walking the programming parameter space

Diary of a Path Tracer – Utilities — September 19, 2020

Diary of a Path Tracer – Utilities

Index: https://cranberryking.com/2020/08/22/diary-of-a-path-tracer-the-beginning/

Intro

In our last post, we took a look at a variety of things. Namely the GGX-Smith BRDF, importance sampling and multiple importance sampling. These are really interesting aspects of path tracing, I recommend the read!

In this post, we’re going to look at all the small utilities for cranberray. Obj loading, multi-threading and our allocator. All these pieces are relatively simple but were a lot of fun to write!

Allocator

Cranberray’s allocator is tiny. It’s roughly 60 lines of code.

typedef struct
{
	// Grows from bottom to top
	uint8_t* mem;
	uint64_t top;
	uint64_t size;
	bool locked;
} crana_stack_t;

void* crana_stack_alloc(crana_stack_t* stack, uint64_t size)
{
	cran_assert(!stack->locked);
	uint8_t* ptr = stack->mem + stack->top;
	stack->top += size + sizeof(uint64_t);
	cran_assert(stack->top <= stack->size);

	*(uint64_t*)ptr = size + sizeof(uint64_t);
	return ptr + sizeof(uint64_t);
}

void crana_stack_free(crana_stack_t* stack, void* memory)
{
	cran_assert(!stack->locked);
	stack->top -= *((uint64_t*)memory - 1);
	cran_assert(stack->top <= stack->size);
}

// Lock our stack, this means we have free range on the memory pointer
// Commit the pointer back to the stack to complete the operation.
// While the stack is locked, alloc and free cannot be called.
void* crana_stack_lock(crana_stack_t* stack)
{
	cran_assert(!stack->locked);
	stack->locked = true;
	return stack->mem + stack->top + sizeof(uint64_t);
}

// Commit our new pointer. You can have moved it forward or backwards
// Make sure that the pointer is still from the same memory pool as the stack
void crana_stack_commit(crana_stack_t* stack, void* memory)
{
	cran_assert(stack->locked);
	cran_assert((uint64_t)((uint8_t*)memory - stack->mem) <= stack->size);
	*(uint64_t*)(stack->mem + stack->top) = (uint8_t*)memory - (stack->mem + stack->top);
	stack->top = (uint8_t*)memory - stack->mem;
	stack->locked = false;
}

// Cancel our lock, all the memory we were planning on commiting can be ignored
void crana_stack_revert(crana_stack_t* stack)
{
	cran_assert(stack->locked);
	stack->locked = false;
}

As you can see, it’s a simple stack allocator. It grows upwards, stores the size of your allocation before your allocation and shrinks when you pop from it. It has very few safety checks and is simple to the point of being flawed.

The part that I thought was interesting is the lock, commit and revert API. It’s a simple concept that I really can only afford because of the simplicity of cranberray.

Using the locking API is simple, you call crana_stack_lock to retrieve the raw pointer from the allocator. This allows you to move the pointer forward to any size without needing to continuously allocate chunks of memory. The piece of mind it gives and the ease of use is excellent.

Once you’re done with your memory, you have two options. You can commit your pointer, allowing you to finalize your changes. Or you can revert, returning the top pointer to it’s location from before the call to lock. This allows you to work with a transient buffer of memory that you can simply “wish away” once you’re done with it.

Coding with this allocator, was a blast. I would consider adding an extension to any custom allocator if it could be made a bit more safely.

Maybe what could be done, is to write it on top of another more general allocator. The stack allocator could request a fixed sized buffer from the allocator and allow the user to work within that range. Once you’ve finished with the stack allocator, it could return the unused parts to the primary allocator.

Obj Loading

When I speak of Obj, I mean the text version of the Wavefront geometry format.

I won’t be going into the details of the format. They are well explained at [1].

Cranberray’s Obj loader is simple. First we memory map the file, this allows us to easily read our memory using a raw pointer and allows us to suspend our knowledge that this is an IO operation.

Once our file is mapped, we process our file in 2 passes. We first loop through our file to determine how many of each elements we should be allocating and once we’ve done that, we parse the data into the appropriate buffers.

for (char* cran_restrict fileIter = fileStart; fileIter != fileEnd; fileIter = token(fileIter, fileEnd, delim))
{
	if (memcmp(fileIter, "vt ", 3) == 0)
	{
		uvCount++;
	}
	else if (memcmp(fileIter, "vn ", 3) == 0)
	{
		normalCount++;
	}
	else if (memcmp(fileIter, "v ", 2) == 0)
	{
		vertexCount++;
	}
	else if (memcmp(fileIter, "g ", 2) == 0)
	{
		groupCount++;
	}
	else if (memcmp(fileIter, "f ", 2) == 0)
	{
		uint32_t vertCount = 0;
		fileIter += 1;
		for (uint32_t i = 0; i < 4; i++)
		{
			int32_t v = strtol(fileIter, &fileIter, 10);
			if (v != 0)
			{
				strtol(fileIter + 1, &fileIter, 10);
				strtol(fileIter + 1, &fileIter, 10);
				vertCount++;
			}
			else
			{
				break;
			}
		}
		assert(vertCount == 3 || vertCount == 4);
		faceCount += vertCount == 3 ? 1 : 2;
	}
	else if (memcmp(fileIter, "usemtl ", 7) == 0)
	{
		materialCount++;
	}
	else if (memcmp(fileIter, "mtllib ", 7) == 0)
	{
		materialLibCount++;
	}
	else if (memcmp(fileIter, "# ", 2) == 0)
	{
		fileIter = advance_to(fileIter, fileEnd, "\n");
	}
}

Cranberray’s Obj loader simply assumes that every space, tab or return carriage is a potential delimiter of a token. You might be thinking “this is fragile!” and you would be right! But it works for my use case.

Once done iterating through every token to count how much memory it should allocate, we reset the pointer and gather our data.

We could write this loader as a state machine, but I find this solution mildly simpler (and less effort to think about).

Something that I think is of note for the loader, is how materials are handled. Instead of storing our mesh in chunks for each material type, I simply store every material boundary. This allows me to store our mesh as one big chunk while also handling multiple materials per chunk.

Honestly, that’s pretty much it. It’s just a series of switch statements and loops.

Multi-threading

Multi-threading cranberray was a piece of cake. Since all state being mutated is contained in a thread, there is very little to no effort needed to parallelize it.

Every thread in cranberray gets access to the immutable source render data (scene, lights, etc), it’s own render context (random seed, stack allocator, stats object) and the render queue.

Work for cranberray is split into render “chunks” which are simply rectangular segments that every thread will be rendering to. These chunks are stored in a queue (the render queue).

To get an element from the queue, every thread is simply using an atomic increment to retrieve their next chunk.

int32_t chunkIdx = cranpl_atomic_increment(&renderQueue->next) - 1;
for (; chunkIdx < renderQueue->count; chunkIdx = cranpl_atomic_increment(&renderQueue->next) - 1)

Note the “-1” here. That’s because if we simply read from next and then increment, multiple threads could have read the same value for their chunk index. Instead, we want to read the post incremented value returned by the atomic instruction. This guarantees that our queue index is unique to us.

Once a thread has it’s chunk, it simply writes to the output target at it’s specified location.

Cranberray also captures some useful metrics from it’s rendering such as rays spawned, time spent and other useful bits and pieces.

Originally this data was global, but we can’t have that if we’re going to access it with different threads. One option is to modify every item in that data with an atomic operation but atomic operations aren’t free and the user has to be careful about where and how they update the data. Instead, a much simpler approach is to store metric data per-thread and merge it at the end. This is the approach cranberray takes, and the peace of mind afforded by this strategy is sublime.

Conclusion

This post was a simple rundown of a few small but essential things from cranberray. Next post we’ll be looking at texture mapping and bump mapping! Until next time. Happy coding!

References

[1] http://paulbourke.net/dataformats/obj/