Diary of a Path Tracer – Utilities

Index: https://cranberryking.com/2020/08/22/diary-of-a-path-tracer-the-beginning/


In our last post, we took a look at a variety of things. Namely the GGX-Smith BRDF, importance sampling and multiple importance sampling. These are really interesting aspects of path tracing, I recommend the read!

In this post, we’re going to look at all the small utilities for cranberray. Obj loading, multi-threading and our allocator. All these pieces are relatively simple but were a lot of fun to write!


Cranberray’s allocator is tiny. It’s roughly 60 lines of code.

typedef struct
	// Grows from bottom to top
	uint8_t* mem;
	uint64_t top;
	uint64_t size;
	bool locked;
} crana_stack_t;

void* crana_stack_alloc(crana_stack_t* stack, uint64_t size)
	uint8_t* ptr = stack->mem + stack->top;
	stack->top += size + sizeof(uint64_t);
	cran_assert(stack->top <= stack->size);

	*(uint64_t*)ptr = size + sizeof(uint64_t);
	return ptr + sizeof(uint64_t);

void crana_stack_free(crana_stack_t* stack, void* memory)
	stack->top -= *((uint64_t*)memory - 1);
	cran_assert(stack->top <= stack->size);

// Lock our stack, this means we have free range on the memory pointer
// Commit the pointer back to the stack to complete the operation.
// While the stack is locked, alloc and free cannot be called.
void* crana_stack_lock(crana_stack_t* stack)
	stack->locked = true;
	return stack->mem + stack->top + sizeof(uint64_t);

// Commit our new pointer. You can have moved it forward or backwards
// Make sure that the pointer is still from the same memory pool as the stack
void crana_stack_commit(crana_stack_t* stack, void* memory)
	cran_assert((uint64_t)((uint8_t*)memory - stack->mem) <= stack->size);
	*(uint64_t*)(stack->mem + stack->top) = (uint8_t*)memory - (stack->mem + stack->top);
	stack->top = (uint8_t*)memory - stack->mem;
	stack->locked = false;

// Cancel our lock, all the memory we were planning on commiting can be ignored
void crana_stack_revert(crana_stack_t* stack)
	stack->locked = false;

As you can see, it’s a simple stack allocator. It grows upwards, stores the size of your allocation before your allocation and shrinks when you pop from it. It has very few safety checks and is simple to the point of being flawed.

The part that I thought was interesting is the lock, commit and revert API. It’s a simple concept that I really can only afford because of the simplicity of cranberray.

Using the locking API is simple, you call crana_stack_lock to retrieve the raw pointer from the allocator. This allows you to move the pointer forward to any size without needing to continuously allocate chunks of memory. The piece of mind it gives and the ease of use is excellent.

Once you’re done with your memory, you have two options. You can commit your pointer, allowing you to finalize your changes. Or you can revert, returning the top pointer to it’s location from before the call to lock. This allows you to work with a transient buffer of memory that you can simply “wish away” once you’re done with it.

Coding with this allocator, was a blast. I would consider adding an extension to any custom allocator if it could be made a bit more safely.

Maybe what could be done, is to write it on top of another more general allocator. The stack allocator could request a fixed sized buffer from the allocator and allow the user to work within that range. Once you’ve finished with the stack allocator, it could return the unused parts to the primary allocator.

Obj Loading

When I speak of Obj, I mean the text version of the Wavefront geometry format.

I won’t be going into the details of the format. They are well explained at [1].

Cranberray’s Obj loader is simple. First we memory map the file, this allows us to easily read our memory using a raw pointer and allows us to suspend our knowledge that this is an IO operation.

Once our file is mapped, we process our file in 2 passes. We first loop through our file to determine how many of each elements we should be allocating and once we’ve done that, we parse the data into the appropriate buffers.

for (char* cran_restrict fileIter = fileStart; fileIter != fileEnd; fileIter = token(fileIter, fileEnd, delim))
	if (memcmp(fileIter, "vt ", 3) == 0)
	else if (memcmp(fileIter, "vn ", 3) == 0)
	else if (memcmp(fileIter, "v ", 2) == 0)
	else if (memcmp(fileIter, "g ", 2) == 0)
	else if (memcmp(fileIter, "f ", 2) == 0)
		uint32_t vertCount = 0;
		fileIter += 1;
		for (uint32_t i = 0; i < 4; i++)
			int32_t v = strtol(fileIter, &fileIter, 10);
			if (v != 0)
				strtol(fileIter + 1, &fileIter, 10);
				strtol(fileIter + 1, &fileIter, 10);
		assert(vertCount == 3 || vertCount == 4);
		faceCount += vertCount == 3 ? 1 : 2;
	else if (memcmp(fileIter, "usemtl ", 7) == 0)
	else if (memcmp(fileIter, "mtllib ", 7) == 0)
	else if (memcmp(fileIter, "# ", 2) == 0)
		fileIter = advance_to(fileIter, fileEnd, "\n");

Cranberray’s Obj loader simply assumes that every space, tab or return carriage is a potential delimiter of a token. You might be thinking “this is fragile!” and you would be right! But it works for my use case.

Once done iterating through every token to count how much memory it should allocate, we reset the pointer and gather our data.

We could write this loader as a state machine, but I find this solution mildly simpler (and less effort to think about).

Something that I think is of note for the loader, is how materials are handled. Instead of storing our mesh in chunks for each material type, I simply store every material boundary. This allows me to store our mesh as one big chunk while also handling multiple materials per chunk.

Honestly, that’s pretty much it. It’s just a series of switch statements and loops.


Multi-threading cranberray was a piece of cake. Since all state being mutated is contained in a thread, there is very little to no effort needed to parallelize it.

Every thread in cranberray gets access to the immutable source render data (scene, lights, etc), it’s own render context (random seed, stack allocator, stats object) and the render queue.

Work for cranberray is split into render “chunks” which are simply rectangular segments that every thread will be rendering to. These chunks are stored in a queue (the render queue).

To get an element from the queue, every thread is simply using an atomic increment to retrieve their next chunk.

int32_t chunkIdx = cranpl_atomic_increment(&renderQueue->next) - 1;
for (; chunkIdx < renderQueue->count; chunkIdx = cranpl_atomic_increment(&renderQueue->next) - 1)

Note the “-1” here. That’s because if we simply read from next and then increment, multiple threads could have read the same value for their chunk index. Instead, we want to read the post incremented value returned by the atomic instruction. This guarantees that our queue index is unique to us.

Once a thread has it’s chunk, it simply writes to the output target at it’s specified location.

Cranberray also captures some useful metrics from it’s rendering such as rays spawned, time spent and other useful bits and pieces.

Originally this data was global, but we can’t have that if we’re going to access it with different threads. One option is to modify every item in that data with an atomic operation but atomic operations aren’t free and the user has to be careful about where and how they update the data. Instead, a much simpler approach is to store metric data per-thread and merge it at the end. This is the approach cranberray takes, and the peace of mind afforded by this strategy is sublime.


This post was a simple rundown of a few small but essential things from cranberray. Next post we’ll be looking at texture mapping and bump mapping! Until next time. Happy coding!


[1] http://paulbourke.net/dataformats/obj/

Managing C++ Compile Times


Jumping into a large project that compiles in 30 minutes, you might be faced with 2 thoughts. Either “Wow! This project compiles in only 30 minutes!” Or “What?!? This project takes 30 minutes to compile?” Either way, you likely don’t want to be iterating on your changes at a rate of 30 minutes per compilation. So here’s a few things you can do to improve your iteration times.

Hack Your Prototypes – Short Iterations

This is likely where you spend 90% of your time when solving a problem. In this phase you can be searching for a bug, testing different solutions or even just investigating the behaviour of the program. In any of these scenarios, you likely want to iterate quickly without much down time. Here’s a few things you can do:

Don’t modify your headers

Modifying your headers is a sure fire way to trigger multiple files to recompile. (Or if you’re unlucky, the whole project) as a result, do anything in your power to avoid having to modify a header while you’re in this phase.

Work in a single cpp file

Don’t add static variables, private functions, or type declarations to your header. Add utility functions anywhere in your cpp file but not in the header.

Make ugly globals. (It’s fine, I won’t tell)

If you need some sort of static state for a class, consider making it a global in the cpp file instead. This holds even outside of the prototyping phase, make sure to mark it static, we don’t need pesky external linkage for these variables.

Manually add a function declaration to your cpp

If you need to have a function available to multiple cpp files, consider simply adding the declaration to the necessary cpp files. This wouldn’t fly for a final submission but we’re prototyping here.

Compile a single file

Out of habit, you might build the whole solution, or the specific project when working to see if you’ve introduced any compile errors. Instead, consider if you can afford to simply compile the single file you’re working in. You can typically afford this when making modifications local to a cpp file. This will allow you to avoid having to deal with the linker until you absolutely need to. It might even be worthwhile to make the shortcut convenient.

Use a smaller project

Is there a smaller project you can test in instead of the primary projects? If so, consider using that instead. Your iteration time will improve simply by the fact that your project has a lot less to handle.

In the same vein, consider If you can simply compile the project your files are in without compiling the larger projects that are likely to include a large number of files.

Disable optimizations locally

If you’re running with optimizations and optimizations are removing useful variables you’d like to investigate, consider disabling optimizations in your file instead of changing the project to a debug configuration. This will only require recompiling the single file instead of the whole project.

Be Nice To The Compiler And Yourself – Long Term Changes

At this point, we’re done iterating, we’ve solidified our solution and now we need to clean it up for submission. At this stage, you’re probably not compiling as frequently and a lot of your time is writing all the necessary bits to link it all together. Decisions made here will impact not only you, but all users of the project. We want our compile times to be short. Long compile times impact everyone, and that’s no fun. Here are some things to consider at this stage.

Does this change really need a new file?

If you’re considering creating a new file, consider if It’s really worth it. A new file implies a new header, a new header implies new includes, new includes means more dependencies, more dependencies means more compilation.

This is a balancing act. Adding a new cpp/h pair might actually allow you to reduce compile times by only including the relevant bits without needing to modify a highly used header. But it might also spawn into a monstrous chain of includes. It’s a matter of tradeoffs, consider them carefully.

New class != new file

Creating a new class does not imply that a new file should be created. Does your class really need to have a header? Is it used by any other class or is it more of a utility for a single file? If your class is only in use by a single cpp file, consider putting it in the cpp file. If it needs to become public, we can implement that change later. (Your coding standard might disagree here in some contexts, if it does, follow the standard)

Don’t add unnecessary includes to your headers

Headers are not your friend. Do everything you can to reduce the number of includes that you have to add to a header. The more headers required, the bigger the compilation chain grows. Consider if you can forward declare the required types, or if you’re simply including the header for convenience. A simple rule is to only include the header if you absolutely need it to compile. Avoid “general purpose” headers that include a large chunk of what you need. They’ll eat you alive (and your compile times).

Do you really need this function to be a private function?

When adding a private function to an object, consider if it has better use as a utility function embedded in the cpp file. Adding private methods to a header means that other files have to recompile if you modify functionality that they had no access to in the first place. It’s bad encapsulation. This is especially true for private static functions and private static variables.

Can you remove some includes from that header?

Are you working in a header and notice some unnecessary includes? See if you can remove them! Every little bit helps.

Standard headers are a good target to remove from inclusion. They tend to be heavily templated and can reduce compile times significantly.

If you can forward declare a type instead of including this file in your header, consider doing that.

Use the appropriate include type

Using a relative include “../../File.h” will reduce the load on the preprocessor if the path to the file is correct. Use this format when you reference a file from the same project. Use #include <Project/File.h> when the included file is part of a different project than our current one. Avoid #include “Project/File.h” or #include “../../File.h” if your file is not located at this specific location. This causes the preprocessor to search from the relative location of all the included files which can be quite a bit of work.

Does this need to be a template?

Be very wary of making a template that is included in your header. Templates suffer from a quantity of problems. Is your template needed by anyone else but your cpp? If not, consider embedding it. Does your type really need to be a template? Consider if you have an actual use case for this template. If you absolutely need this template. Consider de-templatifying the functionality that doesn’t require the generic type and putting that functionality either in a base class or a set of utility functions.

What’s problematic about templates? Templates require you to write their implementations in your header so that your compiler can instantiate it correctly. This means that any change to the internal functionality of your template will require a recompilation of all the files that include it. Templates can also be tremendously complex, causing a significant amount of work for your compiler. (Template metaprogramming anyone?) If you need your functionality to be expressed at compile time, consider constexpr, compilers are more efficient at processing these than processing template recursion. Also consider that templates require your compiler to create a new instance for every cpp that includes it. (If one cpp instantiates vector and another also instantiates vector the compiler has to do the work twice!) you might consider extern templates in this situation. But I would strongly recommend dropping the template altogether if you can.


There are some tools that can help you improve your compile times in a variety of ways. Here’s a few of them.


Ccache is a compilation cache tool that maintains a cache of compilations of your cpp files. This works accross full rebuilds, different project version (Maybe you have 2 similar branches) and even across users.


Compiler Time Tracking Tools

Compilers are starting to incorporate tools to track where time is being spent in compilation. Clang/GCC have “-ftime-report” and MSVC has build insights https://devblogs.microsoft.com/cppblog/introducing-c-build-insights/

An excellent blog post from Aras Pranckevičius details these more thoroughly https://aras-p.info/blog/2019/01/12/Investigating-compile-times-and-Clang-ftime-report/

Show Include List

MSVC has an option to show what’s included in a file. Consider using this to know what’s being included and what doesn’t need to be included.


Clang/GCC have the -M and -H options. https://gcc.gnu.org/onlinedocs/gcc-3.3.6/gcc/Preprocessor-Options.html#Preprocessor-Options

Include What You Use

There is also an interesting tool that determines what includes can be removed and what could be forward declared.