Category Archives: Algorithm

2D Collision Detection Library – tinyc2

Collision detection is pretty hard. Making 2D games is pretty fun. Wouldn’t it be nice to make a cool 2D game without fiddly around forever with collisions? With the single-file C header library tinyc2 now it’s possible to focus on the fun parts of development rather than twiddling with endless collision bugs, or ripping collision detection out of pre-made physics engines.



Texture Atlases

Texture atlases for 2D games is a great optimization for batching together tons of different sprites (especially quads) with a very few number of draw calls. Making them is a pain. For pixel art or other 2D art, DXT compression is often not a great choice due to the lossyness. One good way to make atlases for 2D games is with the PNG format. Single-file loaders and savers can be used to load and save PNGs with a single function call each.

Here’s an example with the popular stb_image, along with stb_image_write:

This requires two different headers, one for loading and one for saving. Additionally the author of these headers also has a bin packing header, which could be used to make a texture atlas compiler.

However I have created a single-file header called tinydeflate. It does PNG loading, saving, and can create a texture atlas given an array of images. Internally it contains its own bin packing algorithm for sorting textures and creating atlases. Here’s an example:

The above example (if given a few more images) could output an atlas like so:


Followed by a very easy to parse text formatted file containing uv information for each atlas:

The nice thing about tinydeflate is the function to create atlases has a little bit of anti UV bleeding code inside of it. Most places on the internet will tell you to solve texture atlas bleeding by adding padding pixels around your textures. This might be important for 3D applications or when mip-mapping is important, however, I don’t have this kind of 3D experience. For 2D pixel art games it seems squeezing UVs ever so gently inside the borders of a texture works quite well, and so tinydeflate takes this approach. No complicated padding necessary.

And finally here’s an older video I did with some more information on writing the code yourself to do some of the above tasks.

Essentials of Software Engineering – With a Game Programming Focus

I’ve been commissioned to write a big document about what I think it means, and what is necessary, to become a good software engineer! If anyone finds the document interesting please do email me (email on my resume) with any questions or comments, even if you hate it and deeply disagree with me! This is the first release so some sections may be a bit skimpy and I might have some false info in there here and there.

I’m a fairly young engineer so please do take all my opinions with a healthy dose of skepticism. Nonetheless I hope the document is useful for someone! Many of my opinions come from my interpretation of much more experienced engineers I’ve come in direct contact with, and this PDF is sort of a creation of what others taught me.

As time goes on I may make edits or add new chapters, when I do I’ll modify the version number (just above the table of contents). Here is the PDF:

Download (PDF, Unknown)

Preprocessed Strings for Asset IDs

Mick West posted up on his site a really good overview of some different methods for hashing string ids and gave good motivation for optimizing this area early on in a project. Please do review his article as it’s a prerequisite to this post, and his article is just really good.

I’ve been primarily concerned with memory management of strings as they are extremely hairy to work with. For me specifically I’ve ruled out the option of a string class — they hide important details and it’s too easy to write poor (but functional) code with them. This is just my opinion. Based on that opinion I’d like to achieve these points:

  • Avoid all dynamic memory allocation, or de-allocation
  • Avoid complicated string data structures
  • Little to no run-time string traversals (like strcmp)
  • No annoying APIs that clutter the thought-space while writing/reading code
  • Allow assets to refer to one another while on-disk or in-memory without complicated pointer translations

There’s a lot I wanted to avoid here, and for good reason. If code makes use of dynamic memory, complicated data strctures, etc. that code is likely to suck in terms of both performance and maintenance. I’d like less features, less code, and some specific features to solve my specific problem: strings suck. The solution is to not use strings whenever possible, and when forced to use strings hide them under the rug. Following Jason Gregory’s example he outlined about Naughty Dog’s code base from his book “Game Engine Architecture 2nd Ed” I implemented the following solution, best shown via gif:


The SID (string id) is a macro that looks like (along with a typedef):

I’ve implemented a preprocessor in C that takes an input file, finds the SID macro instances, reads the string, hashes it and then inserts the hash along with the string stored as comment. Pretty much exactly what Mick talked about in his article.

Preprocessing files is fairly easy, though supporting this function might be pretty difficult, especially if there are a lot of source files that need to be pre-processed. Modifying build steps can be risky and sink a ton of time. This is another reason to hammer in this sort of optimization early on in a project in order to reap the benefits for a longer period of time, and not have to adjust heavy laid-in-stone systems after the fact.

Bundle Away the Woes

I’ve “stolen” Mitton’s program for use in my current project to recursively grab source files and create a single unified cpp file. This large cpp can then be fed to the compiler and compiled as a “unity build”. Since the bundle script looks for instances of the “#include” directive code can be written in almost the exact same manner as normal C++ development. Just make CPP files, include headers, and don’t worry about it.

The only real gotcha is if someone tries to do fancy inclusions by defining macros outside of files that affect the file inclusion. Since the bundle script is only looking for the #include directive (by the way it also comments out unnecessary code inclusions in the output bundled code) and isn’t running a full-blown C preprocessor, this can sometimes cause confusion.

It seems like a large relief on the linker and leaves me to thinking that C/C++ really ought to be used as single-translation unit languages, while leaving the linker mostly for hooking together separate libraries/code bases…

Compiling code can now look more or less like this:

First collect all source into a single CPP, then preprocess the hash macros, and finally send the rest off to the compiler. Compile times should shrink, and I’ve even caught wind that modern compilers have an easier time with certain optimizations when fed only a single file (rumors! I can’t confirm this myself, at least not for a while).

Sweep it Under the Rug

Once some sort of string id is implemented in-game strings themselves don’t really need to be used all too often from a programmer’s perspective. However for visualization, tools, and editors strings are essential.

One good option I’ve adopted is to place strings for these purposes into global table in designated debug memory. This table can then be turned off or compiled away whenever the product is released. The idea I’ve adopted is to allow tools and debug visualization to use strings fairly liberally, albeit they are stored inside the debug table. The game code itself, along with the assets, refer to identifiers in hash-form. This allows product code to perform tiny translations from fully hashed values to asset indices, which is much faster and easier to manage compared to strings.

This can even be taken a step further; if all tools and debug visualizations are turned off and all that remains is a bunch of integer hash IDs, assets can then be “locked” for release. All hashed values can be translated directly into asset IDs such that no run-time translation is ever needed. For me specifically I haven’t quite thought how to implement such a system, and decided this level of optimization does not really give me a significant benefit.

Parting Thoughts

There are a couple of downsides to doing this style of compile-time preprocessing:

  • Additional complexity in the build-step
  • Layer of code opacity via SID macro

Some benefits:

  • Huge optimization in terms of memory usage and CPU efficiency
  • Can run switch statement on SID strings
  • Uniquely identify assets in-code and on-disk without costly or complicated translation

If the costs can be mitigated through implementing some kind of code pre-processor/bundler early on then it’s possible to be left with just a bunch of benefits :)

Finally, I thought it was super cool how hashes like djb2 and FNV-1a use an initializer value to start the hashing, typically a carefully chosen prime. This allows to hash a prefix string, and then feed the result off to hash the suffix. Mick explains this in his article this idea of combining hashed values as a useful feature for supporting tools and assets. This can be implemented both at compile or run-time (though I haven’t quite thought of a need to do this at compile-time yet):


Good If Statements – Self Documenting Code

Here’s a quick trick for documenting those pesky if-statements, especially when littered with complicated mathematics.

Here’s an example from Timothy Master’s “Pragmatic Neural Network Recipes in C++” where the author needed to test some interval bounds to see if the conditions were appropriate to perform a parabolic estimate of a step-size. Sounds, and looks, complicated:

In my opinion this author writes extremely good code. In the above example the details of the code are very well documented, however that’s just due to wonderful (albeit dubiously placed) comments. When modified slightly the code can still maintain the same readability and perhaps gain a little clarity:

The trick is to just name those little integer values that collapse down to a 1 or 0 (the comparison operators in C, like && and ||, take their arguments and return a 1 or 0). One of the most common applications of this kind of naming is when there’s a complex end condition in an algorithm. Often times this kind of termination criterion can just be called “done”:


Circular Orbits | Planetary Motion | NBody Simulation

I took a few hours yesterday to start on a simulation for a job application. Part of the simulation involves trying to figure out how to create stable nbody orbits, at least stable for some amount of time to look interesting.

Turns out that the mathematics beyond constructing a circular orbit of just two bodies is a bit far-fetch’d. We can see in this link that solving for the velocity of each planet is very straightforward. Let me take a screenshot of the final equations here:


To me this makes a lot of sense and matches my own derivation, albeit with a slightly different final form (just due to some algebra differences). I plug in the equation into my simulation and hit run, however there’s just not enough velocity to keep an orbit. See for yourself (the simulation is slowed down a lot compared to later gifs):


Each white circle is a body (like a planet) and the red circle is the system’s barycenter.

So I tinkered a bit and found out the velocity is too dim by a factor of sqrt( 2 ). My best guess is that the equations I’m dealing with have some kind of approximation involved where one of the bodies is supposed to be infinitely large. I don’t quite have the time to look into this detail (since I have much more exciting things to finish in the simulation), so I just plugged in my sqrt( 2 ) factor and got this result:


Here’s the function I wrote down to solve for orbit velocity in the above gif:

I’m able to run the simulation for thousands of years without any destabilization.

If any readers have any clue as to why the velocity is off by a factor of rad 2 please let me know, I’d be very interested in learning what the hiccup was.

Next I’ll try to write some code for iteratively finding systems of more than 2 bodies that have a stable configuration for some duration of time. I plan to use a genetic algorithm, or maybe something based off of perturbation theory, though I think a genetic algorithm will yield better results. Currently the difficulty lies in measuring the system stability. For now the best metrics I’ve come up with is measuring movement of the barycenter and bounding the entire simulation inside of a large sphere. If the barycenter moves, or if bodies are too far from the barycenter I can deem such systems as unstable and try other configurations.

One particularly interesting aspect of the barycenter is if I create a group of stationary bodies and let them just attract each other through gravity, the barycenter seems to always be dead stationary! The moment I manually initialize a planet with some velocity the barycenter moves, but if they all start at rest the barycenter is 100% stable.


A Star

Recently I’ve found a nice piece of pseudo code for implementing A Star after searching through a few lesser or incorrect pseudo code passages:

And here is the Pop function pseudo code (written by me, so it probably has a small error here or there):

My favorite thing about the pseudo code is that the closed list can be implemented with just a flag. The open list becomes the most complicated part of the algorithm. Should the open list be a sorted array, an unsorted array, a binary heap? The answer largely depends on much more memory you need to traverse.

If a small portion of memory needs to be searched all dynamic memory can be allocated up-front on one shot. Otherwise bits of memory should probably be allocated up-front, and more as necessary during the algorithms run.

Just yesterday I implemented AStar in C where my full header file looked like:

In my internal C file I only have < 150 lines of code, including some file-scope variables and math functions. Implementation was nearly a one to one transcription of the above pseudo code (so those who are recursion impaired like myself shouldn’t have any problems). This style may not be thread-safe, but hey, most AI related coded can only be done in serial anyway. My grid size was maxed out at 20×15, so pre-allocating memory for your entire search area may not be so practical as it was for me.

Still, I hope this post can provide some bits of intuition that are useful to someone.

The Concept of Polymorphism

For a while I thought of many computer science terms as useless hype. For example, as said by Jon Blow in one of his recent videos, some terms can be vague or hard to recollect. Buzzwords.

Polymorphism, semantics, encapsulation, instantiate, interface, API, functional, abstraction, indirection, blah blah blah. Reading too many of these words make my eyes glaze over.

In Compilers, Principles, Techniques and Tools by Aho, Sethi and Ullman has a keen description of polymorphism — at least it really stuck with me. Here with me I have the 1988 reprint so it actually contains C code (unlike the new “shiny” version), and in it states:

“An ordinary procedure allows the statements in its body to be executed with arguments of fixed types; each time a polymorphic procedure is called, the statements in its body can be executed with arguments of different types. The term “polymorphic” can also be applied to any piece of code that can be executed with arguments of different types. so we can talk of polymorphic functions and operators as well.

Built-in operators for indexing arrays, applying functions, and manipulating pointers are usually polymorphic because they are not restricted to a particular kind of array, function, or pointer. For example, the C reference manual states about the pointer operator &: “If the type of the operand is ‘…’, the type of the result is ‘pointer to …’.” Since any type can be substituted for “…” the operator & in C is polymorphic.”

So, some of these buzzwords are just misrepresented.

It seems the idea of polymorphism is useful to express, in code, algorithms that manipulate data structures. If the data being transformed is a data structure, then the data within the structure can sometimes be irrelevant.

For example, perhaps we have a math library that runs matrix operations on 32-bit floats. The library is used extensively in a large project. One day some strange numeric bug creeps into a complex algorithm. It would be helpful to be able to swap all 32-bit floats (the data within the “structure”, where the structure is the project) to 64-bit floats to see if more precision affects the numerically-related bug. If polymorphism is supported, perhaps through function overloads or compiler-supported binary operators, this task might be trivial.

Another example is the need to compute the height of a tree.

Another example is to find a circular dependency in a dependency graph.

Freelist Concept

A freelist is a way of retrieving some kind of resource in an efficient manner. Usually a freelist is used when a memory allocation is needed, but searching for a free block should be fast. Freelists can be used inside of general purpose allocators, or embedded directly into an optimized algorithm.

Lets say we have an array of elements, where each element is 16 bytes of memory. Our array has 32 elements. The program that arrays resides in needs to request 16 byte elements, use them, and later give them back; we have allocation of elements and deallocation of elements.

The order the elements are allocated is not related in any way to order of deallocation. In order for deallocation to be a fast operation the 16 byte element needs to be in a state such that a future allocation be handed this element to be reused.

A singly linked list can be used to hold onto all unused and free elements. Since the elements are 16 bytes each this is more than enough memory to store a pointer, or integer index, which points to the next block in the free list. We can use the null pointer, or a -1 index to signify the end of the freelist.

Allocating and deallocating can now look like:

Setting up the memory* will take some work. Each element needs to be linked together somehow, like through a pointer or integer index. If no more elements are available then more arrays of size 32 can be allocated — this means our memory is being managed with the style of a “paged allocator”, where each array can be thought of as a page.

The freelist is an important concept that can be embedded ad-hoc into more complex algorithms. Often times it is important for little pieces of software to expose a very tiny C-like interface, usually just a function or two. Having these softwares self-contain their own internal freelists is one way to achieve a simple interface.

Example of Hiding the Freelist

For example say we are computing the convex hull of a point-set through the Quick Hull algorithm. The hypothetical algorithm exposes an interface like this:

This QHull function does no explicit memory allocation and forces the user to allocate an appropriate amount of memory to work with. The bounds of this memory (how big it needs to be for the algorithm’s worst case scenario) is calculated by the ComputeMemoryBound function.

Inside of QHull often times the hull is expanded and many new faces are allocated. These faces are held on a free list. Once new faces are made, old ones are deleted. These deleted faces are pushed onto the free list. This continues until the algorithm concludes, and the user does not need to know about the details of the embedded memory management of the freelist.

Convex hull about to expand to point P. The white faces will be deleted. The see-through faces will be allocated.

A convex hull fully expanded to point P. All old faces were deleted.

The above images were found at this address:

Parsing C Style Expressions

Turns out that constructing a hand-written C-style parser has a few parts that were very difficult for me.

The first thing was realizing that Backus Naur Form (BNF) largely sucks if you want to hand-write your own parser. BNF is really verbose and expressing simple things like optional terminals or lists is difficult. BNF is also poor for expressing operator precedence, as many intermediate and redundant non-terminals are required to be evaluated during parse-tree derivation. As an alternative Extended Backus Naur Form is perfect for languages that plan to use hand-written parsers instead of parsers created by parser generators. Left-factoring a BNF for LL parsing is also not very useful since handling infinite recursion with hand-written code is trivial.

The second thing is that parsing expressions with various types of operators can be really difficult, especially if there’s a lack of confidence in recursion (like myself). Creating a parse tree given a string representing an expression is a very recursive problem.

In C expressions consist of atoms and operators. An atom would be a literal, constant value, identifier, or an expression wrapped in parentheses. Operators are the usual + or – kind of tokens.

If each operator has an associated precedence value there are a few different algorithms out there with references for learning. I ended up face in the dirt and in the end derived what is known as “precedence climbing“. According to Eli Bendersky precedence climbing is what is currently used by Clang to parse C++ expressions! That ought to instill some perceived merit. From what I can tell Lua 5.3 uses (well, very close to) the same method.

The idea of precedence climbing is to think of two major recursive operations:

  • Compute righthand-side node
  • Make a binary operator node and connect lefthand-side and righthand-side children nodes

The first point is the complex one (that is, conceptually complex). The algorithm starts given a lefthand-side node, however, righthand-side nodes do not come in through the input stream in tree format; the next token can represent a node that should be much deeper in the tree — this means that computing the righthand-side node ought to be the main recursive path.

Realizing that the righthand-side node computation is the recursive path led me to notice a key observation that tipped me off to a working algorithm.

Say we have the following input string as an expression: A 2 B 1 C 4 D 3 E 7 F

Numbers are operators, and the number itself is precedence (higher number is higher precedence), letters are atoms (like a const int variable). Here’s the valid parse tree:


The lowest leaves are evaluated first. It’s easy to see that the tree itself encodes the operator precedence.

If we begin parsing our input string the first atom is A, which will always be a lefthand-side node for most any parsing algorithm used, and will likely be the left-most node in the tree. The next token is the 2 operator followed by B. It’s easy enough to construct the subtree of node 2 pointing to A and B.

The next input token is the operator 1 and atom C. C is bound by operator precedence to the operator 4, though the current state of the algorithm has yet to even read in the token 4. Studying this scenario is what tipped me off to a working solution; C must be treated as a lefthand-side node, though at the current state is considered a potential righthand-side node.

Wikipedia, and this link from earlier, both show great pseudo code for the precedence climbing algorithm. The main difference between the two links is wikipedia includes a nested for-loop in favor of less overall recursive calls. My own code ended up looking something like after I cleaned it up from influences of previous links:

In the end I’m quite happy with the result, and even hooked up a nice ascii-tree printer courtesy of a random stack-overflow user. Here’s a dot product and initialization trees in ascii:

My favorite part about the operator precedence climbing algorithm is how it handles parentheses and prefix unary operators: parentheses can be considered an atom, and when the atom function finds a parentheses is just calls the expression parsing function directly and returns the result! The same can be done for prefix unary operators (if they have really high precedence). The algorithm also trivially handles right-associativity. I haven’t yet thought about unary postfix operators, so if any reader has thoughts on this topic please do comment!

Here’s psuedo-y atom code: